CREATE FOREIGN TABLE
The CREATE FOREIGN TABLE
command creates a foreign table that references data stored externally to SQreamDB. This allows for querying data located in files on a file system, external storage platforms, or other databases.
When querying data stored in file formats that support metadata, such as Parquet, ORC, JSON, and Avro, it is possible to omit the DDL when creating a foreign table. SQreamDB can read the file metadata, enabling the automatic inference of column structure and data types.
Syntax
CREATE [ OR REPLACE ] FOREIGN TABLE [ "<schema_name>" ]."<table_name>" (
[ column_def [, ...] ] -- When creating foreign tables using CSV source files, it is mandatory to provide the complete table DDL
)
[ FOREIGN DATA ] WRAPPER fdw_name
[ OPTIONS ( option_def [, ... ] ) ]
fdw_name ::=
{ csv_fdw | orc_fdw | parquet_fdw }
option_def ::=
LOCATION = '{ path_spec }'
[
| DELIMITER = '{ field_delimiter }' -- for CSV only
| RECORD_DELIMITER = '{ record_delimiter }' -- for CSV only
| AWS_ID '{ AWS ID }'
| AWS_SECRET '{ AWS SECRET }'
| QUOTE = {'C' | E'\ooo') -- for CSV only
]
path_spec ::= { GS URI | S3 URI | HDFS URI }
field_delimiter ::= delimiter_character
record_delimiter ::= delimiter_character
column_def ::=
{ column_name type_name [ default ] [ column_constraint ] }
column_name ::= identifier
column_constraint ::=
{ NOT NULL | NULL }
default ::=
DEFAULT default_value
| IDENTITY [ ( start_with [ , increment_by ] ) ]
Parameters
Parameter |
Description |
---|---|
|
Create a new table, and overwrite any existing table by the same name. Does not return an error if the table already exists. |
|
The name of the schema in which to create the table |
|
The name of the table to create, which must be unique inside the schema |
|
A comma separated list of column definitions. A minimal column definition includes a name and datatype. Other column constraints and default values may optionally be added. When creating foreign tables using CSV source files, it is mandatory to provide the complete table DDL |
|
Specifies the format of the source files, such as |
|
Specifies a path or URI of the source files, such as |
|
Specifies the field delimiter for CSV files. Defaults to |
|
Specifies the record delimiter for CSV files. Defaults to a newline, |
|
Credentials for authenticated S3 access |
|
Used to specify the number of rows to skip from the beginning of the result set |
|
Specifies if errors should be ignored or skipped. When set to |
|
Specifies the threshold for the maximum number of faulty records that will be ignored. This setting must be used in conjunction with |
|
Specifies an alternative quote character. The quote character must be a single, 1-byte printable ASCII character, and the equivalent octal syntax of the copy command can be used. The quote character cannot be contained in the field delimiter, the record delimiter, or the null marker. QUOTE can be used with |
Usage Notes
When creating foreign tables from CSV files, it is required to provide a table DDL.
When creating a foreign table using the
*
wildcard, SQreamDB assumes that all files in the path use the same schema.
Examples
Creating a Tab-Delimited Table
CREATE
OR REPLACE FOREIGN TABLE nba_new(
"player_name" text null,
"team_name" text null,
"jersey_number" int null,
"position" text null,
"age" int null,
"height" text null,
"weight" int null,
"college" text null,
"salary" int null
)
WRAPPER
csv_fdw
OPTIONS
(LOCATION = 'gs://blue_docs/nba.csv',
DELIMITER = '\t'
);
Creating a Table Located In a HDFS Directory
CREATE FOREIGN TABLE users (
id INT NOT NULL,
name TEXT(30) NOT NULL,
email TEXT(50) NOT NULL
)
WRAPPER
parquet_fdw
OPTIONS
(
LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet'
);
Creating a Table Located Within a S3 Bucket of ORC Files
CREATE FOREIGN TABLE users (
id INT NOT NULL,
name TEXT(30) NOT NULL,
email TEXT(50) NOT NULL
)
WRAPPER
orc_fdw
OPTIONS
(
LOCATION = 's3://pp-secret-bucket/users/*.orc',
AWS_ID = 'our_aws_id',
AWS_SECRET = 'our_aws_secret'
);
Converting a Foreign Table to an Internal Table
Using a foreign table allows you to perform ETL-like operations by applying SQL functions and operations to raw files.
CREATE TABLE
real_table AS
SELECT
*
FROM
some_foreign_table;
Using the OFFSET
Parameter
The OFFSET
parameter may be used with Parquet and CSV textual formats.
CREATE FOREIGN TABLE users7 (
id INT NOT NULL,
name TEXT NOT NULL,
email TEXT NOT NULL
)
WRAPPER
parquet_fdw
OPTIONS
(
LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet',
OFFSET = 2
);
Using the CONTINUE_ON_ERROR
and ERROR_COUNT
Parameters
CREATE
OR REPLACE FOREIGN TABLE cool_animalz (
id INT NOT NULL,
name TEXT NOT NULL,
weight FLOAT NOT NULL
)
WRAPPER
csv_fdw
OPTIONS
(
LOCATION = '/home/rhendricks/cool_animals.csv',
DELIMITER = '\t',
CONTINUE_ON_ERROR = true,
ERROR_COUNT = 3
);
Customizing Quotations Using Alternative Characters
CREATE
OR REPLACE FOREIGN TABLE cool_animalz (
id INT NOT NULL,
name text(30) NOT NULL,
weight FLOAT NOT NULL
)
WRAPPER
csv_fdw
OPTIONS
(
LOCATION = '/home/rhendricks/cool_animals.csv',
DELIMITER = '\t',
QUOTE = '@'
);
Permissions
The role must have the CREATE
permission at the database level.
The automatic foreign table DDL resolution feature requires Read permissions.