Inserting Data Using Amazon S3
SQream uses a native S3 connector for inserting data. The
s3:// URI specifies an external file path on an S3 bucket. File names may contain wildcard characters, and the files can be in CSV or columnar format, such as Parquet and ORC.
The Amazon S3 describes the following topics:
Any database host with access to S3 endpoints can access S3 without any configuration. To read files from an S3 bucket, the database must have listable files.
With S3, specify a location for a file (or files) when using COPY FROM or external_tables.
The following is an example of the general S3 syntax:
AWS ID and
AWS SECRET authentication. These should be specified when executing a statement.
Use a foreign table to stage data from S3 before loading from CSV, Parquet, or ORC files.
The Examples section includes the following examples:
The examples in this section are based on a CSV file, as shown in the following table:
The file is stored on Amazon S3, and this bucket is public and listable. To create a matching
CREATE FOREIGN TABLE statement you can make note of the file structure.
Based on the source file’s structure, you can create a foreign table with the appropriate structure, and point it to your file as shown in the following example:
CREATE FOREIGN TABLE nba ( Name varchar(40), Team varchar(40), Number tinyint, Position varchar(2), Age tinyint, Height varchar(4), Weight real, College varchar(40), Salary float ) WRAPPER csv_fdw OPTIONS ( LOCATION = 's3://sqream-demo-data/nba_players.csv', RECORD_DELIMITER = '\r\n' -- DOS delimited file ) ;
In the example above the file format is CSV, and it is stored as an S3 object. If the path is on HDFS, you must change the URI accordingly. Note that the record delimiter is a DOS newline (
For more information, see the following:
The following shows the data in the foreign table:
t=> SELECT * FROM nba LIMIT 10; name | team | number | position | age | height | weight | college | salary --------------+----------------+--------+----------+-----+--------+--------+-------------------+--------- Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337 Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117 John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University | R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640 Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000 Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000 Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960 Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160 Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360 Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
COPY FROM command can also be used to load data without staging it first.
The bucket must be publicly available and objects can be listed.
The following is an example of bulk loading a file from a public S3 bucket:
COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
For more information on the
COPY FROM command, see COPY FROM.
The following is an example of loading fles from an authenticated S3 bucket:
COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n' AWS_ID '12345678' AWS_SECRET 'super_secretive_secret';