Inserting Data Using Amazon S3

SQream uses a native S3 connector for inserting data. The s3:// URI specifies an external file path on an S3 bucket. File names may contain wildcard characters, and the files can be in CSV or columnar format, such as Parquet and ORC.

S3 Configuration

A best practice for granting access to AWS S3 is by creating an Identity and Access Management (IAM) user account. If creating an IAM user account is not possible, you may follow AWS guidelines for using the global configuration object and setting an AWS region.

S3 URI Format

With S3, specify a location for a file (or files) when using COPY FROM or Foreign Tables.

The following is an example of the general S3 syntax:

s3://bucket_name/path

Authentication

SQream supports AWS ID and AWS SECRET authentication. These should be specified when executing a statement.

Connecting to S3 Using SQreamDB Legacy Configuration File

You may use the following parameters within your SQreamDB legacy configuration file:

Parameter

Description

Parameter Value

Example

AwsEndpointOverride

Overrides the AWS S3 HTTP endpoint when using Virtual Private Cloud (VPC)

URL

Default: None

sqream_config_legacy.json:
{
  ...,
  "AwsEndpointOverride": "https://my.endpoint.local"
}

AwsObjectAccessStyle

Enables configuration of S3 object access styles, which determine how you can access and interact with the objects stored in an S3 bucket

virtual-host or path. Default is virtual-host

sqream_config_legacy.json:
{
  ...,
  "AwsObjectAccessStyle": "path"
}

Examples

Use a foreign table to stage data from S3 before loading from CSV, Parquet, or ORC files.

Planning for Data Staging

The examples in this section are based on the following table:

Name

Team

Number

Position

Age

Height

Weight

College

Salary

Avery Bradley

Boston Celtics

0

PG

25

2-Jun

180

Texas

7730337

Jae Crowder

Boston Celtics

99

SF

25

6-Jun

235

Marquette

6796117

John Holland

Boston Celtics

30

SG

27

5-Jun

205

Boston University

R.J. Hunter

Boston Celtics

28

SG

22

5-Jun

185

Georgia State

1148640

Jonas Jerebko

Boston Celtics

8

PF

29

10-Jun

231

5000000

Amir Johnson

Boston Celtics

90

PF

29

9-Jun

240

12000000

Jordan Mickey

Boston Celtics

55

PF

21

8-Jun

235

LSU

1170960

Kelly Olynyk

Boston Celtics

41

C

25

Jul-00

238

Gonzaga

2165160

Terry Rozier

Boston Celtics

12

PG

22

2-Jun

190

Louisville

1824360

The file is stored on Amazon S3, and this bucket is public and listable. To create a matching CREATE FOREIGN TABLE statement you can make note of the file structure.

Creating a Foreign Table

Based on the source file’s structure, you can create a foreign table with the appropriate structure, and point it to your file as shown in the following example:

CREATE FOREIGN TABLE nba
(
   Name text(40),
   Team text(40),
   Number tinyint,
   Position text(2),
   Age tinyint,
   Height text(4),
   Weight real,
   College text(40),
   Salary float
 )
 WRAPPER csv_fdw
 OPTIONS
   (
      LOCATION = 's3://sqream-demo-data/nba_players.csv',
      RECORD_DELIMITER = '\r\n' -- DOS delimited file
   )
 ;

In the example above the file format is CSV, and it is stored as an S3 object. If the path is on HDFS, you must change the URI accordingly. Note that the record delimiter is a DOS newline (\r\n).

For more information, see the following:

Querying Foreign Tables

The following shows the data in the foreign table:

t=> SELECT * FROM nba LIMIT 10;
name          | team           | number | position | age | height | weight | college           | salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics |      0 | PG       |  25 | 6-2    |    180 | Texas             |  7730337
Jae Crowder   | Boston Celtics |     99 | SF       |  25 | 6-6    |    235 | Marquette         |  6796117
John Holland  | Boston Celtics |     30 | SG       |  27 | 6-5    |    205 | Boston University |
R.J. Hunter   | Boston Celtics |     28 | SG       |  22 | 6-5    |    185 | Georgia State     |  1148640
Jonas Jerebko | Boston Celtics |      8 | PF       |  29 | 6-10   |    231 |                   |  5000000
Amir Johnson  | Boston Celtics |     90 | PF       |  29 | 6-9    |    240 |                   | 12000000
Jordan Mickey | Boston Celtics |     55 | PF       |  21 | 6-8    |    235 | LSU               |  1170960
Kelly Olynyk  | Boston Celtics |     41 | C        |  25 | 7-0    |    238 | Gonzaga           |  2165160
Terry Rozier  | Boston Celtics |     12 | PG       |  22 | 6-2    |    190 | Louisville        |  1824360
Marcus Smart  | Boston Celtics |     36 | PG       |  22 | 6-4    |    220 | Oklahoma State    |  3431040

Bulk Loading a File from a Public S3 Bucket

The COPY FROM command can also be used to load data without staging it first.

Note

The bucket must be publicly available and objects can be listed.

The following is an example of bulk loading a file from a public S3 bucket:

COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';

For more information on the COPY FROM command, see COPY FROM.

Loading Files from an Authenticated S3 Bucket

The following is an example of loading fles from an authenticated S3 bucket:

COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n'
AWS_ID '12345678'
AWS_SECRET 'super_secretive_secret';