CSV

This guide covers ingesting data from CSV files into BLUE using the COPY FROM method.

Foreign Data Wrapper Prerequisites

Before proceeding, ensure the following Foreign Data Wrapper (FDW) prerequisites:

  • File Existence: Verify that the file you are ingesting data from exists at the specified path.

  • Path Accuracy: Confirm that all path elements are present and correctly spelled. Any inaccuracies may lead to data retrieval issues.

  • Bucket Access Permissions: Ensure that you have the necessary access permissions to the bucket from which you are ingesting data. Lack of permissions can hinder the data retrieval process.

  • Wildcard Accuracy: If using wildcards, double-check their spelling and configuration. Misconfigured wildcards may result in unintended data ingestion.

Figure Out The Table Structure

Prior to loading data, you will need to write out the table structure, so that it matches the file structure.

For example, to import the data from nba.csv, we will first look at the file:

nba.csv

Name

Team

Number

Position

Age

Height

Weight

College

Salary

Avery Bradley

Boston Celtics

0

PG

25

44714

180

Texas

7730337

Jae Crowder

Boston Celtics

99

SF

25

44718

235

Marquette

6796117

John Holland

Boston Celtics

30

SG

27

44717

205

Boston University

R.J. Hunter

Boston Celtics

28

SG

22

44717

185

Georgia State

1148640

Jonas Jerebko

Boston Celtics

8

PF

29

44722

231

5000000

Amir Johnson

Boston Celtics

90

PF

29

44721

240

12000000

Jordan Mickey

Boston Celtics

55

PF

21

44720

235

LSU

1170960

Kelly Olynyk

Boston Celtics

41

C

25

36708

238

Gonzaga

2165160

Terry Rozier

Boston Celtics

12

PG

22

44714

190

Louisville

1824360

  • The file format in this case is CSV, and it is stored as an S3 object.

  • The first row of the file is a header containing column names.

  • The record delimiter was a DOS newline (\r\n).

  • The file is stored on S3, at s3://sqream-demo-data/nba.csv.

Bulk Load The Data With COPY FROM

The CSV is a standard CSV, but with two differences from BLUE defaults:

  • The record delimiter is not a Unix newline (\n), but a Windows newline (\r\n)

  • The first row of the file is a header containing column names, which we’ll want to skip.

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    RECORD_DELIMITER = '\r\n',
    OFFSET = 2;
);

Repeat steps 3 and 4 for every CSV file you want to import.

Loading A PSV (Pipe Separated Value) File

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = '|'
);

Loading A TSV (Tab Separated Value) File

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = '\t';
);

Loading A Text File With Non-Printable Delimiter

In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = E'\021'
);

Loading A Text File With Multi-Character Delimiters

In the file below, the separator is '|.

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = '''|'
);

Loading Files With A Header Row

Use OFFSET to skip rows.

Note

When loading multiple files (e.g. with wildcards), this setting affects each file separately.

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = '|',
    OFFSET  2
);

Loading Files Formatted For Windows (\r\n)

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DELIMITER = '|',
    RECORD_DELIMITER = '\r\n'
);

Loading Non-Standard Dates

If files contain dates not formatted as ISO8601, tell COPY how to parse the column. After parsing, the date will appear as ISO8601 inside BLUE.

In this example, date_col1 and date_col2 in the table are non-standard. date_col3 is mentioned explicitly, but can be left out. Any column that is not specified is assumed to be ISO8601.

COPY
  nba
FROM
WRAPPER
  csv_fdw
OPTIONS
  (
    LOCATION = 's3://sqream-docs/nba.csv',
    DATETIME_FORMAT = 'date_col1=YMD,date_col2=MDY,date_col3=default'
);

Tip

The full list of supported date formats can be found under the Supported date formats section of the COPY FROM reference.