# Inserting data¶

This guide covers inserting data into SQream DB, with subguides on inserting data from a variety of sources and locations.

SQream DB supports importing data from the following sources:

• Text - CSV, TSV, PSV
• Parquet
• ORC

### Verify data and performance after load¶

Like other RDBMSs, SQream DB has its own set of best practices for table design and query optimization.

SQream therefore recommends verifying that:

• The data is as you expect (e.g. row counts, data types, formatting, content)
• Best practices were followed for table design
• Applications such as Tableau and others have been tested and work
• Data types were not over-provisioned (e.g. don’t use VARCHAR(2000) to store a short string)

A COPY FROM statement can run on any worker. If you are running multiple nodes, make sure that all nodes see the source in the same way. Loading from a local file that exists on only one node, rather than on shared storage, will fail whenever the statement is scheduled on a different node. (If needed, you can control which node a statement runs on using the Workload Manager.)

SQream DB’s COPY FROM syntax can be used to load CSV files, but cannot be used for Parquet or ORC files.
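For example, a basic CSV load might look like the sketch below. The table name and file path are placeholders, and option names can vary between SQream DB versions, so check the COPY FROM reference for your release:

```sql
-- Load a CSV file with a header row into an existing table.
-- 'users' and the file path are hypothetical examples.
COPY users
  FROM '/data/users.csv'
  WITH DELIMITER ','
       OFFSET 2;  -- skip the header row (data starts at row 2)
```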

FOREIGN TABLE can be used to load text files, Parquet, and ORC files, and can also transform the data prior to materialization as a full table.
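As a sketch of this flow, a Parquet file can be exposed as a foreign table and then materialized with a transformation. The table names, column list, `parquet_fdw` wrapper name, and path below are illustrative assumptions; the exact foreign table syntax depends on your SQream DB version:

```sql
-- Expose the Parquet file as a foreign table.
CREATE FOREIGN TABLE ext_users
  (id BIGINT, name TEXT, signup_date DATE)
WRAPPER parquet_fdw
OPTIONS (LOCATION = '/data/users.parquet');

-- Materialize as a regular table, transforming the data on the way in.
CREATE TABLE users AS
  SELECT id,
         UPPER(name) AS name,   -- example transformation
         signup_date
  FROM ext_users;
```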

| Method / File type | Text (CSV) | Parquet | ORC | Streaming data |
|--------------------|------------|---------|-----|----------------|
| COPY FROM          | ✓          |         |     |                |
| Foreign tables     | ✓          | ✓       | ✓   |                |
| INSERT             |            |         |     | ✓ (Python, JDBC, Node.JS) |

### Unsupported data types¶

SQream DB doesn’t support the entire set of features that some other database systems may have, such as ARRAY, BLOB, ENUM, SET, etc.

These data types will have to be converted before load. For example, ENUM can often be stored as a VARCHAR.
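For instance, an ENUM column from another system can be carried over as a plain text column. A hypothetical sketch:

```sql
-- A MySQL-style column such as status ENUM('new','active','closed')
-- can be declared as a bounded text column in SQream DB instead.
CREATE TABLE tickets (
  id     BIGINT,
  status VARCHAR(10)  -- holds 'new', 'active', or 'closed'
);
```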

### Extended error handling¶

While external tables can be used to load CSVs, the COPY FROM statement provides more fine-grained error handling options, as well as extended support for non-standard CSVs with multi-character delimiters, alternate timestamp formats, and more.

### Best practices for CSV¶

Text files like CSV rarely conform to RFC 4180, so alterations may be required:

• Use OFFSET 2 for files containing header rows

• Failed rows can be captured in a log file for later analysis, or simply skipped. See Unsupported Field Delimiters for information on skipping rejected rows.

• Record delimiters (new lines) can be modified with the RECORD DELIMITER syntax.

• If the date formats differ from ISO 8601, refer to the Supported Date Formats section to see how to override default parsing.

• Fields in a CSV can be optionally quoted with double-quotes ("). However, any field containing a newline or another double-quote character must be quoted.

If a field is quoted, any double quote that appears within it must be escaped by doubling it (similar to the quoting rules for string literals). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?".

• Field delimiters don’t have to be a displayable ASCII character. See Supported Field Delimiters for all options.
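Putting several of the options above together, a hedged sketch of a non-standard CSV load (the table name, paths, and option names are illustrative; consult the COPY FROM reference for your version):

```sql
-- Load a pipe-delimited file with a header row and a custom
-- record delimiter, capturing rejected rows in an error log.
COPY transactions
  FROM '/data/transactions.psv'
  WITH DELIMITER '|'
       RECORD DELIMITER '\r\n'
       OFFSET 2
       ERROR_LOG '/tmp/transactions_errors.log';
```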

### Best practices for Parquet¶

• Parquet files are loaded through foreign tables. The destination table structure must match the source files in the number of columns.
• Parquet files support predicate pushdown. When a query is issued over Parquet files, SQream DB uses row-group metadata to determine which row-groups in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of rows.
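Predicate pushdown means a filtered query over a Parquet foreign table only reads the row-groups whose metadata can match the filter. A sketch, with hypothetical table and column names:

```sql
-- Only row-groups whose min/max metadata for 'event_date'
-- overlaps the filter need to be read from disk.
SELECT COUNT(*)
FROM ext_events          -- a foreign table over Parquet files
WHERE event_date >= '2021-01-01';
```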

#### Type support and behavior notes¶

• Unlike ORC, the column types should match the data types exactly (see table below).

| Parquet source ↓ \ SQream DB type → | BOOL | TINYINT | SMALLINT | INT | BIGINT | REAL | DOUBLE | Text [1] | DATE | DATETIME |
|--------------------------------------|------|---------|----------|-----|--------|------|--------|----------|------|----------|
| BOOLEAN                              | ✓    |         |          |     |        |      |        |          |      |          |
| INT16                                |      |         | ✓        |     |        |      |        |          |      |          |
| INT32                                |      |         |          | ✓   |        |      |        |          |      |          |
| INT64                                |      |         |          |     | ✓      |      |        |          |      |          |
| FLOAT                                |      |         |          |     |        | ✓    |        |          |      |          |
| DOUBLE                               |      |         |          |     |        |      | ✓      |          |      |          |
| BYTE_ARRAY [2]                       |      |         |          |     |        |      |        | ✓        |      |          |
| INT96 [3]                            |      |         |          |     |        |      |        |          |      | ✓ [4]    |
• If a Parquet file has a column of an unsupported type such as enum, uuid, time, json, bson, a list, or a map, but that column is not referenced in the query (it does not appear in the SELECT list), the statement will succeed. If the column is referenced, SQream DB returns an error explaining that the type is not supported; the column can be omitted from the query instead.
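For example, if a hypothetical foreign table exposes a Parquet file containing an unsupported json column, selecting around that column works:

```sql
-- Suppose ext_logs maps a Parquet file whose 'payload' column
-- is an unsupported type (e.g. json).
SELECT log_id, log_time   -- referencing only supported columns succeeds
FROM ext_logs;

-- SELECT log_id, payload FROM ext_logs;  -- would raise an unsupported-type error
```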

### Best practices for ORC¶

• ORC files are loaded through foreign tables. The destination table structure must match the source files in the number of columns.
• ORC files support predicate pushdown. When a query is issued over ORC files, SQream DB uses ORC metadata to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows.
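A foreign table over ORC files could be declared like this sketch. The `orc_fdw` wrapper name, table definition, and path are assumptions; check the foreign table reference for your version:

```sql
CREATE FOREIGN TABLE ext_orders
  (order_id BIGINT, amount DOUBLE, order_date DATE)
WRAPPER orc_fdw
OPTIONS (LOCATION = '/data/orders/*.orc');
```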

#### Type support and behavior notes¶

• The ORC types should match the SQream DB column types to some extent within the same “class” (see table below).

| ORC source ↓ \ SQream DB type →     | BOOL  | TINYINT | SMALLINT | INT   | BIGINT | REAL | DOUBLE | Text [1] | DATE | DATETIME |
|--------------------------------------|-------|---------|----------|-------|--------|------|--------|----------|------|----------|
| boolean                              | ✓     | ✓ [5]   | ✓ [5]    | ✓ [5] | ✓ [5]  |      |        |          |      |          |
| tinyint                              | ✓ [6] | ✓       | ✓        | ✓     | ✓      |      |        |          |      |          |
| smallint                             | ✓ [6] | ✓ [7]   | ✓        | ✓     | ✓      |      |        |          |      |          |
| int                                  | ✓ [6] | ✓ [7]   | ✓ [7]    | ✓     | ✓      |      |        |          |      |          |
| bigint                               | ✓ [6] | ✓ [7]   | ✓ [7]    | ✓ [7] | ✓      |      |        |          |      |          |
| float                                |       |         |          |       |        | ✓    |        |          |      |          |
| double                               |       |         |          |       |        |      | ✓      |          |      |          |
| string / char / varchar              |       |         |          |       |        |      |        | ✓        |      |          |
| date                                 |       |         |          |       |        |      |        |          | ✓    |          |
| timestamp, timestamp with timezone   |       |         |          |       |        |      |        |          |      | ✓        |
• If an ORC file has a column of an unsupported type such as binary, list, map, or union, but that column is not referenced in the query (it does not appear in the SELECT list), the statement will succeed. If the column is referenced, SQream DB returns an error explaining that the type is not supported; the column can be omitted from the query instead.

## Further reading and migration guides¶

Migration guides

 [1] Text values include TEXT, VARCHAR, and NVARCHAR
 [3] With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation