SQLoader As a Process
The SQLoader is a CLI program that enables you to load data into SQreamDB from other DBMS and DBaaS.
SQLoader supports Oracle, Postgresql, Teradata, Microsoft SQL Server, and SAP HANA.
Before You Begin
It is essential that you have the following:
Java 17
SQLoader configuration files
SQLoader.jar file
Minimum Hardware Requirements
Component |
Type |
---|---|
CPU cores |
16 |
RAM |
32GB |
Sizing Guidelines
The SQLoader sizing is determined by the number of concurrent tables and threads based on the available CPU cores, limiting it to the number of cores minus one, with the remaining core reserved for the operating system. Each SQLoader instance runs on a single table, meaning concurrent imports of multiple tables require multiple instances. Additionally, when dealing with partitioned tables, each partition consumes a thread, so users should consider the table’s partition count when managing thread allocation for efficient performance.
Installation and Connection
Getting the SQLoader Configuration and JAR Files
Download the
.tar
file using the following command:curl -O https://sq-ftp-public.s3.amazonaws.com/sqloader-v7.13.tar
Extract the
.tar
file using the following command:tar -xf sqloader-7.13.tar.gz
A folder named
sqloader
with the following files is created:
File |
Description |
---|---|
|
Maps foreign DBMS and DBaaS data types into SQreamDB data types during ingestion |
|
Used for defining a connection string and may also be used to reconfigure data loading |
|
A list of reserved words which cannot be used as table and/or column names. |
|
The SQLoader package file |
Establishing a Connection
The sqload-jdbc.properties
file contains a connection string that must be configured to enable data loading into SQreamDB.
Open the
sqload-jdbc.properties
file.Configure connection parameters for:
Either Postgresql, Oracle, Teradata, Microsoft SQL Server, SAP HANA or SQreamDB connection strings
Optionally, Oracle or SQreamDB catalogs (recommended)
Parameter |
Description |
---|---|
|
The host and IP address number |
|
The name of the database from which data is loaded |
|
Username of a role to use for connection |
|
Specifies the password of the selected role |
|
Specifies SSL for this connection |
1# Postgresql, Oracle, Teradata, SAP HANA, Microsoft SQL Server, Sybase and SQreamDB Connection Strings
2# (only one source connection string should be specified)
3
4# postgres (and also Greenplum)
5connectionStringSource=jdbc:postgresql://<HostIp:port>/<database_name>?user=<user_name>&password=<password>&ssl=<true/false>
6
7# oracle
8connectionStringSource=jdbc:oracle:thin:@//<HostIp:port>/<database_name>?user=<user_name>&password=<password>&ssl=<true/false>
9
10# Oracle Autonomous Database
11
12connectionStringSource=jdbc:oracle:thin:@<database_name>?tns_admin=<path_to_oracle_wallet>&user=<user>&password=<password>
13
14# teradata
15connectionStringSource=jdbc:teradata://<HostIp>/DATABASE=<database_name>,DBS_PORT=<port>,user=<user_name>,password=<password>
16
17# sap hana
18connectionStringSource=jdbc:sap://<HostIp>:<port>/?user=<user_name>&password=<password>
19
20# microsoft sql server
21connectionStringSource=jdbc:sqlserver://<HostIp>:<port>;databaseName=<database_name>;user=<user_name>;password=<password>;encrypt=<true/false>;trustServerCertificate=<true/false>
22
23# sybase
24connectionStringSource=jdbc:sybase:Tds:<HostIp>:<port>/<database_name>?user=<user_name>&password=<password>
25
26# sqream
27connectionStringSqream=jdbc:Sqream://<HostIp:port>/<database_name>;cluster=<true/false>;user=<user_name>;password=<password>
28
29
30
31# Catalog Database Parameters
32
33# Connection string (only one catalog connection string should be specified)
34# Catalog database connection string on Oracle:
35connectionStringCatalog=jdbc:oracle:thin:@//<HostIp:port>/<database_name>?user=<user_name>&password=<password>
36
37# Catalog database connection string on SQreamDB:
38connectionStringCatalog=jdbc:Sqream://<HostIp:port>/<database_name>;cluster=<true/false>;user=<user_name>;password=<password>
39
40
41
42# CDC and Incremental Parameters
43cdcCatalogTable=public.CDC_TABLES
44cdcTrackingTable=public.CDC_TRACKING
45cdcPrimaryKeyTable=public.CDC_TABLE_PRIMARY_KEYS
46
47# Summary table
48loadSummaryTable=public.SQLOAD_SUMMARY
49
50
51
52# OPTIONAL - Data transfer options
53filter=1=1
54count=true
55limit=2000
56threadCount=1
57rowid=false
58batchSize=500
59fetchSize=100000
60chunkSize=0
61caseSensitive=false
62truncate=true
63drop=true
64loadTypeName=full
65cdcDelete=true
66usePartitions=false
67lockCheck=false
68lockTable=true
69loadDttm=false
70useDbmsLob=false
71
72.. more flags
Starting SQLoader
To start SQLoader, run the sqloader.jar
file:
java -jar sqloader.jar
Loading Data into SQreamDB
To load data into SQreamDB using SQLoader, you must specify a source table name at minimum. Executing the command below will generate a SQreamDB table using the specified source table’s DDL and load all its data:
java -jar sqloader.jar -table <source_table_name>
Customizing Data Load
While specifying a source table name is mandatory for data loading, you have the option to customize the loading process by utilizing optional parameters. These parameters can be configured either within a CLI command or by adjusting settings in the properties file.
Please note that any customization done through the CLI will override configurations made using the properties file.
CLI Parameter |
State |
Default |
Type |
Description |
---|---|---|---|---|
|
Optional |
|
The number of records to be inserted into SQreamDB at once. Please note that the configured batch size may impact chunk sizes |
|
|
Optional |
|
If |
|
|
Optional |
|
Check CDC chain between tracking table and source table |
|
|
Optional |
|
The number of records read at once from the source database |
|
|
Optional |
None |
|
The name of the file that contains all column names. Columns must be separated using |
|
Optional |
All columns |
The name or names of columns to be loaded into SQreamDB (“col1,col2, …”). For column names containing uppercase characters, maintain the uppercase format, avoid using double quotes or apostrophes, and ensure that the |
|
|
Optional |
|
Defines the path to the configuration file you wish to use. This parameter may be defined using only the CLI |
|
|
Optional |
|
Defines the path to the folder containing both the data type mapping and the reserved words files. The defined folder must contain both files or else you will receive an error |
|
|
Optional |
|
Defines whether or not table rows will be counted before being loaded into SQreamDB |
|
|
Optional |
|
Defines whether or not loading using Change Data Capture (CDC) includes deleted rows |
|
|
Optional |
|
Assuming we’re loading into table
|
|
|
Optional |
|
The number of records to be read at once from source database |
|
|
Optional |
|
Defines whether or not only records with SQL conditions are loaded |
|
|
Optional |
No input |
Displays the help menu and exits |
|
|
Optional |
|
Limits the number of rows to be loaded |
|
|
Optional |
|
Add an additional |
|
|
Optional |
|
Defines whether or not SQLoader will check source table is locked before the loading starts |
|
|
Optional |
|
Defines whether or not SQLoader will lock target table before the loading starts |
|
|
Optional |
|
Defines the path of log directory created when loading data. If no value is specified, a |
|
|
Optional |
None |
Partition identifier |
Specifies the number of table partitions. If configured, |
|
Optional |
|
Defines whether or not SQLoader will get row IDs from Oracle tables |
|
|
Optional |
|
Defines the source database name. It does not modify the database connection string but impacts the storage and retrieval of data within catalog tables |
|
|
Optional |
None |
Column name |
Column name for split (required for multi-thread loads) |
|
Mandatory |
None |
Table name |
Source table name to load data from |
|
Optional |
Target table name |
Table name |
Target table name to load data into |
|
Optional |
|
Number of threads to use for loading. Using multiple threads can significantly improve the loading performance, especially when dealing with columns that have metadata statistics (e.g., min/max values). SQLoader will automatically divide the data into batches based on the specified thread number, allowing for parallel processing. You may use |
|
|
Optional |
|
Truncate target table before loading |
|
|
Optional |
|
Defines a loading type that affects the table that is created in SQreamDB. Options are:
|
|
|
Optional |
|
Defines whether or not SQLoader uses |
|
|
Optional |
|
Defines whether or not SQLoader uses partitions in |
Data Types and Mapping
SQLoader automatically assigns data types during the data loading process. Nevertheless, you retain the choice to manually specify the preferred data type you want to map to during the loading operation.
Automatic Mapping
Oracle
Oracle Type |
SQreamDB Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Postgresql
Postgresql Type |
SQreamDB Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Teradata
Teradata Type |
SQreamDB Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Microsoft SQL Server
Microsoft SQL Server Type |
SQreamDB Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SAP HANA
SAP HANA Type |
SQreamDB Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Manual Mapping
You have the possibility to adjust the SQLoader mapping process according to your specific needs using the names
Method.
To specify that you want to map one or more columns in your table to a specific data type, duplicate the code block which maps to the SQreamDB data type you want and include the names
parameter in your code block. The SQLoader will map the specified columns to the specified SQreamDB data type. After the specified columns are mapped, the SQLoader continue to search for how to convert other data types to the same data type of the specified columns.
In this example, column1
, column2
, and column3
are mapped to BIGINT
and the Oracle data types BIGINT
, INT
, SMALLINT
, INTEGER
are also mapped to BIGINT
.
{
"oracle": [
{
"names": ["column1", "column2", "column3"],
"sqream": "bigint",
"java": "int",
"length": false
},
{
"type": ["bigint","int","smallint","integer"],
"sqream": "bigint",
"java": "int",
"length": false
}
}
Creating Summary Tables
Summary tables are pre-aggregated tables that store summarized or aggregated data, which can help improve query performance and reduce the need for complex calculations during runtime.
Summary tables are part of the schema within the database catalog.
Examples
The following examples use Oracle syntax.
A Summary Table
CREATE TABLE public.SQLOAD_SUMMARY (
DB_NAME TEXT(200 BYTE) VISIBLE,
SCHEMA_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME_FULL TEXT(200 BYTE) VISIBLE,
LOAD_TYPE TEXT(200 BYTE) VISIBLE,
UPDATED_DTTM_FROM DATE VISIBLE,
UPDATED_DTTM_TO DATE VISIBLE,
LAST_VAL_INT NUMBER(22,0) VISIBLE,
LAST_VAL_TS DATE VISIBLE,
START_TIME TIMESTAMP(6) VISIBLE,
FINISH_TIME TIMESTAMP(6) VISIBLE,
ELAPSED_SEC NUMBER VISIBLE,
ROW_COUNT NUMBER VISIBLE,
SQL_FILTER TEXT(200 BYTE) VISIBLE,
PARTITION TEXT(200 BYTE) VISIBLE,
STMT_TYPE TEXT(200 BYTE) VISIBLE,
STATUS TEXT(200 BYTE) VISIBLE,
LOG_FILE TEXT(200 BYTE) VISIBLE,
DB_URL TEXT(200 BYTE) VISIBLE,
PARTITION_COUNT NUMBER VISIBLE DEFAULT 0,
THREAD_COUNT NUMBER VISIBLE DEFAULT 1,
ELAPSED_MS NUMBER VISIBLE DEFAULT 0,
STATUS_CODE NUMBER VISIBLE DEFAULT 0,
ELAPSED_SOURCE_MS NUMBER(38,0) DEFAULT NULL,
ELAPSED_SOURCE_SEC NUMBER(38,0) DEFAULT NULL,
ELAPSED_TARGET_MS NUMBER(38,0) DEFAULT NULL,
ELAPSED_TARGET_SEC NUMBER(38,0) DEFAULT NULL,
TARGET_DB_URL VARCHAR2(200) DEFAULT NULL,
SQLOADER_VERSION VARCHAR2(20) DEFAULT NULL,
HOST VARCHAR2(200) DEFAULT NULL
);
Change Data Capture Summary Tables
Change Data Capture (CDC) tables are supported only for Oracle.
CREATE TABLE public.CDC_TABLES (
DB_NAME TEXT(200 BYTE) VISIBLE,
SCHEMA_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME_FULL TEXT(200 BYTE) VISIBLE,
TABLE_NAME_CDC TEXT(200 BYTE) VISIBLE,
INC_COLUMN_NAME TEXT(200 BYTE) VISIBLE,
INC_COLUMN_TYPE TEXT(200 BYTE) VISIBLE,
LOAD_TYPE TEXT(200 BYTE) VISIBLE,
FREQ_TYPE TEXT(200 BYTE) VISIBLE,
FREQ_INTERVAL NUMBER(22,0) VISIBLE,
IS_ACTIVE NUMBER VISIBLE DEFAULT 0,
STATUS_LOAD NUMBER VISIBLE DEFAULT 0,
INC_GAP_VALUE NUMBER VISIBLE DEFAULT 0
);
CREATE TABLE public.CDC_TRACKING (
DB_NAME TEXT(200 BYTE) VISIBLE,
SCHEMA_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME_FULL TEXT(200 BYTE) VISIBLE,
LAST_UPDATED_DTTM DATE VISIBLE,
LAST_VAL_INT NUMBER(22,0) VISIBLE DEFAULT 0,
LAST_VAL_TS TIMESTAMP(6) VISIBLE,
LAST_VAL_DT DATE VISIBLE
);
CREATE TABLE public.CDC_TABLE_PRIMARY_KEYS (
DB_NAME TEXT(200 BYTE) VISIBLE,
SCHEMA_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME TEXT(200 BYTE) VISIBLE,
TABLE_NAME_FULL TEXT(200 BYTE) VISIBLE,
CONSTRAINT_NAME TEXT(200 BYTE) VISIBLE,
COLUMN_NAME TEXT(200 BYTE) VISIBLE,
IS_NULLABLE NUMBER VISIBLE DEFAULT 0
);
CLI Examples
Loading data into a CDC table using the type
and limit
parameters:
java -jar sqloader.jar -table <source_table_name> -type cdc -limit 100
Loading data into a table using your own configuration file (this will override the default configuration file):
java -jar sqloader.jar -config path/to/your/config/file
Loading data into a table using a custom configuration file:
java -jar -config MyConfigFile.properties -table <source_table_name> -type cdc -target <target_table_name> -drop true -lock_check false
Loading data into a table using a the filter
parameter:
java -jar sqloader.jar -table <source_table_name> -filter column_name>50