SQream DB Documentation
SQream DB is a columnar analytic SQL database management system. SQream DB supports regular SQL, including a substantial amount of ANSI SQL, uses serializable transactions, and scales horizontally for concurrent statements. Even a basic SQream DB machine can support tens to hundreds of terabytes of data. SQream DB plugs easily into third-party tools like Tableau, and comes with standard SQL client drivers, including JDBC, ODBC, and Python DB-API.
| Topic | Description |
|---|---|
| Getting Started | |
| Preparing Your Machine to Install SQreamDB | Set up your local machine according to SQream’s recommended pre-installation configurations. |
| Executing Statements in SQream | Provides more information about the available methods for executing statements in SQream. |
| Performing Basic SQream Operations | Provides more information on performing basic operations. |
| Hardware Guide | Describes SQream’s mandatory and recommended hardware settings, designed for a technical audience. |
| Installation Guides | |
| Installing and Launching SQreamDB | Refers to SQream’s installation guides. |
| Studio Installation Guides | Refers to all installation guides required for installations related to Studio. |
| Ingesting Data | |
| Connecting to SQream | |
| Third-Party Connection Platforms and Tools | Describes how to install and connect a variety of third party connection platforms and tools. |
| Client Drivers and Applications | Describes how to use the SQream client drivers and client applications with SQream. |
| External Storage Platforms | |
| Inserting Data Using a Native S3 Connector | Describes how to insert data over a native S3 connector. |
| Configuring an HDFS Environment for the User sqream | Describes how to configure an HDFS environment for the user sqream; relevant only for users with an HDFS environment. |
Need help?
If you couldn’t find what you’re looking for, we’re always happy to help. Visit SQream’s support portal for additional support.
Getting Started
The Getting Started page describes what you need to start using SQream:
Preparing Your Machine to Install SQreamDB
To prepare your machine to install SQream, do the following:
Set up your local machine according to SQream’s recommended pre-installation configurations.
Verify you have an NVIDIA-capable server, either on-premise or on supported cloud platforms:
Red Hat Enterprise Linux v7.x
CentOS v7.x
Amazon Linux 7
Verify that you have the following:
An NVIDIA GPU - SQream recommends using a Tesla GPU.
An SSH connection to your server.
SUDO permissions for installation and configuration purposes.
A SQream license - Contact support@sqream.com or your SQream account manager for your license key.
Installing SQreamDB
| Method | Description |
|---|---|
| Installing SQream Using Binary Packages | Describes installing SQreamDB using binary packages provided by SQreamDB. |
Executing Statements in SQream
You can execute statements in SQream using one of the following tools:
SQream SQL CLI - a command line interface
SQream Acceleration Studio - an intuitive and easy-to-use interface.
Performing Basic SQream Operations
After installing SQream, you can perform the operations described on this page:
Running the SQream SQL Client
The following example shows how to run the SQream SQL client:
$ sqream sql --port=5000 --username=rhendricks -d master
Password:
Interactive client mode
To quit, use ^D or \q.
master=> _
Running the SQream SQL client prompts you to provide your password. Use the username and password that you have set up, or your DBA has provided.
Tip
You can exit the shell by typing \q or Ctrl-d.
A new SQream cluster contains a database named master, which is the database used in the examples on this page.
Creating Your First Table
The Creating Your First Table section describes the following:
Creating a Table
The CREATE TABLE syntax is used to create your first table. This table includes a table name and column specifications, as shown in the following example:
CREATE TABLE cool_animals (
id INT NOT NULL,
name TEXT(20),
weight INT
);
For more information on creating a table, see CREATE TABLE.
Replacing a Table
You can drop an existing table and create a new one by adding the OR REPLACE parameter after the CREATE keyword, as shown in the following example:
CREATE OR REPLACE TABLE cool_animals (
id INT NOT NULL,
name TEXT(20),
weight INT
);
Listing a CREATE TABLE Statement
You can list the full, verbose CREATE TABLE statement for a table by using the GET_DDL function with the table name, as shown in the following example:
test=> SELECT GET_DDL('cool_animals');
create table "public"."cool_animals" (
"id" int not null,
"name" text(20),
"weight" int
);
Note
SQream DB identifier names such as table names and column names are not case sensitive. SQreamDB lowercases all identifiers by default. If you want to maintain case, enclose the identifiers with double-quotes.
SQream DB places all tables in the public schema, unless another schema is created and specified as part of the table name.
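For example, quoting an identifier preserves its case. The following is a minimal sketch; the mixed-case table name "CoolAnimals" is hypothetical:
test=> CREATE TABLE "CoolAnimals" (id INT);
executed
To place a table in a schema other than public, prefix the table name with the schema name (assuming that schema has already been created), for example my_schema.cool_animals.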
For information on listing a CREATE TABLE statement, see GET_DDL.
Dropping a Table
When you have finished working with your table, you can drop the table to remove it and its contents, as shown in the following example:
test=> DROP TABLE cool_animals;
executed
For more information on dropping tables, see DROP TABLE.
Listing Tables
To see the tables in the current database you can query the catalog, as shown in the following example:
test=> SELECT table_name FROM sqream_catalog.tables;
cool_animals
1 rows
Inserting Rows
The Inserting Rows section describes the following:
Inserting Basic Rows
You can insert basic rows into a table using the INSERT statement. The INSERT statement includes the table name, an optional list of column names, and column values listed in the same order as the column names, as shown in the following example:
test=> INSERT INTO cool_animals VALUES (1, 'Dog', 7);
executed
Changing Value Order
You can change the order of values by specifying the column order, as shown in the following example:
test=> INSERT INTO cool_animals(weight, id, name) VALUES (3, 2, 'Possum');
executed
Inserting Multiple Rows
You can insert multiple rows using the INSERT statement by using sets of parentheses separated by commas, as shown in the following example:
test=> INSERT INTO cool_animals VALUES
(3, 'Cat', 5) ,
(4, 'Elephant', 6500) ,
(5, 'Rhinoceros', 2100);
executed
Note
You can load large data sets using bulk loading methods instead. For more information, see Inserting Data.
Omitting Columns
Omitting columns that have a default value (including a default NULL value) uses the default value, as shown in the following example:
test=> INSERT INTO cool_animals (id) VALUES (6);
executed
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
Note
Null row values are represented as \N
For more information on inserting rows, see INSERT.
For more information on default values, see default value.
Running Queries
The Running Queries section describes the following:
Running Basic Queries
You can run a basic query using the SELECT keyword, followed by a list of columns and values to be returned and the table to get the data from, as shown in the following example:
test=> SELECT id, name, weight FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
For more information on the SELECT keyword, see SELECT.
Outputting All Columns
You can output all columns without specifying them using the star operator (*), as shown in the following example:
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
6,\N,\N
6 rows
Outputting Shorthand Table Values
You can output the number of rows in a table without getting the full result set by using the COUNT statement:
test=> SELECT COUNT(*) FROM cool_animals;
6
1 row
Filtering Results
You can filter results by adding a WHERE clause and specifying the filter condition, as shown in the following example:
test=> SELECT id, name, weight FROM cool_animals WHERE weight > 1000;
4,Elephant ,6500
5,Rhinoceros ,2100
2 rows
Sorting Results
You can sort results by adding an ORDER BY clause and specifying ascending (ASC) or descending (DESC) order, as shown in the following example:
test=> SELECT * FROM cool_animals ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3
6,\N,\N
6 rows
Filtering Null Rows
You can filter null rows by adding an IS NOT NULL filter, as shown in the following example:
test=> SELECT * FROM cool_animals WHERE weight IS NOT NULL ORDER BY weight DESC;
4,Elephant ,6500
5,Rhinoceros ,2100
1,Dog ,7
3,Cat ,5
2,Possum ,3
5 rows
For more information, see the following:
Outputting the number of rows in a table without getting the full result set - COUNT(*).
Filtering results - WHERE
Sorting results - ORDER BY
Filtering rows - IS NOT NULL
Deleting Rows
The Deleting Rows section describes the following:
Deleting Selected Rows
You can delete rows in a table selectively using the DELETE command. You must include a table name and a WHERE clause to specify the rows to delete, as shown in the following example:
test=> DELETE FROM cool_animals WHERE weight is null;
executed
test=> SELECT * FROM cool_animals;
1,Dog ,7
2,Possum ,3
3,Cat ,5
4,Elephant ,6500
5,Rhinoceros ,2100
5 rows
Deleting All Rows
You can delete all rows in a table using the TRUNCATE command followed by the table name, as shown in the following example:
test=> TRUNCATE TABLE cool_animals;
executed
Note
While TRUNCATE deletes data from disk immediately, DELETE does not physically remove the deleted rows.
Saving Query Results to a CSV or PSV File
You can save query results to a CSV or PSV file using the sqream sql command from a CLI client. This saves your query results to the selected delimited file format, as shown in the following example:
$ sqream sql --username=mjordan --database=nba --host=localhost --port=5000 -c "SELECT * FROM nba LIMIT 5" --results-only --delimiter='|' > nba.psv
$ cat nba.psv
Avery Bradley |Boston Celtics |0|PG|25|6-2 |180|Texas |7730337
Jae Crowder |Boston Celtics |99|SF|25|6-6 |235|Marquette |6796117
John Holland |Boston Celtics |30|SG|27|6-5 |205|Boston University |\N
R.J. Hunter |Boston Celtics |28|SG|22|6-5 |185|Georgia State |1148640
Jonas Jerebko |Boston Celtics |8|PF|29|6-10|231|\N|5000000
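To produce a comma-separated file instead, change the delimiter and the output file name. This sketch reuses the same hypothetical nba table and connection settings as above:
$ sqream sql --username=mjordan --database=nba --host=localhost --port=5000 -c "SELECT * FROM nba LIMIT 5" --results-only --delimiter=',' > nba.csv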
For more output options, see Controlling the Client Output.
What’s next?
Explore all of SQream DB’s SQL Syntax.
See the full SQream SQL CLI reference.
Connect a third party tool to start analyzing data.
Hardware Guide
The Hardware Guide describes the SQreamDB reference architecture and its benefits, and provides guidance for end-users on selecting the right configuration for a SQreamDB installation.
Need help?
This page is intended as a reference for suggested hardware. However, different workloads require different solution sizes, and SQreamDB’s customer support team has the experience to advise on these matters to ensure the best experience.
Visit SQreamDB’s support portal for additional support.
Cluster Architectures
SQreamDB recommends rackmount servers by server manufacturers Dell, Lenovo, HP, Cisco, Supermicro, IBM, and others.
A typical SQreamDB cluster includes one or more nodes, consisting of:
Two-socket enterprise processors, such as Intel® Xeon® Gold processors or the IBM® POWER9 processors, providing the high performance required for compute-bound database workloads.
NVIDIA Tesla GPU accelerators, with up to 5,120 CUDA and Tensor cores, running on PCIe or fast NVLINK busses, delivering high core count, and high-throughput performance on massive datasets.
High density chassis design, offering between 2 and 4 GPUs in a 1U, 2U, or 3U package, for best-in-class performance per cm².
Single-Node Cluster
A single-node SQreamDB cluster can handle between 1 and 8 concurrent users, with up to 1PB of data storage (when connected via NAS).
An average single-node cluster can be a rackmount server or workstation, containing the following components:
| Component | Type |
|---|---|
| Server | Dell R750, Dell R940xa, HP ProLiant DL380 Gen10 or similar (Intel only) |
| Processors | 2x Intel Xeon Gold 6348 (28C/56HT) 3.5GHz or similar |
| RAM | 1.5 TB |
| Onboard storage | |
| GPU | NVIDIA 2x A100, H100, or L40S |
| Operating System | Red Hat Enterprise Linux v8.8 or Amazon Linux |
Note
If you are using internal storage, your volumes must be formatted as xfs.
In this system configuration, SQreamDB can store about 100TB of raw data (assuming an average compression ratio and ~30TB of usable raw storage).
If a NAS is used, the 10x SSD drives can be omitted, but SQreamDB recommends 2TB of local spool space on SSD or NVMe drives.
Multi-Node Cluster
Multi-node clusters can handle any number of concurrent users. A typical SQreamDB cluster relies on a minimum of two GPU-enabled servers and shared storage connected over a network fabric, such as InfiniBand EDR, 40GbE, or 100GbE.
The Multi-Node Cluster Examples section describes the following specifications:
The following table shows SQreamDB’s recommended hardware specifications:
| Component | Type |
|---|---|
| Server | Dell R750, Dell R940xa, HP ProLiant DL380 Gen10 or similar (Intel only) |
| Processors | 2x Intel Xeon Gold 6348 (28C/56HT) 3.5GHz or similar |
| RAM | 2 TB |
| Onboard storage | |
| Network Card (Storage) | 2x Mellanox ConnectX-6 Single Port HDR VPI InfiniBand Adapter cards at 100GbE or similar |
| Network Card (Client) | 2x 1 GbE cards or similar |
| External Storage | |
| GPU | NVIDIA 2x A100, H100, or L40S |
| Operating System | Red Hat Enterprise Linux v8.8 or Amazon Linux |
Metadata Server
The following table shows SQreamDB’s recommended metadata server specifications:
| Component | Type |
|---|---|
| Server | Dell R750, Dell R940xa, HP ProLiant DL380 Gen10 or similar (Intel only) |
| Processors | 2x Intel Xeon Gold 6342 2.8 GHz 24C processors or similar |
| RAM | 512GB DDR4 RAM (8x 64GB RDIMM) or similar |
| Onboard storage | 2x 960 GB NVMe SSD drives in RAID 1 or similar |
| Network Card (Storage) | 2x Mellanox ConnectX-6 Single Port HDR VPI InfiniBand Adapter cards at 100GbE or similar |
| Network Card (Client) | 2x 1 GbE cards or similar |
| Operating System | Red Hat Enterprise Linux v8.8 or Amazon Linux |
Note
With a NAS connected over GPFS, Lustre, Weka, or VAST, each SQreamDB worker can read data at 5GB/s or more.
SQreamDB Studio Server
The following table shows SQreamDB’s recommended Studio server specifications:
| Component | Type |
|---|---|
| Server | Physical or virtual machine |
| Processor | 1x Intel Core i7 |
| RAM | 16 GB |
| Onboard storage | 50 GB SSD 2.5in hot-plug for OS, RAID 1 |
| Operating System | Red Hat Enterprise Linux v7.9 or CentOS v7.9 |
Cluster Design Considerations
This section describes the following cluster design considerations:
In a SQreamDB installation, the storage and computing are logically separated. While they may reside on the same machine in a standalone installation, they may also reside on different hosts, providing additional flexibility and scalability.
SQreamDB uses all resources in a machine, including CPU, RAM, and GPU to deliver the best performance. At least 256GB of RAM per physical GPU is recommended.
Local disk space is required for good temporary spooling performance, particularly when performing intensive operations exceeding the available RAM, such as sorting. SQreamDB recommends an SSD or NVMe drive in RAID0 configuration with about twice the RAM size available for temporary storage. This can be shared with the operating system drive if necessary.
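For example, under these guidelines a two-GPU server with 512GB of RAM (256GB per GPU) would be provisioned with roughly 1TB of SSD or NVMe spool space.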
When using NAS devices, SQreamDB recommends approximately 5GB/s of burst throughput from storage per GPU.
Balancing Cost and Performance
Prior to designing and deploying a SQreamDB cluster, a number of important factors must be considered.
The Balancing Cost and Performance section provides a breakdown of deployment details to ensure that an installation meets or exceeds the stated requirements. The rationale provided includes the information needed to modify configurations to suit the customer use-case scenario, as shown in the following table:
| Component | Value |
|---|---|
| Compute - CPU | Balance price and performance |
| Compute - GPU | Balance price with performance and concurrency |
| Memory - GPU RAM | Balance price with concurrency and performance |
| Memory - RAM | Balance price and performance |
| Operating System | Availability, reliability, and familiarity |
| Storage | Balance price with capacity and performance |
| Network | Balance price and performance |
CPU Compute
SQreamDB relies on multi-core Intel Xeon Gold processors or IBM POWER9 processors and recommends a dual-socket machine populated with CPUs of 18C/36HT or better. While a higher core count may not necessarily affect query performance, more cores enable higher concurrency and better load performance.
GPU Compute and RAM
The NVIDIA Tesla range of high-throughput GPU accelerators provides the best performance for enterprise environments. Most cards have ECC memory, which is crucial for delivering correct results every time. SQreamDB recommends the NVIDIA Tesla A100 80GB GPU for the best performance and highest concurrent user support.
GPU RAM, sometimes called GRAM or VRAM, is used for processing queries. It is possible to select GPUs with less RAM. However, the smaller GPU RAM results in reduced concurrency, as the GPU RAM is used extensively in operations like JOINs, ORDER BY, GROUP BY, and all SQL transforms.
RAM
SQreamDB requires using Error-Correcting Code memory (ECC), standard on most enterprise servers. Large amounts of memory are required for improved performance for heavy external operations, such as sorting and joining.
SQreamDB recommends at least 256GB of RAM per GPU on your machine.
Operating System
SQreamDB can run on the following 64-bit Linux operating systems:
Red Hat Enterprise Linux (RHEL) v7.9
CentOS v7.9
Amazon Linux 2018.03
Storage
For clustered scale-out installations, SQreamDB relies on NAS storage. For stand-alone installations, SQreamDB relies on redundant disk configurations, such as RAID 5, 6, or 10. These RAID configurations replicate blocks of data between disks to avoid data loss or system unavailability.
SQreamDB recommends using enterprise-grade SAS SSD or NVMe drives. The number of GPUs should scale with the number of concurrent users: SQreamDB recommends 1 Tesla A100, H100, or L40S GPU per 2 users, for full, uninterrupted, dedicated access.
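For example, under this guideline a 32-user configuration would be provisioned with 16 GPUs.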
Installation Guides
Before you get started using SQream, consider your business needs and available resources. SQream was designed to run in a number of environments and to be installed using different methods; your requirements determine which installation method to use.
The Installation Guides section describes the following installation guide sets:
Installing and Launching SQreamDB
The Installing and Launching SQream page includes the following installation guides:
Pre-Installation Configuration
Before installing SQreamDB, it is essential that you tune your system for better performance and stability.
BIOS Settings
The first step when setting your pre-installation configurations is to configure the BIOS settings.
The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor provides a different set of settings and variables. It is safe to skip any or all of the configuration steps, although this may impact performance.
If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.
| Item | Setting | Rationale |
|---|---|---|
| Management console access | Connected | Connection to OOB is required to preserve continuous network uptime. |
| All drives | Connected and displayed on RAID interface | Prerequisite for cluster or OS installation. |
| RAID volumes | Configured according to project guidelines. Must be rebooted to take effect. | Clustered to increase logical volume and provide redundancy. |
| Fan speed / Thermal Configuration | Dell fan speed: High Maximum, specified minimum setting: 60. HPe thermal configuration: Increased cooling. | NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance. |
| Power regulator or iDRAC power unit policy | HPe: HP static high performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled. | Other power profiles (such as “balanced”) throttle the CPU and diminish performance. Throttling may also cause GPU failure. |
| System Profile, Power Profile, or Performance Profile | High Performance | The Performance profile potentially increases performance by maximizing processor frequency and disabling certain power-saving features such as C-states. Use this setting for environments that are not sensitive to power consumption. |
| Power Cap Policy or Dynamic power capping | Disabled | Other power profiles (like “balanced”) throttle the CPU and may diminish performance or cause GPU failure. This setting may appear together with the above (Power profile or Power regulator), and allows disabling system ROM power calibration during the boot process. Power regulator settings are named differently in BIOS and iLO/iDRAC. |
| Intel Turbo Boost | Enabled | Intel Turbo Boost enables overclocking the processor to boost the performance of CPU-bound operations. Overclocking may risk computational jitter due to changes in the processor’s turbo frequency, causing brief pauses in processor operation and introducing uncertainty into application processing time. Turbo operation is a function of power consumption, processor temperature, and the number of active cores. |
| Intel Virtualization Technology (VT-d) | Disabled | VT-d is optimal for running VMs. However, when running Linux natively, disabling VT-d boosts performance by up to 10%. |
| Logical Processor | HPe: Enable Hyperthreading. Dell: Enable Logical Processor | Hyperthreading doubles the number of logical processors, which may improve performance by ~5-10% for CPU-bound operations. |
| Processor C-States (Minimum processor idle power core state) | Disabled | Processor C-States reduce server power when the system is in an idle state. This causes slower cold-starts when the system transitions from an idle to a load state, and may reduce query performance by up to 15%. |
| HPe: Energy/Performance bias | Maximum performance | Configures processor sub-systems for high performance and low latency. Other power profiles (like “balanced”) throttle the CPU and may diminish performance. Use this setting for environments that are not sensitive to power consumption. |
| HPe: DIMM voltage | Optimized for Performance | Setting a higher voltage for DIMMs may increase performance. |
| Memory Operating Mode | Optimizer Mode, Disable Node Interleaving, Auto Memory Operating Voltage | Memory Operating Mode is tuned for performance in Optimizer mode. Other modes may improve reliability but reduce performance. Node Interleaving should be disabled: enabling it interleaves memory between memory nodes, which harms NUMA-aware applications such as SQream DB. |
| HPe: Memory power savings mode | Maximum performance | Configures several memory parameters to optimize the performance of memory sub-systems. The default setting is Balanced. |
| HPe ACPI SLIT | Enabled | ACPI SLIT sets the relative access times between processors and memory and I/O sub-systems, enabling operating systems to improve performance by more efficiently allocating resources and workloads. |
| QPI Snoop | Cluster on Die or Home Snoop | QPI (QuickPath Interconnect) Snoop lets you configure different Snoop modes that impact the QPI interconnect. Changing this setting may improve the performance of certain workloads. The default setting of Home Snoop provides high memory bandwidth in an average NUMA environment. Cluster on Die may provide increased memory bandwidth in highly optimized NUMA workloads. Early Snoop may decrease memory latency but may result in lower overall bandwidth compared to other modes. |
Installing the Operating System
Once the BIOS settings have been set, you must install the operating system. Either CentOS (versions 7.6-7.9) or RHEL (versions 7.6-7.9) must be installed before installing the SQream database, by either the customer or a SQream representative.
To install the operating system:
Select a language (English recommended).
From Software Selection, select Minimal.
Select the Development Tools group checkbox.
Continue the installation.
Set up the necessary drives and users as per the installation process.
Selecting the Debugging Tools group is recommended for future troubleshooting, if necessary.
Selecting the Development Tools group installs the following tools:
autoconf
automake
binutils
bison
flex
gcc
gcc-c++
gettext
libtool
make
patch
pkgconfig
redhat-rpm-config
rpm-build
rpm-sign
The root user is created and the OS shell is booted up.
Configuring the Operating System
Once you’ve installed your operating system, you can configure it. When configuring the operating system, several basic settings related to creating a new server are required. Configuring these as part of your basic set-up increases your server’s security and usability.
Logging In to the Server
You can log in to the server using the server’s IP address and password for the root user. The server’s IP address and root user were created while installing the operating system above.
Automatically Creating a SQream User
To automatically create a SQream user:
If a SQream user was created during installation, verify that the same ID is used on every server:
$ sudo id sqream
The ID 1000 is used on each server in the following example:
$ uid=1000(sqream) gid=1000(sqream) groups=1000(sqream)
If the IDs are different, delete the SQream user and SQream group from both servers, and remove the user’s mail spool:
$ sudo userdel sqream
$ sudo rm /var/spool/mail/sqream
Then recreate the user with the same ID on all servers, as described in Manually Creating a SQream User below.
Manually Creating a SQream User
SQream enables you to manually create users. This section shows you how to manually create a user with the UID 1111. You cannot manually create a user during the operating system installation procedure.
To manually create a SQream user:
Add a user with an identical UID on all cluster nodes:
$ useradd -u 1111 sqream
Add the user sqream to the wheel group:
$ sudo usermod -aG wheel sqream
You can remove the sqream user from the wheel group when the installation and configuration are complete.
Set a password for the sqream user:
$ passwd sqream
Log out and log back in as sqream.
Note
If you deleted the sqream user and recreated it with a different ID, you must change the ownership of the /home/sqream directory to avoid permission errors.
Change the ownership of /home/sqream to the sqream user:
$ sudo chown -R sqream:sqream /home/sqream
Setting Up A Locale
SQream enables you to set up a locale. The following example uses the en_US.UTF-8 language setting and the Asia/Jerusalem time zone; replace these with your own location as needed.
To set up a locale:
Set the language of the locale:
$ sudo localectl set-locale LANG=en_US.UTF-8
Set the time stamp (time and date) of the locale:
$ sudo timedatectl set-timezone Asia/Jerusalem
If needed, you can run the timedatectl list-timezones command to list the available time zones.
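For example, to narrow the listing to one region (the grep filter here is just an illustration):
$ timedatectl list-timezones | grep Asia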
Installing the Required Packages
You can install the required packages by running the following command:
$ sudo yum install ntp pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc net-tools wget jq
Installing the Recommended Tools
You can install the recommended tools by running the following command:
$ sudo yum install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils deltarpm dos2unix
Installing Python 3.6.7
Download the Python 3.6.7 source code tarball file from the following URL into the /home/sqream directory:
$ wget https://www.python.org/ftp/python/3.6.7/Python-3.6.7.tar.xz
Extract the Python 3.6.7 source code into your current directory:
$ tar -xf Python-3.6.7.tar.xz
Navigate to the Python 3.6.7 directory:
$ cd Python-3.6.7
Run the ./configure script:
$ ./configure
Build the software:
$ make -j30
Install the software:
$ sudo make install
Verify that Python 3.6.7 has been installed:
$ python3
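Running python3 opens the interactive interpreter, which prints the version banner. For a non-interactive check, you can instead run the following (assuming python3 is on your PATH):
$ python3 --version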
Installing NodeJS on CentOS
To install the node.js on CentOS:
Download the setup_12.x file as a root user logged in shell:
$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
Clear the YUM cache and update the local metadata:
$ sudo yum clean all && sudo yum makecache fast
Install the node.js file:
$ sudo yum install -y nodejs
Install the pm2 process manager globally via npm, making it available for all users:
$ sudo npm install pm2 -g
Installing NodeJS on Ubuntu
To install the node.js file on Ubuntu:
Download the setup_12.x file as a root user logged in shell:
$ curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
Install the node.js file:
$ sudo apt-get install -y nodejs
Install the pm2 process manager globally via npm, making it available for all users:
$ sudo npm install pm2 -g
Installing NodeJS Offline
To install NodeJS Offline
Download the NodeJS source code tarball file from the following URL into the /home/sqream directory:
$ wget https://nodejs.org/dist/v12.13.0/node-v12.13.0-linux-x64.tar.xz
Extract the tarball and move the node-v12.13.0-linux-x64 directory to the /usr/local directory:
$ tar -xf node-v12.13.0-linux-x64.tar.xz
$ sudo mv node-v12.13.0-linux-x64 /usr/local
Navigate to the /usr/bin/ directory:
$ cd /usr/bin
Create a symbolic link to the /local/node-v12.13.0-linux-x64/bin/node node directory:
$ sudo ln -s ../local/node-v12.13.0-linux-x64/bin/node node
Create a symbolic link to the /local/node-v12.13.0-linux-x64/bin/npm npm directory:
$ sudo ln -s ../local/node-v12.13.0-linux-x64/bin/npm npm
Create a symbolic link to the /local/node-v12.13.0-linux-x64/bin/npx npx directory:
$ sudo ln -s ../local/node-v12.13.0-linux-x64/bin/npx npx
Verify that the node version is correct:
$ node --version
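For the package used in this guide, the expected output is:
$ v12.13.0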
Installing the pm2 Service Offline
To install the pm2 Service Offline
On a machine with internet access, install the following:
nodejs
npm
pm2
Extract the pm2 module to the correct directory:
$ cd /usr/local/node-v12.13.0-linux-x64/lib/node_modules
$ tar -czvf pm2_x86.tar.gz pm2
Copy the pm2_x86.tar.gz file to a server without access to the internet and extract it.
Move the pm2 folder to the /usr/local/node-v12.13.0-linux-x64/lib/node_modules directory:
$ sudo mv pm2 /usr/local/node-v12.13.0-linux-x64/lib/node_modules
Navigate back to the /usr/bin directory:
$ cd /usr/bin
Create a symbolic link to the pm2 service:
$ sudo ln -s /usr/local/node-v12.13.0-linux-x64/lib/node_modules/pm2/bin/pm2 pm2
Verify that installation was successful:
$ pm2 list
Note
This must be done as a sqream user, and not as a sudo user.
Verify that the node version is correct:
$ node -v
Configuring the Network Time Protocol
This section describes how to configure your Network Time Protocol (NTP).
If you don’t have internet access, see Configure NTP Client to Synchronize with NTP Server.
To configure your NTP:
Install the NTP file.
$ sudo yum install ntp
Enable the ntpd program.
$ sudo systemctl enable ntpd
Start the ntpd program.
$ sudo systemctl start ntpd
Print a list of peers known to the server and a summary of their states.
$ sudo ntpq -p
Configuring the Network Time Protocol Server
If your organization has an NTP server, you can configure it.
To configure your NTP server:
Append your NTP server address to the /etc/ntp.conf file:
$ echo -e "\nserver <your NTP server address>\n" | sudo tee -a /etc/ntp.conf
Restart the service.
$ sudo systemctl restart ntpd
Check that synchronization is enabled:
$ sudo timedatectl
Checking that synchronization is enabled generates the following output:
$ Local time: Sat 2019-10-12 17:26:13 EDT
$ Universal time: Sat 2019-10-12 21:26:13 UTC
$ RTC time: Sat 2019-10-12 21:26:13
$ Time zone: America/New_York (EDT, -0400)
$ NTP enabled: yes
$ NTP synchronized: yes
$ RTC in local TZ: no
$ DST active: yes
$ Last DST change: DST began at Sun 2019-03-10 01:59:59 EST Sun 2019-03-10 03:00:00 EDT
$ Next DST change: DST ends (the clock jumps one hour backwards) at Sun 2019-11-03 01:59:59 EDT Sun 2019-11-03 01:00:00 EST
Configuring the Server to Boot Without the UI
You can configure your server to boot without a UI in cases when it is not required (recommended) by running the following command:
$ sudo systemctl set-default multi-user.target
Running this command activates the NO-UI server mode.
Configuring the Security Limits
The security limits refer to the number of open files, processes, and so on.
You can configure the security limits by running the echo -e command as a root user logged in shell:
$ sudo bash
$ echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
Configuring the Kernel Parameters
To configure the kernel parameters:
Insert a new line after each kernel parameter:
$ echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.vfs_cache_pressure = 200 \n vm.zone_reclaim_mode = 0 \n" >> /etc/sysctl.conf
Note
In the past, the vm.zone_reclaim_mode parameter was set to 7. In the latest Sqream version, the vm.zone_reclaim_mode parameter must be set to 0. If it is not set to 0, when a numa node runs out of memory, the system will get stuck and will be unable to pull memory from other numa nodes.
Check the maximum value of fs.file-max:
$ sysctl -n fs.file-max
If the value of fs.file-max is smaller than 2097152, run the following command:
$ echo "fs.file-max=2097152" >> /etc/sysctl.conf
IPv4 forwarding must be enabled for Docker and K8s installations only.
Run the following command:
$ sudo echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
Reboot your system:
$ sudo reboot
Configuring the Firewall
The example in this section shows the open ports for four sqreamd sessions. If more than four are required, open the required ports as needed. Port 8080 in the example below is a new UI port.
To configure the firewall:
Start the firewalld service:
$ systemctl start firewalld
Add the following ports to the permanent firewall:
$ firewall-cmd --zone=public --permanent --add-port=8080/tcp
$ firewall-cmd --zone=public --permanent --add-port=3105/tcp
$ firewall-cmd --zone=public --permanent --add-port=3108/tcp
$ firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
$ firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
$ firewall-cmd --permanent --list-all
Reload the firewall:
$ firewall-cmd --reload
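To confirm which ports are open after reloading, you can also run the following optional check:
$ firewall-cmd --zone=public --list-ports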
Enable firewalld on boot:
$ systemctl enable firewalld
If you do not need the firewall, you can disable it:
$ sudo systemctl disable firewalld
Disabling selinux
To disable selinux:
Show the status of selinux:
$ sudo sestatus
If the status is not disabled, edit the /etc/selinux/config file:
$ sudo vim /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled.
The above changes will only take effect after rebooting the server.
You can disable selinux immediately after rebooting the server by running the following command:
$ sudo setenforce 0
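You can confirm the change with getenforce, which should report Permissive after running setenforce 0, and Disabled after rebooting with the edited config:
$ sudo getenforce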
Configuring the /etc/hosts File
To configure the /etc/hosts file:
Edit the /etc/hosts file:
$ sudo vim /etc/hosts
Add the localhost entry and the name and IP address of each server:
$ 127.0.0.1 localhost
$ <server1 ip> <server_name>
$ <server2 ip> <server_name>
Configuring the DNS
To configure the DNS:
Run the ifconfig command to check your NIC name. In the following example, eth0 is the NIC name. Edit the NIC configuration file:
$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0
Replace the DNS lines from the example above with your own DNS addresses:
$ DNS1="4.4.4.4"
$ DNS2="8.8.8.8"
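For the new DNS settings to take effect, the network service typically needs to be restarted. On RHEL/CentOS 7 this would be the following (an assumption, depending on your network manager):
$ sudo systemctl restart network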
Installing the Nvidia CUDA Driver
After configuring your operating system, you must install the Nvidia CUDA driver.
Warning
If your UI runs on the server, the server must be stopped before installing the CUDA drivers.
CUDA Driver Prerequisites
Verify that the NVIDIA card has been installed and is detected by the system:
$ lspci | grep -i nvidia
Check which version of gcc has been installed:
$ gcc --version
If gcc has not been installed, install it for one of the following operating systems:
On RHEL/CentOS:
$ sudo yum install -y gcc
On Ubuntu:
$ sudo apt-get install gcc
Updating the Kernel Headers
To update the kernel headers:
Update the kernel headers on one of the following operating systems:
On RHEL/CentOS:
$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
On Ubuntu:
$ sudo apt-get install linux-headers-$(uname -r)
Install wget on one of the following operating systems:
On RHEL/CentOS:
$ sudo yum install wget
On Ubuntu:
$ sudo apt-get install wget
Disabling Nouveau
You can disable Nouveau, which is the default driver.
To disable Nouveau:
Check if the Nouveau driver has been loaded:
$ lsmod | grep nouveau
If the Nouveau driver has been loaded, the command above generates output.
Blacklist the Nouveau drivers to disable them:
$ cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
Regenerate the kernel initramfs:
$ sudo dracut --force
Reboot the server:
$ sudo reboot
Installing the CUDA Driver
This section describes how to install the CUDA driver.
Note
The version of the driver installed on the customer’s server must be equal to or higher than the driver included in the SQream release package. Contact a SQream customer service representative to identify the correct version to install.
The Installing the CUDA Driver section describes the following:
Installing the CUDA Driver from the Repository
Installing the CUDA driver from the Repository is the recommended installation method.
Warning
For A100 GPU and other A series GPUs, you must install the cuda 11.4.3 driver. The version of the driver installed on the customer server must be equal to or higher than the one used to build the SQream package. For questions related to which driver to install, contact SQream Customer Support.
To install the CUDA driver from the Repository:
Install the CUDA dependencies for one of the following operating systems:
For RHEL:
$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
For CentOS:
$ sudo yum install epel-release
Install the CUDA dependencies from the epel repository:
$ sudo yum install dkms libvdpau
Installing the CUDA dependencies from the epel repository is only required when installing via runfile.
Download and install the required local repository:
Intel - CUDA 10.1 for RHEL7:
$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm
$ sudo yum localinstall cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.x86_64.rpm
Intel - 11.4.3 repository:
$ wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda-repo-rhel7-11-4-local-11.4.3_470.82.01-1.x86_64.rpm
$ sudo yum localinstall cuda-repo-rhel7-11-4-local-11.4.3_470.82.01-1.x86_64.rpm
IBM Power9 - CUDA 10.1 for RHEL7:
$ wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.ppc64le.rpm
$ sudo yum localinstall cuda-repo-rhel7-10-1-local-10.1.243-418.87.00-1.0-1.ppc64le.rpm
Warning
For Power9 with V100 GPUs, you must install the CUDA 10.1 driver.
Install the CUDA drivers:
Clear the YUM cache:
$ sudo yum clean all
Install the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver:
$ sudo yum -y install nvidia-driver-latest-dkms
Verify that the installation was successful:
$ nvidia-smi
Note
If you do not have access to the internet, you can set up a local repository offline.
You can prepare the CUDA driver offline from a server connected to the CUDA repository by running the following commands as a root user:
Query all the packages installed in your system, and verify that cuda-repo has been installed:
$ rpm -qa |grep cuda-repo
Navigate to the correct repository:
$ cd /etc/yum.repos.d/
List in long format and print lines matching a pattern for the cuda file:
$ ls -l |grep cuda
The following is an example of the correct output:
$ cuda-10-1-local.repo
Edit the /etc/yum.repos.d/cuda-10-1-local.repo file:
$ vim /etc/yum.repos.d/cuda-10-1-local.repo
The following is an example of the correct output:
$ name=cuda-10-1-local
Clone the repository to a location where it can be copied from:
$ reposync -g -l -m --repoid=cuda-10-1-local --download_path=/var/cuda-repo-10.1-local
Copy the repository to the installation server and create the repository:
$ createrepo -g comps.xml /var/cuda-repo-10.1-local
Add a repo configuration file in /etc/yum.repos.d/ by editing the /etc/yum.repos.d/cuda-10.1-local.repo repository:
$ [cuda-10.1-local]
$ name=cuda-10.1-local
$ baseurl=file:///var/cuda-repo-10.1-local
$ enabled=1
$ gpgcheck=1
$ gpgkey=file:///var/cuda-repo-10-1-local/7fa2af80.pub
Install the CUDA drivers by installing the most current DKMS (Dynamic Kernel Module Support) NVIDIA driver as a root user logged in shell:
$ sudo yum -y install nvidia-driver-latest-dkms
Tuning Up NVIDIA Performance
This section describes how to tune up NVIDIA performance.
Note
The procedures in this section are relevant to Intel only.
To Tune Up NVIDIA Performance when Driver Installed from the Repository
To tune up NVIDIA performance when the driver was installed from the repository:
Check the service status:
$ sudo systemctl status nvidia-persistenced
If the service exists, it will be stopped by default.
Start the service:
$ sudo systemctl start nvidia-persistenced
Verify that no errors have occurred:
$ sudo systemctl status nvidia-persistenced
Enable the service to start up on boot:
$ sudo systemctl enable nvidia-persistenced
For V100/A100, add the following lines:
$ nvidia-persistenced
Note
The following are mandatory for IBM:
$ sudo systemctl start nvidia-persistenced
$ sudo systemctl enable nvidia-persistenced
Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):
$ nvidia-smi
Note
Setting up the NVIDIA POWER9 CUDA driver includes additional set-up requirements. The NVIDIA POWER9 CUDA driver will not function properly if the additional set-up requirements are not followed. See POWER9 Setup for the additional set-up requirements.
To Tune Up NVIDIA Performance when Driver Installed from the Runfile
To tune up NVIDIA performance when the driver was installed from the runfile:
Change the permissions on the rc.local file to executable:
$ sudo chmod +x /etc/rc.local
Edit the /etc/rc.local file:
$ sudo vim /etc/rc.local
Add the following lines:
For V100/A100:
$ nvidia-persistenced
For IBM (mandatory):
$ sudo systemctl start nvidia-persistenced
$ sudo systemctl enable nvidia-persistenced
For K80:
$ nvidia-persistenced
$ nvidia-smi -pm 1
$ nvidia-smi -acp 0
$ nvidia-smi --auto-boost-permission=0
$ nvidia-smi --auto-boost-default=0
Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):
$ nvidia-smi
Note
Setting up the NVIDIA POWER9 CUDA driver includes additional set-up requirements. The NVIDIA POWER9 CUDA driver will not function properly if the additional set-up requirements are not followed. See POWER9 Setup for the additional set-up requirements.
Disabling Automatic Bug Reporting Tools
To disable automatic bug reporting tools:
Run the following command to disable and stop the abrt services:
$ for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done
Run the following checks to verify that the server is ready for the SQream software installation:
Check the OS release:
$ cat /etc/os-release
Verify that a SQream user exists and has the same ID on all cluster member services:
$ id sqream
Verify that the storage is mounted:
$ mount
Verify that the driver has been installed correctly:
$ nvidia-smi
Check the maximum value of the fs.file:
$ sysctl -n fs.file-max
Run the following command as a SQream user:
$ ulimit -c -u -n
The following shows the desired output:
$ core file size (blocks, -c) unlimited
$ max user processes (-u) 1000000
$ open files (-n) 1000000
Enabling Core Dumps
After installing the Nvidia CUDA driver, you can enable your core dumps. While SQream recommends enabling your core dumps, it is optional.
The Enabling Core Dumps section describes the following:
Checking the abrtd Status
To check the abrtd status:
Check if abrtd is running:
$ sudo ps -ef |grep abrt
If abrtd is running, stop it:
$ sudo service abrtd stop
$ sudo chkconfig abrt-ccpp off
$ sudo chkconfig abrt-oops off
$ sudo chkconfig abrt-vmcore off
$ sudo chkconfig abrt-xorg off
$ sudo chkconfig abrtd off
Setting the Limits
To set the limits:
Check the current core file size limit:
$ ulimit -c
If the output is 0, add the following lines to the limits.conf file (/etc/security):
$ * soft core unlimited
$ * hard core unlimited
Log out and log in to apply the limit changes.
Creating the Core Dumps Directory
To create the core dumps directory:
Make the /tmp/core_dumps directory:
$ mkdir /tmp/core_dumps
Set the ownership of the /tmp/core_dumps directory:
$ sudo chown sqream.sqream /tmp/core_dumps
Grant read, write, and execute permissions to all users:
$ sudo chmod -R 777 /tmp/core_dumps
Warning
Because the core dump file may be the size of the total RAM on the server, verify that you have sufficient disk space. In the example above, the core dump is configured to the /tmp/core_dumps directory. You must replace the path according to your own environment and disk space.
Setting the Output Directory of the /etc/sysctl.conf File
To set the output directory of the /etc/sysctl.conf file:
Edit the /etc/sysctl.conf file:
$ sudo vim /etc/sysctl.conf
Add the following to the bottom of the file:
$ kernel.core_uses_pid = 1
$ kernel.core_pattern = /<tmp/core_dumps>/core-%e-%s-%u-%g-%p-%t
$ fs.suid_dumpable = 2
To apply the changes without rebooting the server, run the following:
$ sudo sysctl -p
Check that the core output directory points to the following:
$ sudo cat /proc/sys/kernel/core_pattern
The following shows the correct generated output:
$ /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
Verifying that the Core Dumps Work
You can verify that the core dumps work only after installing and running SQream. Doing this causes the server to crash and a new core.xxx file to be written to the directory specified in /etc/sysctl.conf.
To verify that the core dumps work:
Stop and restart all SQream services.
Connect to SQream with ClientCmd and run the following command:
$ select abort_server();
Troubleshooting Core Dumping
This section describes the troubleshooting procedure to be followed if all parameters have been configured correctly, but the cores have not been created.
To troubleshoot core dumping:
Reboot the server.
Verify that you have folder permissions:
$ sudo chmod -R 777 /tmp/core_dumps
Verify that the limits have been set correctly:
$ ulimit -c
If all parameters have been configured correctly, the correct output is:
$ core file size (blocks, -c) unlimited
$ open files (-n) 1000000
If all parameters have been configured correctly, but running ulimit -c outputs 0, run the following:
$ sudo vim /etc/profile
Search for the following line and comment it out with a hash symbol (#):
$ ulimit -S -c 0 > /dev/null 2>&1
Log out and log in.
Run the ulimit -c command:
$ ulimit -c
If the line is not found in the /etc/profile file, do the following:
Run the following command:
$ sudo vim /etc/init.d/functions
Search for the following:
$ ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1
If the line is found, comment it out with the hash symbol and reboot the server.
Installing SQream Using Binary Packages
This procedure describes how to install SQream using Binary packages and must be done on all servers.
To install SQream using Binary packages:
Copy the SQream package to the /home/sqream directory and extract it:
$ tar -xf sqream-db-v<2020.2>.tar.gz
Append the version number to the name of the SQream folder. The version number in the following example is v2020.2:
$ mv sqream sqream-db-v<2020.2>
Move the new version of the SQream folder to the /usr/local/ directory:
$ sudo mv sqream-db-v<2020.2> /usr/local/
Change the ownership of the folder to the sqream user:
$ sudo chown -R sqream:sqream /usr/local/sqream-db-v<2020.2>
Navigate to the /usr/local/ directory and create a symbolic link to SQream:
$ cd /usr/local
$ sudo ln -s sqream-db-v<2020.2> sqream
Verify that the symbolic link that you created points to the folder that you created:
$ ls -l
The following shows the expected output:
$ sqream -> sqream-db-v<2020.2>
Create the SQream configuration file destination folders and set their ownership to sqream:
$ sudo mkdir /etc/sqream
$ sudo chown -R sqream:sqream /etc/sqream
Create the SQream service log destination folders and set their ownership to sqream:
$ sudo mkdir /var/log/sqream
$ sudo chown -R sqream:sqream /var/log/sqream
Navigate to the /usr/local/sqream/etc/ directory and copy the SQream configuration files from it:
$ cd /usr/local/sqream/etc/
$ cp * /etc/sqream
The .conf files are service configuration files and the .json files are SQream configuration files, for a total of four files. The number of service configuration files and JSON files must be identical.
Note
Verify that the JSON files have been configured correctly and that all required flags have been set to the correct values.
In each JSON file, the following parameters must be updated:
instanceId
machineIP
metadataServerIp
spoolMemoryGB
limitQueryMemoryGB
gpu
port
ssl_port
See how to configure the Spool Memory and Limit Query Memory.
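For illustration only, the following is a minimal sketch of what one such JSON configuration file might look like. Every value below is a placeholder rather than a recommendation, and your files may contain additional flags; since JSON does not allow comments, all hedging is stated here:
{
    "instanceId": "sqream_1",
    "machineIP": "192.168.0.101",
    "metadataServerIp": "192.168.0.101",
    "spoolMemoryGB": 220,
    "limitQueryMemoryGB": 240,
    "gpu": 0,
    "port": 5000,
    "ssl_port": 5100
}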
Note the following:
The value of the metadataServerIp parameter must point to the IP that the metadata is running on.
The value of the machineIP parameter must point to the IP of your local machine.
This value is the same as metadataServerIp on the server running the metadataserver service, and different on the other server nodes.
Optional - To run additional SQream services, copy the required configuration files and create additional JSON files:
$ cp sqream2_config.json sqream3_config.json
$ vim sqream3_config.json
Note
A unique instanceId must be used in each JSON file. In the example above, the instanceId sqream_2 is changed to sqream_3.
Optional - If you created additional services in Step 11, verify that you have also created their additional configuration files:
$ cp sqream2-service.conf sqream3-service.conf
$ vim sqream3-service.conf
For each SQream service configuration file, do the following:
Change the SERVICE_NAME=sqream2 value to SERVICE_NAME=sqream3.
Change LOGFILE=/var/log/sqream/sqream2.log to LOGFILE=/var/log/sqream/sqream3.log.
Note
If you are running SQream on more than one server, you must configure the serverpicker and metadataserver services to start on only one of the servers. If metadataserver is running on the first server, the metadataServerIp value in the second server’s /etc/sqream/sqream1_config.json file must point to the IP of the server on which the metadataserver service is running.
Set up serverpicker:
Do the following:
$ vim /etc/sqream/server_picker.conf
Change the IP 127.0.0.1 to the IP of the server that the metadataserver service is running on.
Change the CLUSTER to the value of the cluster path.
Set up your service files:
$ cd /usr/local/sqream/service/
$ cp sqream2.service sqream3.service
$ vim sqream3.service
Increment each EnvironmentFile=/etc/sqream/sqream2-service.conf configuration file for each SQream service file, as shown below:
$ EnvironmentFile=/etc/sqream/sqream<3>-service.conf
Copy and register your service files into systemd:
$ sudo cp metadataserver.service /usr/lib/systemd/system/
$ sudo cp serverpicker.service /usr/lib/systemd/system/
$ sudo cp sqream*.service /usr/lib/systemd/system/
Verify that your service files have been copied into systemd:
$ ls -l /usr/lib/systemd/system/sqream*
$ ls -l /usr/lib/systemd/system/metadataserver.service
$ ls -l /usr/lib/systemd/system/serverpicker.service
$ sudo systemctl daemon-reload
Copy the license into the /etc/sqream directory:
$ cp license.enc /etc/sqream/
If you have an HDFS environment, see Configuring an HDFS Environment for the User sqream.
Installing SQream with Kubernetes
Kubernetes, also known as k8s, is a portable open source platform that automates Linux container operations. Kubernetes supports outsourcing data centers to public cloud service providers or can be scaled for web hosting. SQream uses Kubernetes as an orchestration and recovery solution.
The Installing SQream with Kubernetes guide describes the following:
Preparing the SQream Environment to Launch SQream Using Kubernetes
The Preparing the SQream environment to Launch SQream Using Kubernetes section describes the following:
Overview
A minimum of three servers is required for preparing the SQream environment using Kubernetes.
Kubernetes uses clusters, which are sets of nodes running containerized applications. A cluster consists of at least two GPU nodes and one additional server without a GPU to act as the quorum manager.
Each server must have the following IP addresses:
An IP address located in the management network.
An additional IP address from the same subnet to function as a floating IP.
All servers must be mounted in the same shared storage folder.
The following list shows the server host name format requirements:
A maximum of 253 characters.
Only lowercase alphanumeric characters, plus the - and . characters.
Starts and ends with alphanumeric characters.
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
Operating System Requirements
The required operating system is a version of x86 CentOS/RHEL between 7.6 and 7.9. For PPC64le, the required version is RHEL 7.6.
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
Compute Server Specifications
Installing SQream with Kubernetes includes the following compute server specifications:
CPU: 4 cores
RAM: 16GB
HD: 500GB
Go back to Preparing the SQream Environment to Launch SQream Using Kubernetes
Setting Up Your Hosts
SQream requires you to set up your hosts. Setting up your hosts requires the following:
Configuring the Hosts File
To configure the /etc/hosts file:
Edit the /etc/hosts file:
$ sudo vim /etc/hosts
Call your local host:
$ 127.0.0.1 localhost
$ <server ip> <server_name>
Installing the Required Packages
The first step in setting up your hosts is to install the required packages.
To install the required packages:
Run the following command based on your operating system:
RHEL:
$ sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
CentOS:
$ sudo yum install epel-release
$ sudo yum install pciutils openssl-devel python36 python36-pip kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc jq net-tools ntp
Verify that the required packages were successfully installed by running the following commands:
$ ntpq --version
$ jq --version
$ python3 --version
$ pip3 --version
$ rpm -qa |grep kernel-devel-$(uname -r)
$ rpm -qa |grep kernel-headers-$(uname -r)
$ gcc --version
Enable the ntpd (Network Time Protocol daemon) program on all servers:
$ sudo systemctl start ntpd
$ sudo systemctl enable ntpd
$ sudo systemctl status ntpd
$ sudo ntpq -p
Go back to Setting Up Your Hosts
Disabling the Linux UI
After installing the required packages, you must disable the Linux UI if it has been installed.
You can disable the Linux UI by running the following command:
$ sudo systemctl set-default multi-user.target
Go back to Setting Up Your Hosts
Disabling SELinux
After disabling the Linux UI you must disable SELinux.
To disable SELinux:
Run the following command:
$ sed -i -e s/enforcing/disabled/g /etc/selinux/config
Reboot the system as a root user:
$ sudo reboot
Go back to Setting Up Your Hosts
Disabling Your Firewall
After disabling SELinux, you must disable your firewall by running the following commands:
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld
Go back to Setting Up Your Hosts
Checking the CUDA Version
After completing all of the steps above, you must check the CUDA version.
To check the CUDA version:
Check the CUDA version:
$ nvidia-smi
The following is an example of the correct output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:17:00.0 Off |                    0 |
| N/A   34C    P0    64W / 300W |  79927MiB / 80994MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   35C    P0    60W / 300W |  79927MiB / 80994MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
In the above output, the CUDA version is 10.1.
If the above output is not generated, CUDA has not been installed. To install CUDA, see installing-the-cuda-driver.
Go back to Setting Up Your Hosts
Installing Your Kubernetes Cluster
After setting up your hosts, you must install your Kubernetes cluster. The Kubernetes and SQream software must be installed from the management host, and can be installed on any server in the cluster.
Installing your Kubernetes cluster requires the following:
Generating and Sharing SSH Keypairs Across All Existing Nodes
You can generate and share SSH keypairs across all existing nodes. Sharing SSH keypairs across all nodes enables passwordless access from the management server to all nodes in the cluster. All nodes in the cluster require passwordless access.
Note
You must generate and share an SSH keypair across all nodes even if you are installing the Kubernetes cluster on a single host.
To generate and share an SSH keypair:
Switch to root user access:
$ sudo su -
Generate an RSA key pair:
$ ssh-keygen
The following is an example of the correct output:
$ ssh-keygen
$ Generating public/private rsa key pair.
$ Enter file in which to save the key (/root/.ssh/id_rsa):
$ Created directory '/root/.ssh'.
$ Enter passphrase (empty for no passphrase):
$ Enter same passphrase again:
$ Your identification has been saved in /root/.ssh/id_rsa.
$ Your public key has been saved in /root/.ssh/id_rsa.pub.
$ The key fingerprint is:
$ SHA256:xxxxxxxxxxxxxxdsdsdffggtt66gfgfg root@localhost.localdomain
$ The key's randomart image is:
$ +---[RSA 2048]----+
$ | =*. |
$ | .o |
$ | ..o o|
$ | . .oo +.|
$ | = S =...o o|
$ | B + *..o+.|
$ | o * *..o .+|
$ | o * oo.E.o|
$ | . ..+..B.+o|
$ +----[SHA256]-----+
The generated file is /root/.ssh/id_rsa.pub.
Copy the public key to all servers in the cluster, including the one that you are running on:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@remote-host
Replace remote-host with the host name or IP address of each server.
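If the cluster contains several nodes, you can copy the key to all of them in one pass. The sketch below assumes the hypothetical host names eks-rhl-1 through eks-rhl-3 used elsewhere in this guide:
$ for host in eks-rhl-1 eks-rhl-2 eks-rhl-3; do
$     ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
$ done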
Go back to Installing Your Kubernetes Cluster
Installing and Deploying a Kubernetes Cluster Using Kubespray
SQream uses the Kubespray software package to install and deploy Kubernetes clusters.
To install and deploy a Kubernetes cluster using Kubespray:
Clone Kubernetes:
Clone the kubespray.git repository:
$ git clone https://github.com/kubernetes-incubator/kubespray.git
Navigate to the kubespray directory:
$ cd kubespray
Install the Python packages listed in the requirements.txt file:
$ pip3 install -r requirements.txt
Create your SQream inventory directory:
Run the following command:
$ cp -rp inventory/sample inventory/sqream
Declare the cluster node IP address(es), replacing <host> and <cluster node IP address> with your own values:
$ declare -a IPS=(<host>,<cluster node IP address>)
For example, the following declares two pairs, host-93 with 192.168.0.93 and host-92 with 192.168.0.92:
$ declare -a IPS=(host-93,192.168.0.93 host-92,192.168.0.92)
Note the following:
Running a declare requires defining a pair (host name and cluster node IP address), as shown in the above example.
You can define more than one pair.
When the reboot is complete, switch back to the root user:
$ sudo su -
Navigate to root/kubespray:
$ cd /root/kubespray
Copy inventory/sample as inventory/sqream:
$ cp -rfp inventory/sample inventory/sqream
Update the Ansible inventory file with the inventory builder:
$ declare -a IPS=(<hostname1>,<IP1> <hostname2>,<IP2> <hostname3>,<IP3>)
In the kubespray hosts.yml file, set the node IPs:
$ CONFIG_FILE=inventory/sqream/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
If you do not set a specific hostname in declare, the server hostnames will change to node1, node2, etc. To maintain specific hostnames, run declare as in the following example:
$ declare -a IPS=(eks-rhl-1,192.168.5.81 eks-rhl-2,192.168.5.82 eks-rhl-3,192.168.5.83)
Note that the declare must contain pairs (hostname,ip).
Verify that the following have been done:
That the hosts.yml file is configured correctly.
That all children are included with their relevant nodes.
You can save your current server hostname by replacing <nodeX> with your server hostname.
Output the contents of the hosts.yml file, making sure to include the file's directory in the path:
$ cat inventory/sqream/hosts.yml
The hostname may contain only lowercase characters, - or ., and must match the server's hostname.
The following is an example of the correct output. Each host and IP address that you provided in Step 2 should be displayed once:
$ all:
$   hosts:
$     node1:
$       ansible_host: 192.168.5.81
$       ip: 192.168.5.81
$       access_ip: 192.168.5.81
$     node2:
$       ansible_host: 192.168.5.82
$       ip: 192.168.5.82
$       access_ip: 192.168.5.82
$     node3:
$       ansible_host: 192.168.5.83
$       ip: 192.168.5.83
$       access_ip: 192.168.5.83
$   children:
$     kube-master:
$       hosts:
$         node1:
$         node2:
$         node3:
$     kube-node:
$       hosts:
$         node1:
$         node2:
$         node3:
$     etcd:
$       hosts:
$         node1:
$         node2:
$         node3:
$     k8s-cluster:
$       children:
$         kube-master:
$         kube-node:
$     calico-rr:
$       hosts: {}
Go back to Installing Your Kubernetes Cluster
Adjusting Kubespray Deployment Values
After downloading and configuring Kubespray, you can adjust your Kubespray deployment values. A script is used to modify how the Kubernetes cluster is deployed, and you must set the cluster name variable before running this script.
Note
The script must be run from the kubespray folder.
To adjust Kubespray deployment values:
Add the following export to the local user’s ~/.bashrc file by replacing the <VIP IP> with the user’s Virtual IP address:
$ export VIP_IP=<VIP IP>
Log out, log back in, and verify that the variable is set:
$ echo $VIP_IP
Create the kubespray_settings.sh file with the following content:
$ cat <<EOF > kubespray_settings.sh
$ sed -i "/cluster_name: cluster.local/c \cluster_name: cluster.local.$cluster_name" inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ sed -i "/dashboard_enabled/c \dashboard_enabled\: "false"" inventory/sqream/group_vars/k8s-cluster/addons.yml
$ sed -i "/kube_version/c \kube_version\: "v1.18.3"" inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ sed -i "/metrics_server_enabled/c \metrics_server_enabled\: "true"" inventory/sample/group_vars/k8s-cluster/addons.yml
$ echo 'kube_apiserver_node_port_range: "3000-6000"' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_node_monitor_grace_period: 20s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_node_monitor_period: 2s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kube_controller_pod_eviction_timeout: 30s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'kubelet_status_update_frequency: 4s' >> inventory/sqream/group_vars/k8s-cluster/k8s-cluster.yml
$ echo 'ansible ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
$ EOF
Note
In most cases, the Docker data resides on the system disk. Because Docker requires a high volume of data (images, containers, volumes, etc.), you can change the default Docker data location to prevent the system disk from running out of space.
Optional - Change the default Docker data location:
$ sed -i "/docker_daemon_graph/c \docker_daemon_graph\: "</path/to/desired/location>"" inventory/sqream/group_vars/all/docker.yml
Make the kubespray_settings.sh file executable for your user:
$ chmod u+x kubespray_settings.sh
Run the script:
$ ./kubespray_settings.sh
Run a playbook on the inventory/sqream/hosts.yml cluster.yml file:
$ ansible-playbook -i inventory/sqream/hosts.yml cluster.yml -v
The Kubespray installation takes approximately 10 - 15 minutes.
The following is an example of the correct output:
$ PLAY RECAP
$ *********************************************************************************************
$ node-1    : ok=680  changed=133  unreachable=0  failed=0
$ node-2    : ok=583  changed=113  unreachable=0  failed=0
$ node-3    : ok=586  changed=115  unreachable=0  failed=0
$ localhost : ok=1    changed=0    unreachable=0  failed=0
In the event that the output is incorrect, or a failure occurred during the installation, please contact a SQream customer support representative.
Go back to Installing Your Kubernetes Cluster.
Checking Your Kubernetes Status
After adjusting your Kubespray deployment values, you must check your Kubernetes status.
To check your Kubernetes status:
Check the status of the node:
$ kubectl get nodes
The following is an example of the correct output:
$ NAME        STATUS   ROLES                  AGE   VERSION
$ eks-rhl-1   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-2   Ready    control-plane,master   29m   v1.21.1
$ eks-rhl-3   Ready    <none>                 28m   v1.21.1
Check the status of the pod:
$ kubectl get pods --all-namespaces
The following is an example of the correct output:
$ NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
$ kube-system   calico-kube-controllers-68dc8bf4d5-n9pbp   1/1     Running   0          160m
$ kube-system   calico-node-26cn9                          1/1     Running   1          160m
$ kube-system   calico-node-kjsgw                          1/1     Running   1          160m
$ kube-system   calico-node-vqvc5                          1/1     Running   1          160m
$ kube-system   coredns-58687784f9-54xsp                   1/1     Running   0          160m
$ kube-system   coredns-58687784f9-g94xb                   1/1     Running   0          159m
$ kube-system   dns-autoscaler-79599df498-hlw8k            1/1     Running   0          159m
$ kube-system   kube-apiserver-k8s-host-1-134              1/1     Running   0          162m
$ kube-system   kube-apiserver-k8s-host-194                1/1     Running   0          161m
$ kube-system   kube-apiserver-k8s-host-68                 1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-1-134     1/1     Running   0          162m
$ kube-system   kube-controller-manager-k8s-host-194       1/1     Running   0          161m
$ kube-system   kube-controller-manager-k8s-host-68        1/1     Running   0          161m
$ kube-system   kube-proxy-5f42q                           1/1     Running   0          161m
$ kube-system   kube-proxy-bbwvk                           1/1     Running   0          161m
$ kube-system   kube-proxy-fgcfb                           1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-1-134              1/1     Running   0          161m
$ kube-system   kube-scheduler-k8s-host-194                1/1     Running   0          161m
Go back to Installing Your Kubernetes Cluster
Adding a SQream Label to Your Kubernetes Cluster Nodes
After checking your Kubernetes status, you must add a SQream label on your Kubernetes cluster nodes.
To add a SQream label on your Kubernetes cluster nodes:
Get the cluster node list:
$ kubectl get nodes
The following is an example of the correct output:
$ NAME STATUS ROLES AGE VERSION $ eks-rhl-1 Ready control-plane,master 29m v1.21.1 $ eks-rhl-2 Ready control-plane,master 29m v1.21.1 $ eks-rhl-3 Ready <none> 28m v1.21.1
Set the node label, changing <node-name> to the node NAME(s) shown in the above example:
$ kubectl label nodes <node-name> cluster=sqream
The following is an example of the correct output:
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-1 cluster=sqream
$ node/eks-rhl-1 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-2 cluster=sqream
$ node/eks-rhl-2 labeled
$ [root@edk-rhl-1 kubespray]# kubectl label nodes eks-rhl-3 cluster=sqream
$ node/eks-rhl-3 labeled
Go back to Installing Your Kubernetes Cluster
Copying Your Kubernetes Configuration API File to the Master Cluster Nodes
After adding a SQream label on your Kubernetes cluster nodes, you must copy your Kubernetes configuration API file to your Master cluster nodes.
When the Kubernetes cluster installation is complete, an API configuration file is automatically created in the .kube folder of the root user. This file enables the kubectl command to access Kubernetes' internal API service. Following this step lets you run kubectl commands from any node in the cluster.
Warning
You must perform this on the management server only!
To copy your Kubernetes configuration API file to your Master cluster nodes:
Create the .kube folder in the local user directory:
$ mkdir /home/<local user>/.kube
Copy the configuration file from the root user directory to the <local user> directory:
$ sudo cp /root/.kube/config /home/<local user>/.kube
Change the file owner from root user to the <local user>:
$ sudo chown <local user>:<local user> /home/<local user>/.kube/config
Create the .kube folder in the other nodes located in the <local user> directory:
$ ssh <local user>@<node name> mkdir .kube
Copy the configuration file from the management node to the other nodes:
$ scp /home/<local user>/.kube/config <local user>@<node name>:/home/<local user>/.kube/
Under local user on each server you copied .kube to, run the following command:
$ sudo usermod -aG docker $USER
This grants the local user the necessary permissions to run Docker commands.
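The last three steps must be repeated for every additional node. As a convenience, the sketch below loops over two hypothetical node names for a local user named sqream; it assumes passwordless SSH and sudo are already in place:
$ for node in eks-rhl-2 eks-rhl-3; do
$     ssh sqream@$node mkdir -p .kube
$     scp /home/sqream/.kube/config sqream@$node:/home/sqream/.kube/
$     ssh sqream@$node sudo usermod -aG docker sqream
$ done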
Go back to Installing Your Kubernetes Cluster
Creating an env_file in Your Home Directory
After copying your Kubernetes configuration API file to your Master cluster nodes, you must create an env_file in your home directory, and must set the VIP address as a variable.
Warning
You must perform this on the management server only!
To create an env_file for local users in the user’s home directory:
Set a variable that includes the VIP IP address:
$ export VIP_IP=<VIP IP>
Note
If you use Kerberos, replace the KRB5_SERVER value with the IP address of your Kerberos server.
Create the .sqream directory for local users:
$ mkdir /home/$USER/.sqream
Create the env_file with the following content, verifying that the KRB5_SERVER parameter is set to your server IP:
$ cat <<EOF > /home/$USER/.sqream/env_file
SQREAM_K8S_VIP=$VIP_IP
SQREAM_ADMIN_UI_PORT=8080
SQREAM_DASHBOARD_DATA_COLLECTOR_PORT=8100
SQREAM_DATABASE_NAME=master
SQREAM_K8S_ADMIN_UI=sqream-admin-ui
SQREAM_K8S_DASHBOARD_DATA_COLLECTOR=dashboard-data-collector
SQREAM_K8S_METADATA=sqream-metadata
SQREAM_K8S_NAMESPACE=sqream
SQREAM_K8S_PICKER=sqream-picker
SQREAM_K8S_PROMETHEUS=prometheus
SQREAM_K8S_REGISTRY_PORT=6000
SQREAM_METADATA_PORT=3105
SQREAM_PICKER_PORT=3108
SQREAM_PROMETHEUS_PORT=9090
SQREAM_SPOOL_MEMORY_RATIO=0.25
SQREAM_WORKER_0_PORT=5000
KRB5CCNAME=FILE:/tmp/tgt
KRB5_SERVER=kdc.sq.com:<server IP>
KRB5_CONFIG_DIR=${SQREAM_MOUNT_DIR}/krb5
KRB5_CONFIG_FILE=${KRB5_CONFIG_DIR}/krb5.conf
HADOOP_CONFIG_DIR=${SQREAM_MOUNT_DIR}/hadoop
HADOOP_CORE_XML=${HADOOP_CONFIG_DIR}/core-site.xml
HADOOP_HDFS_XML=${HADOOP_CONFIG_DIR}/hdfs-site.xml
EOF
Go back to Installing Your Kubernetes Cluster
Creating a Base Kubernetes Namespace
After creating an env_file in the user’s home directory, you must create a base Kubernetes namespace.
You can create a Kubernetes namespace by running the following command:
$ kubectl create namespace sqream-init
The following is an example of the correct output:
$ namespace/sqream-init created
Go back to Installing Your Kubernetes Cluster
Pushing the env_file File to the Kubernetes Configmap
After creating a base Kubernetes namespace, you must push the env_file file to the Kubernetes configmap in the sqream-init namespace.
This is done by running the following command:
$ kubectl create configmap sqream-init -n sqream-init --from-env-file=/home/$USER/.sqream/env_file
The following is an example of the correct output:
$ configmap/sqream-init created
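Note that editing the env_file later does not update the configmap automatically. You can inspect the stored values, and recreate the configmap after a change, with standard kubectl commands:
$ kubectl get configmap sqream-init -n sqream-init -o yaml
$ kubectl delete configmap sqream-init -n sqream-init
$ kubectl create configmap sqream-init -n sqream-init --from-env-file=/home/$USER/.sqream/env_file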
Go back to Installing Your Kubernetes Cluster
Installing the NVIDIA Docker2 Toolkit
After pushing the env_file file to the Kubernetes configmap, you must install the NVIDIA Docker2 Toolkit. The NVIDIA Docker2 Toolkit lets users build and run GPU-accelerated Docker containers, and must be run only on GPU servers. The NVIDIA Docker2 Toolkit includes a container runtime library and utilities that automatically configure containers to leverage NVIDIA GPUs.
Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS
To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on CentOS:
Add the repository for your distribution:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
$   sudo tee /etc/yum.repos.d/nvidia-docker.repo
Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo yum install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
Verify that the nvidia-docker2 package has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1.3-base-centos7 nvidia-smi
The following is an example of the correct output:
docker run --runtime=nvidia --rm nvidia/cuda:10.1.3-base-centos7 nvidia-smi
Unable to find image 'nvidia/cuda:10.1.3-base-centos7' locally
10.1.3-base-centos7: Pulling from nvidia/cuda
d519e2592276: Pull complete
d22d2dfcfa9c: Pull complete
b3afe92c540b: Pull complete
13a10df09dc1: Pull complete
4f0bc36a7e1d: Pull complete
cd710321007d: Pull complete
Digest: sha256:635629544b2a2be3781246fdddc55cc1a7d8b352e2ef205ba6122b8404a52123
Status: Downloaded newer image for nvidia/cuda:10.1.3-base-centos7
Sun Feb 14 13:27:58 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:17:00.0 Off |                    0 |
| N/A   34C    P0    64W / 300W |  79927MiB / 80994MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   35C    P0    60W / 300W |  79927MiB / 80994MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on CentOS, see NVIDIA Docker Installation - CentOS distributions
Installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu
To install the NVIDIA Docker2 Toolkit on an x86_64 bit processor on Ubuntu:
Add the repository for your distribution:
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
$   sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
$   sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
Install the nvidia-docker2 package and reload the Docker daemon configuration:
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
Verify that the nvidia-docker2 package has been installed correctly:
$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
For more information on installing the NVIDIA Docker2 Toolkit on an x86_64 Bit Processor on Ubuntu, see NVIDIA Docker Installation - Ubuntu distributions
Go back to Installing Your Kubernetes Cluster
Modifying the Docker Daemon JSON File for GPU and Compute Nodes
After installing the NVIDIA Docker2 toolkit, you must modify the Docker daemon JSON file for GPU and Compute nodes.
Modifying the Docker Daemon JSON File for GPU Nodes
To modify the Docker daemon JSON file for GPU nodes:
Enable GPU and set HTTP access to the local Kubernetes Docker registry.
Note
The Docker daemon JSON file must be modified on all GPU nodes.
Note
Contact your IT department for a virtual IP.
Replace the VIP address with your assigned VIP address.
Connect as a root user:
$ sudo -i
Set a variable that includes the VIP address:
$ export VIP_IP=<VIP IP>
Replace the <VIP IP> with the VIP address:
$ cat <<EOF > /etc/docker/daemon.json
$ {
$     "insecure-registries": ["$VIP_IP:6000"],
$     "default-runtime": "nvidia",
$     "runtimes": {
$         "nvidia": {
$             "path": "nvidia-container-runtime",
$             "runtimeArgs": []
$         }
$     }
$ }
$ EOF
Apply the changes and restart Docker:
$ systemctl daemon-reload && systemctl restart docker
Exit the root user:
$ exit
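You can optionally confirm that the changes took effect; docker info lists the registered runtimes and the default runtime, which should now be nvidia:
$ sudo docker info | grep -i runtime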
Go back to Installing Your Kubernetes Cluster
Modifying the Docker Daemon JSON File for Compute Nodes
You must follow this procedure only if you have a Compute node.
To modify the Docker daemon JSON file for Compute nodes:
Switch to a root user:
$ sudo -i
Set a variable that includes the VIP address:
$ export VIP_IP=<VIP IP>
Note
Contact your IT department for a virtual IP. Replace the <VIP IP> with your assigned VIP address.
Create the Docker daemon JSON file:
$ cat <<EOF > /etc/docker/daemon.json
$ {
$     "insecure-registries": ["$VIP_IP:6000"]
$ }
$ EOF
Restart the services:
$ systemctl daemon-reload && systemctl restart docker
Exit the root user:
$ exit
Go back to Installing Your Kubernetes Cluster
Installing the Nvidia-device-plugin Daemonset
After modifying the Docker daemon JSON file for GPU or Compute nodes, you must install the Nvidia-device-plugin daemonset. The Nvidia-device-plugin daemonset is only relevant to GPU nodes.
To install the Nvidia-device-plugin daemonset:
Set nvidia.com/gpu to true on all GPU nodes:
$ kubectl label nodes <GPU node name> nvidia.com/gpu=true
Replace the <GPU node name> with your GPU node name. For a complete list of GPU node names, run the kubectl get nodes command.
The following is an example of the correct output:
$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-1 nvidia.com/gpu=true
$ node/eks-rhl-1 labeled
$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-2 nvidia.com/gpu=true
$ node/eks-rhl-2 labeled
$ [root@eks-rhl-1 ~]# kubectl label nodes eks-rhl-3 nvidia.com/gpu=true
$ node/eks-rhl-3 labeled
Go back to Installing Your Kubernetes Cluster
Creating an Nvidia Device Plugin
After installing the Nvidia-device-plugin daemonset, you must create an Nvidia-device-plugin. You can create an Nvidia-device-plugin by running the following command:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta6/nvidia-device-plugin.yml
If needed, you can check the status of the Nvidia-device-plugin-daemonset pods:
$ kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin
The following is an example of the correct output:
$ NAME READY STATUS RESTARTS AGE
$ nvidia-device-plugin-daemonset-fxfct 1/1 Running 0 6h1m
$ nvidia-device-plugin-daemonset-jdvxs 1/1 Running 0 6h1m
$ nvidia-device-plugin-daemonset-xpmsv 1/1 Running 0 6h1m
Go back to Installing Your Kubernetes Cluster
Checking GPU Resources Allocatable to GPU Nodes
After creating an Nvidia device plugin, you must check the GPU resources allocatable to the GPU nodes. Each GPU node has a record such as nvidia.com/gpu: <#>, where # indicates the number of allocatable, or available, GPUs in each node.
You can output a description of allocatable resources by running the following command:
$ kubectl describe node | grep -i -A 7 -B 2 allocatable:
The following is an example of the correct output:
$ Allocatable:
$ cpu: 3800m
$ ephemeral-storage: 94999346224
$ hugepages-1Gi: 0
$ hugepages-2Mi: 0
$ memory: 15605496Ki
$ nvidia.com/gpu: 1
$ pods: 110
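Alternatively, a more compact per-node view of the GPU count can be produced with a custom-columns query; the JSONPath below assumes the standard nvidia.com/gpu resource name:
$ kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'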
Go back to Installing Your Kubernetes Cluster
Preparing the WatchDog Monitor
SQream's deployment includes installing two watchdog services. These services monitor Kubernetes management and the server's storage network.
You can enable the storage watchdogs by adding entries in the /etc/hosts file on each server:
$ <address 1> k8s-node1.storage
$ <address 2> k8s-node2.storage
$ <address 3> k8s-node3.storage
The following is an example of the correct syntax:
$ 10.0.0.1 k8s-node1.storage
$ 10.0.0.2 k8s-node2.storage
$ 10.0.0.3 k8s-node3.storage
Go back to Installing Your Kubernetes Cluster
Installing the SQream Software
Once you’ve prepared the SQream environment for launching it using Kubernetes, you can begin installing the SQream software.
The Installing the SQream Software section describes the following:
Getting the SQream Package
The first step in installing the SQream software is getting the SQream package. Please contact the SQream Support team to get the sqream_k8s-nnn-DBnnn-COnnn-SDnnn-<arch>.tar.gz tarball file.
This file includes the following values:
sqream_k8s-<nnn> - the SQream installer version.
DB<nnn> - the SQreamDB version.
CO<nnn> - the SQream console version.
SD<nnn> - the SQream Acceleration Studio version.
arch - the server architecture.
You can extract the contents of the tarball by running the following command:
$ tar -xvf sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64.tar.gz
$ cd sqream_k8s-1.0.15-DB2020.1.0.2-SD0.7.3-x86_64
$ ls
Extracting the contents of the tarball file generates a new folder with the same name as the tarball file.
The following shows the output of the extracted file:
drwxrwxr-x. 2 sqream sqream 22 Jan 27 11:39 license
lrwxrwxrwx. 1 sqream sqream 49 Jan 27 11:39 sqream -> .sqream/sqream-sql-v2020.3.1_stable.x86_64/sqream
-rwxrwxr-x. 1 sqream sqream 9465 Jan 27 11:39 sqream-install
-rwxrwxr-x. 1 sqream sqream 12444 Jan 27 11:39 sqream-start
Go back to Installing Your SQream Software
Setting Up and Configuring Hadoop
After getting the SQream package, you can set up and configure Hadoop by configuring the keytab and krb5.conf files.
Note
You only need to configure the keytab and krb5.conf files if you use Hadoop with Kerberos authentication.
To set up and configure Hadoop:
Contact IT for the keytab and krb5.conf files.
Copy both files into the respective empty .hadoop/ and .krb5/ directories:
$ cp hdfs.keytab krb5.conf .krb5/
$ cp core-site.xml hdfs-site.xml .hadoop/
The SQream installer automatically copies the above files during the installation process.
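You can optionally verify that the copied keytab is readable and lists the expected principals; klist -k is part of the standard Kerberos client utilities:
$ klist -k .krb5/hdfs.keytab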
Go back to Installing Your SQream Software
Starting a Local Docker Image Registry
After getting the SQream package, or (optionally) setting up and configuring Hadoop, you must start a local Docker image registry. Because Kubernetes is based on Docker, you must start the local Docker image registry on the host’s shared folder. This allows all hosts to pull the SQream Docker images.
To start a local Docker image registry:
Create a Docker registry folder:
$ mkdir <shared path>/docker-registry/
Set the docker_path for the Docker registry folder:
$ export docker_path=<path>
Apply the docker-registry service to the cluster:
$ cat .k8s/admin/docker_registry.yaml | envsubst | kubectl create -f -
The following is an example of the correct output:
namespace/sqream-docker-registry created
configmap/sqream-docker-registry-config created
deployment.apps/sqream-docker-registry created
service/sqream-docker-registry created
Check the pod status of the docker-registry service:
$ kubectl get pods -n sqream-docker-registry
The following is an example of the correct output:
NAME                                      READY   STATUS    RESTARTS   AGE
sqream-docker-registry-655889fc57-hmg7h   1/1     Running   0          6h40m
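You can optionally confirm that the registry answers on the virtual IP. Port 6000 below matches the SQREAM_K8S_REGISTRY_PORT value set in the env_file, and /v2/_catalog is the standard Docker registry listing endpoint:
$ curl http://$VIP_IP:6000/v2/_catalog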
Go back to Installing Your SQream Software
Installing the Kubernetes Dashboard
After starting a local Docker image registry, you must install the Kubernetes dashboard. The Kubernetes dashboard lets you see the Kubernetes cluster, nodes, services, and pod status.
To install the Kubernetes dashboard:
Apply the k8s-dashboard service to the cluster:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
The following is an example of the correct output:
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
Grant the user external access to the Kubernetes dashboard:
$ cat .k8s/admin/kubernetes-dashboard-svc-metallb.yaml | envsubst | kubectl create -f -
The following is an example of the correct output:
service/kubernetes-dashboard-nodeport created
Create the resources defined in the cluster-admin-sa.yaml file:
$ kubectl create -f .k8s/admin/cluster-admin-sa.yaml
The following is an example of the correct output:
clusterrolebinding.rbac.authorization.k8s.io/cluster-admin-sa-cluster-admin created
Check the pod status of the K8s-dashboard service:
$ kubectl get pods -n kubernetes-dashboard
The following is an example of the correct output:
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-6b4884c9d5-n8p57   1/1     Running   0          4m32s
kubernetes-dashboard-7b544877d5-qc8b4        1/1     Running   0          4m32s
Obtain the k8s-dashboard access token:
$ kubectl -n kube-system describe secrets cluster-admin-sa-token
The following is an example of the correct output:
Name:         cluster-admin-sa-token-rbl9p
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: cluster-admin-sa
              kubernetes.io/service-account.uid: 81866d6d-8ef3-4805-840d-58618235f68d

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IjRMV09qVzFabjhId09oamQzZGFFNmZBeEFzOHp3SlJOZWdtVm5lVTdtSW8ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjbHVzdGVyLWFkbWluLXNhLXRva2VuLXJibDlwIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImNsdXN0ZXItYWRtaW4tc2EiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4MTg2NmQ2ZC04ZWYzLTQ4MDUtODQwZC01ODYxODIzNWY2OGQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06Y2x1c3Rlci1hZG1pbi1zYSJ9.mNhp8JMr5y3hQ44QrvRDCMueyjSHSrmqZcoV00ZC7iBzNUqh3n-fB99CvC_GR15ys43jnfsz0tdsTy7VtSc9hm5ENBI-tQ_mwT1Zc7zJrEtgFiA0o_eyfYZOARdhdyFEJg84bzkIxJFPKkBWb4iPWU1Xb7RibuMCjNTarZMZbqzKYfQEcMZWJ5UmfUqp-HahZZR4BNbjSWybs7t6RWdcQZt6sO_rRCDrOeEJlqKKjx4-5jFZB8Du_0kKmnw2YJmmSCEOXrpQCyXIiZJpX08HyDDYfFp8IGzm61arB8HDA9dN_xoWvuz4Cj8klUtTzL9effJJPjHJlZXcEqQc9hE3jw
Navigate to https://<VIP address>:5999.
Select the Token radio button, paste the token from the previous command output, and click Sign in.
The Kubernetes dashboard is displayed.
Go back to Installing Your SQream Software
Installing the SQream Prometheus Package
After installing the Kubernetes dashboard, you must install the SQream Prometheus package. To properly monitor the host and GPU statistics, the exporter service must be installed on each Kubernetes cluster node.
This section describes how to install the following:
node_exporter - collects host data, such as CPU memory usage.
nvidia_exporter - collects GPU utilization data.
Note
The steps in this section must be done on all cluster nodes.
To install the sqream-prometheus package, you must do the following:
Go back to Installing Your SQream Software
Installing the Exporter Service
To install the exporter service:
Create a user and group that will be used to run the exporter services:
$ sudo groupadd --system prometheus && sudo useradd -s /sbin/nologin --system -g prometheus prometheus
Extract the sqream_exporters_prometheus.0.1.tar.gz file:
$ cd .prometheus
$ tar -xf sqream_exporters_prometheus.0.1.tar.gz
Copy the exporter software files to the /usr/bin directory:
$ cd sqream_exporters_prometheus.0.1
$ sudo cp node_exporter/node_exporter /usr/bin/
$ sudo cp nvidia_exporter/nvidia_exporter /usr/bin/
Copy the exporters service file to the /etc/systemd/system/ directory:
$ sudo cp services/node_exporter.service /etc/systemd/system/
$ sudo cp services/nvidia_exporter.service /etc/systemd/system/
Set the permission and group of the service files:
$ sudo chown prometheus:prometheus /usr/bin/node_exporter
$ sudo chmod u+x /usr/bin/node_exporter
$ sudo chown prometheus:prometheus /usr/bin/nvidia_exporter
$ sudo chmod u+x /usr/bin/nvidia_exporter
Reload the services:
$ sudo systemctl daemon-reload
Start both services and set them to start when the server is booted up:
Node_exporter:
$ sudo systemctl start node_exporter && sudo systemctl enable node_exporter
Nvidia_exporter:
$ sudo systemctl start nvidia_exporter && sudo systemctl enable nvidia_exporter
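For reference, the packaged node_exporter.service file generally has the following shape. This is a sketch only, assuming the binary and prometheus user created above; always use the service files shipped in the SQream package:
$ [Unit]
$ Description=Node Exporter
$ After=network.target
$
$ [Service]
$ User=prometheus
$ Group=prometheus
$ ExecStart=/usr/bin/node_exporter
$ Restart=on-failure
$
$ [Install]
$ WantedBy=multi-user.target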
Checking the Exporter Status
After installing the exporter service, you must check its status.
You can check the exporter status by running the following command:
$ sudo systemctl status node_exporter && sudo systemctl status nvidia_exporter
Go back to Installing Your SQream Software
Running the Sqream-install Service
The Running the Sqream-install Service section describes the following:
Installing Your License
After installing the SQream Prometheus package, you must install your license.
To install your license:
Copy your license package to the sqream /license folder.
Note
You do not need to untar the license package after copying it to the /license folder because the installer script does it automatically.
The following flags are mandatory during your first run:
$ sudo ./sqream-install -i -k -m <path to sqream cluster>
Note
If you cannot run the script with sudo, verify that you have the right permission (rwx for the user) on the relevant directories (config, log, volume, and data-in directories).
Go back to Running the SQream_install Service.
Changing Your Data Ingest Folder
After installing your license, you must change your data ingest folder.
You can change your data ingest folder by running the following command:
$ sudo ./sqream-install -d /media/nfs/sqream/data_in
Go back to Running the SQream_install Service.
Checking Your System Settings
After changing your data ingest folder, you can check your system settings.
The following command shows you all the variables that your SQream system is running with:
$ ./sqream-install -s
After optionally checking your system settings, you can use the sqream-start application to control your Kubernetes cluster.
Go back to Running the SQream_install Service.
SQream Installation Command Reference
If needed, you can display the full sqream-install flag reference by typing:
$ ./sqream-install --help
The following table describes the sqream-install flags:
| Flag | Function | Note |
|------|----------|------|
| -i | Loads all the software from the hidden .docker folder. | Mandatory |
| -k | Loads the license package from the /license directory. | Mandatory |
| -m | Sets the relative path for all SQream folders under the shared filesystem available from all nodes (sqreamdb, config, logs and data_in). No other flags are required if you use this flag (such as c, v, l or d). | Mandatory |
| -c | Sets the path to write/read SQream configuration files from. The default is /etc/sqream/. | Optional |
| -v | Shows the location of the SQream cluster. | Optional |
| -l | Shows the location of the SQream system startup logs. The logs contain startup and Docker logs. The default is /var/log/sqream/. | Optional |
| -d | Shows the folder containing data that you want to import into or copy from SQream. | Optional |
| -n <Namespace> | Sets the Kubernetes namespace. The default is sqream. | Optional |
| -N <Namespace> | Deletes a specific Kubernetes namespace and sets the factory default namespace (sqream). | Optional |
| -f | Overwrites existing folders and all files located in mounted directories. | Optional |
| -r | Resets the system configuration. This flag is run without any other variables. | Optional |
| -s | Shows the system settings. | Optional |
| -e | Sets the Kubernetes cluster's virtual IP address. | Optional |
| -h | Help, shows all available flags. | Optional |
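For example, a first-time installation that loads the software and license, sets the shared path, and then points at a data ingest folder might look as follows (the paths are placeholders):
$ sudo ./sqream-install -i -k -m /media/nfs/sqream
$ sudo ./sqream-install -d /media/nfs/sqream/data_in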
Go back to Running the SQream_install Service.
Controlling Your Kubernetes Cluster Using SQream Flags
You can control your Kubernetes cluster using SQream flags.
The following command shows you the available Kubernetes cluster control options:
$ ./sqream-start -h
The following describes the sqream-start flags:
| Flag | Function | Note |
|------|----------|------|
| -s | Starts the sqream services, starting metadata, server picker, and workers. The number of workers started is based on the number of available GPUs. | Mandatory |
| -p | Sets specific ports for the worker services. You must enter the starting port for the sqream-start application to allocate ports based on the number of workers. | |
| -j | Uses an external .json configuration file. The file must be located in the configuration directory. | The workers must each be started individually. |
| -m | Allocates worker spool memory. | The workers must each be started individually. |
| -a | Starts the SQream Administration dashboard and specifies the listening port. | |
| -d | Deletes all running SQream services. | |
| -h | Shows all available flags. | Help |
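For example, based on the flag table above, the following starts the services with worker ports allocated from 5000 and a spool value of 4, and then brings up the Administration dashboard (the values are illustrative only):
$ sudo ./sqream-start -s -p 5000 -m 4
$ sudo ./sqream-start -a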
Go back to Running the SQream_install Service.
Using the sqream-start Commands
In addition to controlling your Kubernetes cluster using SQream flags, you can control it using sqream-start commands.
The Using the sqream-start Commands section describes the following:
Starting Your SQream Services
You can run the sqream-start command with the -s flag to start SQream services on all available GPUs:
$ sudo ./sqream-start -s
This command starts the SQream metadata, server picker, and sqream workers on all available GPUs in the cluster.
The following is an example of the correct output:
./sqream-start -s
Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running
Initializing 3 worker data collectors ...
Worker data collectors are up and running
Starting Prometheus ...
Prometheus is available at 192.168.5.100:9090
Starting SQream master ...
SQream master is up and running
Starting up 3 SQream workers ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.100:3108
Go back to Using the SQream-start Commands.
Starting Your SQream Services in Split Mode
Starting SQream services in split mode refers to running multiple SQream workers on a single GPU. You can do this by running the sqream-start command with the -s and -z flags. In addition, you can define the number of hosts to run the multiple workers on. In the example below, the command runs the multiple workers on three hosts.
To start SQream services in split mode:
Run the following command:
$ ./sqream-start -s -z 3
This command starts the SQream metadata, server picker, and sqream workers on a single GPU for three hosts:
The following is an example of the correct output:
Initializing network watchdogs on 3 hosts...
Network watchdogs are up and running
Initializing 3 worker data collectors ...
Worker data collectors are up and running
Starting Prometheus ...
Prometheus is available at 192.168.5.101:9090
Starting SQream master ...
SQream master is up and running
Starting up 9 SQream workers over <#> available GPUs ...
All SQream workers are up and running, SQream-DB is available at 192.168.5.101:3108
Verify that all pods are running properly in the k8s cluster (STATUS column):
kubectl -n sqream get pods
NAME READY STATUS RESTARTS AGE
prometheus-bcf877867-kxhld 1/1 Running 0 106s
sqream-metadata-fbcbc989f-6zlkx 1/1 Running 0 103s
sqream-picker-64b8c57ff5-ndfr9 1/1 Running 2 102s
sqream-split-workers-0-1-2-6bdbfbbb86-ml7kn 1/1 Running 0 57s
sqream-split-workers-3-4-5-5cb49d49d7-596n4 1/1 Running 0 57s
sqream-split-workers-6-7-8-6d598f4b68-2n9z5 1/1 Running 0 56s
sqream-workers-start-xj75g 1/1 Running 0 58s
watchdog-network-management-6dnfh 1/1 Running 0 115s
watchdog-network-management-tfd46 1/1 Running 0 115s
watchdog-network-management-xct4d 1/1 Running 0 115s
watchdog-network-storage-lr6v4 1/1 Running 0 116s
watchdog-network-storage-s29h7 1/1 Running 0 116s
watchdog-network-storage-sx9mw 1/1 Running 0 116s
worker-data-collector-62rxs 0/1 Init:0/1 0 54s
worker-data-collector-n8jsv 0/1 Init:0/1 0 55s
worker-data-collector-zp8vf 0/1 Init:0/1 0 54s
Go back to Using the SQream-start Commands.
Starting the Sqream Studio UI
You can run the following command to start the SQream Studio UI (Editor and Dashboard):
$ ./sqream-start -a
The following is an example of the correct output:
$ ./sqream-start -a
Please enter USERNAME:
sqream
Please enter PASSWORD:
******
Please enter port value or press ENTER to keep 8080:
Starting up SQream Admin UI...
SQream admin ui is available at 192.168.5.100:8080
Go back to Using the SQream-start Commands.
Stopping the SQream Services
You can run the following command to stop all SQream services:
$ ./sqream-start -d
The following is an example of the correct output:
$ ./sqream-start -d
$ Cleaning all SQream services in sqream namespace ...
$ All SQream service removed from sqream namespace
Go back to Using the SQream-start Commands.
Advanced sqream-start Commands
Controlling Your SQream Spool Size
If you do not specify the SQream spool size, the console automatically distributes the available RAM between all running workers.
You can define a specific spool size by running the following command:
$ ./sqream-start -s -m 4
Using a Custom .json File
You have the option of using your own .json file for your own custom configurations. Your .json file must be placed within the path mounted in the installation. SQream recommends placing your .json file in the configuration folder.
The SQream console does not validate the integrity of external .json files.
You can use the following command (using the -j flag) to set the full path of your .json file to the configuration file:
$ ./sqream-start -s -j <full path>.json
This command starts one worker with an external configuration file.
Note
The configuration file must be available in the shared configuration folder.
Checking the Status of the SQream Services
You can show all running SQream services by running the following command:
$ kubectl get pods -n <namespace> -o wide
This command shows all running services in the cluster and which nodes they are running in.
Go back to Using the SQream-start Commands.
Upgrading Your SQream Version
The Upgrading Your SQream Version section describes the following:
Before Upgrading Your System
Before upgrading your system you must do the following:
Contact SQream Support for a new SQream package tarball file.
Set a maintenance window.
Note
You must stop the system while upgrading it.
Upgrading Your System
After completing the steps in Before Upgrading Your System above, you can upgrade your system.
To upgrade your system:
Extract the contents of the tarball file that you received from SQream support. Make sure to extract the contents to the same directory as in Getting the SQream Package and for the same user:
$ tar -xvf sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64.tar.gz
$ cd sqream_installer-2.0.5-DB2019.2.1-CO1.6.3-ED3.0.0-x86_64/
To start the upgrade process, run the following command:
$ ./sqream-install -i
The upgrade process checks if the SQream services are running and will prompt you to stop them.
Do one of the following:
Stop the upgrade by writing No.
Continue the upgrade by writing Yes.
If you continue upgrading, all running SQream workers (master and editor) are stopped. When all services have been stopped, the new version is loaded.
Note
SQream periodically upgrades its metadata structure. If an upgrade version includes an upgraded metadata service, an approval request message is displayed. This approval is required to finish the upgrade process. Because SQream supports only specific metadata versions, all SQream services must be upgraded at the same time.
When SQream has successfully upgraded, load the SQream console and restart your services.
For questions, contact SQream Support.
Installing Monit
Getting Started
Before installing SQream with Monit, verify that you have followed the required recommended pre-installation configurations.
The procedures in the Installing Monit guide must be performed on each SQream cluster node.
Overview
Monit is a free open source supervision utility for managing and monitoring Unix and Linux. Monit lets you view system status directly from the command line or from a native HTTP web server. Monit can be used to conduct automatic maintenance and repair, such as executing meaningful causal actions in error situations.
SQream uses Monit as a watchdog utility, but you can use any other utility that provides the same or similar functionality.
The Installing Monit procedures describes how to install, configure, and start Monit.
You can install Monit in one of the following ways:
Installing Monit on CentOS:
To install Monit on CentOS:
Install Monit as a superuser on CentOS:
$ sudo yum install monit
Installing Monit on CentOS Offline:
Installing Monit on CentOS offline can be done in either of the following ways:
Building Monit from Source Code
To build Monit from source code:
Copy the Monit package for the current version:
$ tar zxvf monit-<x.y.z>.tar.gz
The value x.y.z denotes the version numbers.
Navigate to the directory where you want to store the package:
$ cd monit-x.y.z
Configure the files in the package:
$ ./configure (use ./configure --help to view available options)
Build and install the package:
$ make && make install
The following are the default storage directories:
The Monit package: /usr/local/bin/
The monit.1 man-file: /usr/local/man/man1/
Optional - To change the above default location(s), use the --prefix option to ./configure.
Optional - Create an RPM package for CentOS directly from the source code:
$ rpmbuild -tb monit-x.y.z.tar.gz
Building Monit from Pre-Built Binaries
To build Monit from pre-built binaries:
Copy the Monit package for the current version:
$ tar zxvf monit-x.y.z-linux-x64.tar.gz
The value x.y.z denotes the version numbers.
Navigate to the directory where you want to store the package:
$ cd monit-x.y.z
Copy bin/monit into the /usr/local/bin/ directory:
$ cp bin/monit /usr/local/bin/
Copy conf/monitrc into the /etc/ directory:
$ cp conf/monitrc /etc/
For examples of pre-built Monit binaries, see Download Precompiled Binaries.
Installing Monit on Ubuntu:
To install Monit on Ubuntu:
Install Monit as a superuser on Ubuntu:
$ sudo apt-get install monit
Installing Monit on Ubuntu Offline:
You can install Monit on Ubuntu when you do not have an internet connection.
To install Monit on Ubuntu offline:
Extract the required file:
$ tar zxvf monit-<x.y.z>-linux-x64.tar.gz
NOTICE: <x.y.z> denotes the version number.
Navigate to the directory where you want to save the file:
$ cd monit-x.y.z
Copy bin/monit into the /usr/local/bin/ directory:
$ cp bin/monit /usr/local/bin/
Copy conf/monitrc into the /etc/ directory:
$ cp conf/monitrc /etc/
Configuring Monit
When the installation is complete, you can configure Monit. You configure Monit by modifying the Monit configuration file, called monitrc. This file contains blocks for each service that you want to monitor.
The following is an example of a service block:
$ #SQREAM1-START
$ check process sqream1 with pidfile /var/run/sqream1.pid
$    start program = "/usr/bin/systemctl start sqream1"
$    stop program = "/usr/bin/systemctl stop sqream1"
$ #SQREAM1-END
For example, if you have 16 services, you can configure this block by copying the entire block 15 times and modifying all service names as required, as shown below:
$ #SQREAM2-START
$ check process sqream2 with pidfile /var/run/sqream2.pid
$    start program = "/usr/bin/systemctl start sqream2"
$    stop program = "/usr/bin/systemctl stop sqream2"
$ #SQREAM2-END
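Rather than copying and editing each block by hand, you can generate all 16 blocks with a short shell loop (a sketch; adjust the count and review the generated file before deploying it):
$ for i in $(seq 1 16); do
$ cat <<EOF >> monitrc
$ #SQREAM${i}-START
$ check process sqream${i} with pidfile /var/run/sqream${i}.pid
$    start program = "/usr/bin/systemctl start sqream${i}"
$    stop program = "/usr/bin/systemctl stop sqream${i}"
$ #SQREAM${i}-END
$ EOF
$ done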
For servers that don’t run the metadataserver and serverpicker commands, you can use the block example above, but comment out the related commands, as shown below:
$ #METADATASERVER-START
$ #check process metadataserver with pidfile /var/run/metadataserver.pid
$ #start program = "/usr/bin/systemctl start metadataserver"
$ #stop program = "/usr/bin/systemctl stop metadataserver"
$ #METADATASERVER-END
To configure Monit:
Copy the required block for each required service.
Modify all service names in the block.
Copy the configured monitrc file to the /etc/monit.d/ directory:
$ cp monitrc /etc/monit.d/
Set file permissions to 600 (full read and write access):
$ sudo chmod 600 /etc/monit.d/monitrc
Reload the system to activate the current configurations:
$ sudo systemctl daemon-reload
Optional - Navigate to the /etc/sqream directory and create a symbolic link to the monitrc file:
$ cd /etc/sqream
$ sudo ln -s /etc/monit.d/monitrc monitrc
Starting Monit
After configuring Monit, you can start it.
To start Monit:
Start Monit as a super user:
$ sudo systemctl start monit
View Monit’s service status:
$ sudo systemctl status monit
If Monit is functioning correctly, enable the Monit service to start on boot:
$ sudo systemctl enable monit
Launching SQream with Monit
This procedure describes how to launch SQream using Monit.
Launching SQream
After doing the following, you can launch SQream according to the instructions on this page.
The following is an example of a working monitrc file configured to monitor the metadataserver and serverpicker commands, and four sqreamd services. The monitrc configuration file is located in the conf/monitrc directory.
Note that the monitrc in the following example is configured for eight sqreamd services, but that only the first four are enabled:
$ set daemon 5 # check services at 5-second intervals
$ set logfile syslog
$
$ set httpd port 2812 and
$ use address localhost # only accept connection from localhost
$ allow localhost # allow localhost to connect to the server and
$ allow admin:monit # require user 'admin' with password 'monit'
$
$ ##set mailserver smtp.gmail.com port 587
$ ## using tlsv12
$ #METADATASERVER-START
$ check process metadataserver with pidfile /var/run/metadataserver.pid
$ start program = "/usr/bin/systemctl start metadataserver"
$ stop program = "/usr/bin/systemctl stop metadataserver"
$ #METADATASERVER-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: metadataserver $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #SERVERPICKER-START
$ check process serverpicker with pidfile /var/run/serverpicker.pid
$ start program = "/usr/bin/systemctl start serverpicker"
$ stop program = "/usr/bin/systemctl stop serverpicker"
$ #SERVERPICKER-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: serverpicker $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ #
$ #
$ #SQREAM1-START
$ check process sqream1 with pidfile /var/run/sqream1.pid
$ start program = "/usr/bin/systemctl start sqream1"
$ stop program = "/usr/bin/systemctl stop sqream1"
$ #SQREAM1-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream1 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #SQREAM2-START
$ check process sqream2 with pidfile /var/run/sqream2.pid
$ start program = "/usr/bin/systemctl start sqream2"
$ stop program = "/usr/bin/systemctl stop sqream2"
$ #SQREAM2-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream2 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #SQREAM3-START
$ check process sqream3 with pidfile /var/run/sqream3.pid
$ start program = "/usr/bin/systemctl start sqream3"
$ stop program = "/usr/bin/systemctl stop sqream3"
$ #SQREAM3-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream3 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #SQREAM4-START
$ check process sqream4 with pidfile /var/run/sqream4.pid
$ start program = "/usr/bin/systemctl start sqream4"
$ stop program = "/usr/bin/systemctl stop sqream4"
$ #SQREAM4-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream4 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM5-START
$ #check process sqream5 with pidfile /var/run/sqream5.pid
$ #start program = "/usr/bin/systemctl start sqream5"
$ #stop program = "/usr/bin/systemctl stop sqream5"
$ #SQREAM5-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream5 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM6-START
$ #check process sqream6 with pidfile /var/run/sqream6.pid
$ #start program = "/usr/bin/systemctl start sqream6"
$ #stop program = "/usr/bin/systemctl stop sqream6"
$ #SQREAM6-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream6 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM7-START
$ #check process sqream7 with pidfile /var/run/sqream7.pid
$ #start program = "/usr/bin/systemctl start sqream7"
$ #stop program = "/usr/bin/systemctl stop sqream7"
$ #SQREAM7-END
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream7 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
$ #
$ #SQREAM8-START
$ #check process sqream8 with pidfile /var/run/sqream8.pid
$ #start program = "/usr/bin/systemctl start sqream8"
$ #stop program = "/usr/bin/systemctl stop sqream8"
$ #SQREAM8-END
$ # alert user@domain.com on {nonexist, timeout}
$ # with mail-format {
$ # from: Monit@$HOST
$ # subject: sqream8 $EVENT - $ACTION
$ # message: This is an automate mail, sent from monit.
$ # }
Monit Usage Examples
This section shows examples of two methods for stopping the sqream3 service using Monit's command syntax:
Stopping Monit and SQream Separately
You can stop the Monit service and SQream separately as follows:
$ sudo systemctl stop monit
$ sudo systemctl stop sqream3
You can restart Monit as follows:
$ sudo systemctl start monit
Restarting Monit automatically restarts the SQream services.
Stopping SQream Using a Monit Command
You can stop SQream using a Monit command as follows:
$ sudo monit stop sqream3
This command stops SQream only (and not Monit).
You can restart SQream as follows:
$ sudo monit start sqream3
Monit Command Line Options
The Monit Command Line Options section describes some of the most commonly used Monit command options.
You can show the command line options by running:
$ monit --help
$ start all - Start all services
$ start <name> - Only start the named service
$ stop all - Stop all services
$ stop <name> - Stop the named service
$ restart all - Stop and start all services
$ restart <name> - Only restart the named service
$ monitor all - Enable monitoring of all services
$ monitor <name> - Only enable monitoring of the named service
$ unmonitor all - Disable monitoring of all services
$ unmonitor <name> - Only disable monitoring of the named service
$ reload - Reinitialize monit
$ status [name] - Print full status information for service(s)
$ summary [name] - Print short status information for service(s)
$ report [up|down|..] - Report state of services. See manual for options
$ quit - Kill the monit daemon process
$ validate - Check all services and start if not running
$ procmatch <pattern> - Test process matching pattern
Using Monit While Upgrading Your Version of SQream
While upgrading your version of SQream, you can use Monit to avoid conflicts (such as services starting unexpectedly). This is done by pausing or stopping all running services while you manually upgrade SQream. When you have successfully upgraded SQream, you can use Monit to restart all SQream services.
To use Monit while upgrading your version of SQream:
Stop all actively running SQream services:
$ sudo monit stop all
Verify that SQream has stopped listening on ports 500X, 510X, and 310X:
$ sudo netstat -nltp #to make sure sqream stopped listening on 500X, 510X and 310X ports.
The example below shows the old version sqream-db-v2020.2 being replaced with the new version sqream-db-v2025.200:
$ cd /home/sqream
$ mkdir tempfolder
$ mv sqream-db-v2025.200.tar.gz tempfolder/
$ cd tempfolder
$ tar -xf sqream-db-v2025.200.tar.gz
$ sudo mv sqream /usr/local/sqream-db-v2025.200
$ cd /usr/local
$ sudo chown -R sqream:sqream sqream-db-v2025.200
$ sudo rm sqream #This should only remove the symlink
$ sudo ln -s sqream-db-v2025.200 sqream #this creates a new symlink named "sqream" pointing to the new version
$ ls -l
The symbolic SQream link should point to the real folder:
$ sqream -> sqream-db-v2025.200
Restart the SQream services:
$ sudo monit start all
Verify that the latest version has been installed:
$ SELECT SHOW_VERSION();
The correct version is output.
Restart the UI:
$ pm2 start all
Installing SQream Studio
The Installing SQream Studio page includes the following installation guides:
Installing Prometheus Exporter
The Installing Prometheus Exporters guide includes the following sections:
Overview
The Prometheus exporter is an open-source systems monitoring and alerting toolkit. It is used for collecting metrics from an operating system and exporting them to a graphic user interface.
The Installing Prometheus Exporters guide describes how to install the following exporters:
node_exporter - the basic exporter, used for displaying server metrics such as CPU and memory.

nvidia_exporter - shows NVIDIA GPU metrics.

process_exporter - shows data belonging to the server's running processes.
For information about more exporters, see Exporters and Integrations.
Adding a User and Group
Adding a user and group determines who can run processes.
You can add a group with the following command:

$ sudo groupadd --system prometheus

You can add a user assigned to that group with the following command:

$ sudo useradd -s /sbin/nologin --system -g prometheus prometheus
Cloning the Prometheus GIT Project
After adding a user and group you must clone the Prometheus GIT project.
You can clone the Prometheus GIT project with the following command:
$ git clone http://gitlab.sq.l/IT/promethues.git prometheus
Note
If you experience difficulties cloning the Prometheus GIT project or receive an error, contact your IT department.
The following shows the result of cloning your Prometheus GIT project:
prometheus/
├── node_exporter
│   └── node_exporter
├── nvidia_exporter
│   └── nvidia_exporter
├── process_exporter
│   └── process-exporter_0.5.0_linux_amd64.rpm
├── README.md
└── services
    ├── node_exporter.service
    └── nvidia_exporter.service
Installing the Node Exporter and NVIDIA Exporter
After cloning the Prometheus GIT project you must install the node_exporter and NVIDIA_exporter.
To install the node_exporter and NVIDIA_exporter:
Navigate to the cloned folder:
$ cd prometheus
Copy node_exporter and nvidia_exporter to /usr/bin/:

$ sudo cp node_exporter/node_exporter /usr/bin/
$ sudo cp nvidia_exporter/nvidia_exporter /usr/bin/
Copy the services files to the services folder:
$ sudo cp services/node_exporter.service /etc/systemd/system/
$ sudo cp services/nvidia_exporter.service /etc/systemd/system/
Reload the services so that they can be run:
$ sudo systemctl daemon-reload
Set the permissions and group for both service files:
$ sudo chown prometheus:prometheus /usr/bin/node_exporter
$ sudo chmod u+x /usr/bin/node_exporter
$ sudo chown prometheus:prometheus /usr/bin/nvidia_exporter
$ sudo chmod u+x /usr/bin/nvidia_exporter
Start and enable the node_exporter service so that it starts automatically when the server boots up:

$ sudo systemctl start node_exporter && sudo systemctl enable node_exporter

Start and enable the nvidia_exporter service in the same way:

$ sudo systemctl start nvidia_exporter && sudo systemctl enable nvidia_exporter
Verify that the status of both services is active (running):
$ sudo systemctl status node_exporter && sudo systemctl status nvidia_exporter
The following is the correct output:
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-12-11 12:28:31 IST; 1 months 5 days ago
 Main PID: 28378 (node_exporter)
   CGroup: /system.slice/node_exporter.service

● nvidia_exporter.service - Nvidia Exporter
   Loaded: loaded (/etc/systemd/system/nvidia_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-01-22 13:40:11 IST; 31min ago
 Main PID: 1886 (nvidia_exporter)
   CGroup: /system.slice/nvidia_exporter.service
           └─1886 /usr/bin/nvidia_exporter
Installing the Process Exporter
After installing the node_exporter and Nvidia_exporter you must install the process_exporter.
To install the process_exporter:
Do one of the following:
For CentOS, run:

$ sudo rpm -i process_exporter/process-exporter_0.5.0_linux_amd64.rpm

For Ubuntu, run:

$ sudo dpkg -i process_exporter/process-exporter_0.6.0_linux_amd64.deb
Verify that the process_exporter is running:
$ sudo systemctl status process-exporter
Set the process_exporter to start automatically when the server is booted up:
$ sudo systemctl enable process-exporter
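If the earlier status check showed that the service is not running, you can start it manually (the service name below is the one installed by the package above):

$ sudo systemctl start process-exporter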
Opening the Firewall Ports
After installing the process_exporter you must open the firewall ports for the following services:
node_exporter - port: 9100
nvidia_exporter - port: 9445
process-exporter - port: 9256
Note
This procedure is only relevant if your firewall is running.
To open the firewall ports:
Run the following command for each of the ports listed above:
$ sudo firewall-cmd --zone=public --add-port=<PORT NUMBER>/tcp --permanent
Reload the firewall:
$ sudo firewall-cmd --reload
Verify that the changes have taken effect.
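For example, assuming all three exporters run on the same server, you can open all three ports and confirm the change as follows (firewall-cmd --list-ports prints the ports currently open in the zone):

$ sudo firewall-cmd --zone=public --add-port=9100/tcp --permanent
$ sudo firewall-cmd --zone=public --add-port=9445/tcp --permanent
$ sudo firewall-cmd --zone=public --add-port=9256/tcp --permanent
$ sudo firewall-cmd --reload
$ sudo firewall-cmd --zone=public --list-ports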
Installing Prometheus Using Binary Packages
Prometheus is an application used for event monitoring and alerting.
Installing Prometheus
You must install Prometheus before installing the Dashboard Data Collector.
To install Prometheus:
Verify the following:
That you have sudo access to your Linux server.
That your server has access to the internet (for downloading the Prometheus binary package).
That your firewall rules are opened for accessing Prometheus Port 9090.
Navigate to the Prometheus Download page and download a Prometheus binary package. The examples below use the prometheus-2.22.0.linux-amd64.tar.gz package.
Do the following:
Download the source using the curl command:

$ curl -LO https://github.com/prometheus/prometheus/releases/download/v2.22.0/prometheus-2.22.0.linux-amd64.tar.gz
Extract the file contents:
$ tar -xvf prometheus-2.22.0.linux-amd64.tar.gz
Rename the extracted folder to prometheus-files:
$ mv prometheus-2.22.0.linux-amd64 prometheus-files
Create a Prometheus user:
$ sudo useradd --no-create-home --shell /bin/false prometheus
Create your required directories:
$ sudo mkdir /etc/prometheus
$ sudo mkdir /var/lib/prometheus
Set the Prometheus user as the owner of your required directories:
$ sudo chown prometheus:prometheus /etc/prometheus
$ sudo chown prometheus:prometheus /var/lib/prometheus
Copy the Prometheus and Promtool binary packages from the prometheus-files folder to /usr/local/bin:
$ sudo cp prometheus-files/prometheus /usr/local/bin/
$ sudo cp prometheus-files/promtool /usr/local/bin/
Change the ownership to the prometheus user:
$ sudo chown prometheus:prometheus /usr/local/bin/prometheus
$ sudo chown prometheus:prometheus /usr/local/bin/promtool
Copy the consoles and console_libraries directories from the prometheus-files folder to the /etc/prometheus folder:
$ sudo cp -r prometheus-files/consoles /etc/prometheus
$ sudo cp -r prometheus-files/console_libraries /etc/prometheus
Change the ownership to the prometheus user:
$ sudo chown -R prometheus:prometheus /etc/prometheus/consoles
$ sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
For more information on installing the Dashboard Data Collector, see Installing the Dashboard Data Collector.
Configuring Your Prometheus Settings
After installing Prometheus you must configure your Prometheus settings. You must perform all Prometheus configurations in the /etc/prometheus/prometheus.yml file.
To configure your Prometheus settings:
Create your prometheus.yml file:
$ sudo vi /etc/prometheus/prometheus.yml
Copy the contents below into your prometheus.yml file:
#node_exporter port : 9100
#nvidia_exporter port: 9445
#process-exporter port: 9256

global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets:
        - <prometheus server IP>:9090
  - job_name: 'processes'
    scrape_interval: 5s
    static_configs:
      - targets:
        - <process exporters IP>:9256
        - <another process exporters IP>:9256
  - job_name: 'nvidia'
    scrape_interval: 5s
    static_configs:
      - targets:
        - <nvidia exporter IP>:9445
        - <another nvidia exporter IP>:9445
  - job_name: 'nodes'
    scrape_interval: 5s
    static_configs:
      - targets:
        - <node exporter IP>:9100
        - <another node exporter IP>:9100
Change the ownership of the file to the prometheus user:
$ sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
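Optionally, you can check the file for syntax errors using the promtool binary copied to /usr/local/bin earlier:

$ promtool check config /etc/prometheus/prometheus.yml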
Configuring Your Prometheus Service File
After configuring your Prometheus settings you must configure your Prometheus service file.
To configure your Prometheus service file:
Create your prometheus.service file:
$ sudo vi /etc/systemd/system/prometheus.service
Copy the contents below into your prometheus service file:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
Register the prometheus service by reloading the systemd service:
$ sudo systemctl daemon-reload
Start the prometheus service:
$ sudo systemctl start prometheus
Check the status of the prometheus service:
$ sudo systemctl status prometheus
If the status is active (running), you have configured your Prometheus service file correctly.
Accessing the Prometheus User Interface
After configuring your Prometheus service file, you can access the Prometheus user interface.
You can access the Prometheus user interface by navigating to the following URL in your browser:

http://<prometheus-ip>:9090/graph
Once the Prometheus user interface is displayed, go to the Query tab and query metrics.
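For example, you can query the built-in up metric, either in the Query tab or through the same HTTP API that the Dashboard Data Collector uses later in this guide; a value of 1 means the target is being scraped successfully:

$ curl 'http://<prometheus-ip>:9090/api/v1/query?query=up'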
Installing the Dashboard Data Collector
After accessing the Prometheus user interface, you can install the Dashboard Data Collector. You must install the Dashboard Data Collector to enable the Dashboard in Studio.
Note
Before installing the Dashboard Data collector, verify that Prometheus has been installed and configured for the cluster.
To install the Dashboard Data Collector:
Store the Data Collector package obtained from SQream Artifactory on the target server.
Extract and rename the package:
$ tar -xvf dashboard-data-collector-0.5.2.tar.gz
$ mv package dashboard-data-collector
Change your directory to the location of the package folder:
$ cd dashboard-data-collector
Set up the data collection by modifying the SQream and Data Collector IPs, ports, user name, and password according to the cluster:
$ npm run setup -- \
  --host=127.0.0.1 \
  --port=3108 \
  --database=master \
  --is-cluster=true \
  --service=sqream \
  --dashboard-user=sqream \
  --dashboard-password=sqream \
  --prometheus-url=http://127.0.0.1:9090/api/v1/query
Debug the Data Collector using npm:
$ npm start
A JSON document is generated in the log, as shown below:
$ { $ "machines": [ $ { $ "machineId": "dd4af489615", $ "name": "Server 0", $ "location": "192.168.4.94", $ "totalMemory": 31.19140625, $ "gpus": [ $ { $ "gpuId": "GPU-b17575ec-eeba-3e0e-99cd-963967e5ee3f", $ "machineId": "dd4af489615", $ "name": "GPU 0", $ "totalMemory": 3.9453125 $ } $ ], $ "workers": [ $ { $ "workerId": "sqream_01", $ "gpuId": "", $ "name": "sqream_01" $ } $ ], $ "storageWrite": 0, $ "storageRead": 0, $ "freeStorage": 0 $ }, $ { $ "machineId": "704ec607174", $ "name": "Server 1", $ "location": "192.168.4.95", $ "totalMemory": 31.19140625, $ "gpus": [ $ { $ "gpuId": "GPU-8777c14f-7611-517a-e9c7-f42eeb21700b", $ "machineId": "704ec607174", $ "name": "GPU 0", $ "totalMemory": 3.9453125 $ } $ ], $ "workers": [ $ { $ "workerId": "sqream_02", $ "gpuId": "", $ "name": "sqream_02" $ } $ ], $ "storageWrite": 0, $ "storageRead": 0, $ "freeStorage": 0 $ } $ ], $ "clusterStatus": true, $ "storageStatus": { $ "dataStorage": 49.9755859375, $ "totalDiskUsage": 52.49829018075231, $ "storageDetails": { $ "data": 0, $ "freeData": 23.7392578125, $ "tempData": 0, $ "deletedData": 0, $ "other": 26.236328125 $ }, $ "avgThroughput": { $ "read": 0, $ "write": 0 $ }, $ "location": "/" $ }, $ "queues": [ $ { $ "queueId": "sqream", $ "name": "sqream", $ "workerIds": [ $ "sqream_01", $ "sqream_02" $ ] $ } $ ], $ "queries": [], $ "collected": true, $ "lastCollect": "2021-11-17T12:46:31.601Z" $ }
Note
Verify that all machines and workers are correctly registered.
Press CTRL + C to stop npm start.
Start the Data Collector with the pm2 service:

$ pm2 start ./index.js --name=dashboard-data-collector
Add the following parameter to the SQream Studio setup defined in Step 4 of Installing Studio below:
--data-collector-url=http://127.0.0.1:8100/api/dashboard/data
Installing Studio on a Stand-Alone Server
A stand-alone server is a server that does not run a binary-based installation of SQream.
The Installing Studio on a Stand-Alone Server guide includes the following sections:
Installing NodeJS Version 12 on the Server
Before installing Studio you must install NodeJS version 12 on the server.
To install NodeJS version 12 on the server:
Check whether a version of NodeJS has been installed on the target server:

$ node -v

The following output indicates that NodeJS is not installed on the target server:

bash: /usr/bin/node: No such file or directory
If a version of NodeJS older than 12.<x.x> has been installed, remove it as follows:
On CentOS:
$ sudo yum remove -y nodejs
On Ubuntu:
$ sudo apt remove -y nodejs
If you have not installed NodeJS version 12, run the following commands:
On CentOS:
$ curl -sL https://rpm.nodesource.com/setup_12.x | sudo bash -
$ sudo yum clean all && sudo yum makecache fast
$ sudo yum install -y nodejs
On Ubuntu:
$ curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
$ sudo apt-get install -y nodejs
The following output is displayed if your installation has completed successfully:
Transaction Summary
==============================================================================================================================
Install  1 Package

Total download size: 22 M
Installed size: 67 M
Downloading packages:
warning: /var/cache/yum/x86_64/7/nodesource/packages/nodejs-12.22.1-1nodesource.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 34fa74dd: NOKEY
Public key for nodejs-12.22.1-1nodesource.x86_64.rpm is not installed
nodejs-12.22.1-1nodesource.x86_64.rpm                                                       |  22 MB  00:00:02
Retrieving key from file:///etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Importing GPG key 0x34FA74DD:
 Userid     : "NodeSource <gpg-rpm@nodesource.com>"
 Fingerprint: 2e55 207a 95d9 944b 0cc9 3261 5ddb e8d4 34fa 74dd
 Package    : nodesource-release-el7-1.noarch (installed)
 From       : /etc/pki/rpm-gpg/NODESOURCE-GPG-SIGNING-KEY-EL
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : 2:nodejs-12.22.1-1nodesource.x86_64                                                              1/1
  Verifying  : 2:nodejs-12.22.1-1nodesource.x86_64                                                              1/1

Installed:
  nodejs.x86_64 2:12.22.1-1nodesource

Complete!
Confirm the Node version.
$ node -v
The following is an example of the correct output:
v12.22.1
Install Prometheus using binary packages.
For more information on installing Prometheus using binary packages, see Installing Prometheus Using Binary Packages.
Installing Studio
After installing the Dashboard Data Collector, you can install Studio.
To install Studio:
Copy the SQream Studio package from SQream Artifactory into the target server. For access to the SQream Studio package, contact SQream Support.
Extract the package:
$ tar -xvf sqream-acceleration-studio-<version number>.x86_64.tar.gz
Navigate to the new package folder.
$ cd sqream-admin
Build the configuration file to set up SQream Studio. You can use IP address 127.0.0.1 on a single server.
$ npm run setup -- -y --host=<SQreamD IP> --port=3108 --data-collector-url=http://<data collector IP address>:8100/api/dashboard/data
The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder and shows the following output:
Config generated successfully. Run `npm start` to start the app.
For more information about the available set-up arguments, see Set-Up Arguments.
To access Studio over a secure connection, modify your configuration file as follows:

Change your port value to 3109.

Change your ssl flag value to true.

The following is an example of the correctly modified configuration file:

{
  "debugSqream": false,
  "webHost": "localhost",
  "webPort": 8080,
  "webSslPort": 8443,
  "logsDirectory": "",
  "clusterType": "standalone",
  "dataCollectorUrl": "",
  "connections": [
    {
      "host": "127.0.0.1",
      "port": 3109,
      "isCluster": true,
      "name": "default",
      "service": "sqream",
      "ssl": true,
      "networkTimeout": 60000,
      "connectionTimeout": 3000
    }
  ]
}
If you have installed Studio on a server where SQream is already installed, move the sqream-admin-config.json file to /etc/sqream/:
$ mv sqream-admin-config.json /etc/sqream
Starting Studio Manually
You can start Studio manually by running the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start
The following output is displayed:
[PM2] Starting /home/sqream/sqream-admin/server/build/main.js in fork_mode (1 instance)
[PM2] Done.
┌─────┬──────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │
├─────┼──────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0 │ sqream-studio │ default │ 0.1.0 │ fork │ 11540 │ 0s │ 0 │ online │ 0% │ 15.6mb │ sqream │ disabled │
└─────┴──────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
Starting Studio as a Service
SQream uses the Process Manager (PM2) to maintain Studio.
To start Studio as a service:
Run the following command:
$ sudo npm install -g pm2
Verify that PM2 has been installed successfully:
$ pm2 list
The following is the output:
┌─────┬──────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id  │ name             │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├─────┼──────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0   │ sqream-studio    │ default     │ 0.1.0   │ fork    │ 11540    │ 2m     │ 0    │ online    │ 0%       │ 31.5mb   │ sqream   │ disabled │
└─────┴──────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
Start the service with PM2:
If the sqream-admin-config.json file is located in /etc/sqream/, run the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start --config-location=/etc/sqream/sqream-admin-config.json
If the sqream-admin-config.json file is not located in /etc/sqream/, run the following command:
$ cd /home/sqream/sqream-admin
$ NODE_ENV=production pm2 start ./server/build/main.js --name=sqream-studio -- start
Verify that Studio is running.
$ netstat -nltp
Verify that sqream-studio is listening on port 8080, as shown below:
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -
tcp6       0      0 :::8080                 :::*                    LISTEN      11540/sqream-studio
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 ::1:25                  :::*                    LISTEN      -
Verify the following:
That you can access Studio from your browser (http://<IP_Address>:8080).

That you can log in to SQream.
Save the configuration to run on boot.
$ pm2 startup
The following is an example of the output:
$ sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u sqream --hp /home/sqream
Copy and paste the output above and run it.
Save the configuration.
$ pm2 save
Accessing Studio
The Studio page is available on port 8080: http://<server ip>:8080.
If port 8080 is blocked by the server firewall, you can unblock it by running the following command:
$ firewall-cmd --zone=public --add-port=8080/tcp --permanent
$ firewall-cmd --reload
Maintaining Studio with the Process Manager (PM2)
SQream uses the Process Manager (PM2) to maintain Studio.
You can use PM2 to do any of the following:
To check the PM2 service status:
pm2 list
To restart the PM2 service:
pm2 reload sqream-studio
To see the PM2 service logs:
pm2 logs sqream-studio
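For example, to print only the most recent log lines rather than streaming them, you can pass PM2's --lines flag:

$ pm2 logs sqream-studio --lines 100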
Upgrading Studio
To upgrade Studio you need to stop the version that you currently have.
To stop the current version of Studio:
List the process name:

$ pm2 list

The process name is displayed; it is referred to as <process name> below.
Run the following command with the process name:
$ pm2 stop <process name>
If only one process is running, run the following command:
$ pm2 stop all
Change the name of the current sqream-admin folder to the old version.
$ mv sqream-admin sqream-admin-<old_version>
Extract the new Studio version.
$ tar -xf sqream-acceleration-studio-<version>.tar.gz
Rebuild the configuration file. You can use IP address 127.0.0.1 on a single server.
$ npm run setup -- -y --host=<SQreamD IP> --port=3108
The above command creates the sqream-admin-config.json configuration file in the sqream-admin folder.
Copy the sqream-admin-config.json configuration file to /etc/sqream/ to overwrite the old configuration file.
Start PM2.
$ pm2 start all
Installing an NGINX Proxy Over a Secure Connection
Configuring your NGINX server to use strong encryption for client connections secures server requests, preventing outside parties from gaining access to your traffic.
The Installing an NGINX Proxy Over a Secure Connection page describes the following:
Overview
The Node.js platform that SQream uses with our Studio user interface is susceptible to web exposure. This page describes how to implement HTTPS access on your proxy server to establish a secure connection.
TLS (Transport Layer Security), and its predecessor SSL (Secure Sockets Layer), are standard web protocols used for wrapping normal traffic in a protected, encrypted wrapper. This technology prevents the interception of server-client traffic. It also uses a certificate system for helping users verify the identity of sites they visit. The Installing an NGINX Proxy Over a Secure Connection guide describes how to set up a self-signed SSL certificate for use with an NGINX web server on a CentOS 7 server.
Note
A self-signed certificate encrypts communication between your server and any clients. However, because it is not signed by trusted certificate authorities included with web browsers, you cannot use the certificate to automatically validate the identity of your server.
A self-signed certificate may be appropriate if your domain name is not associated with your server, and in cases where your encrypted web interface is not user-facing. If you do have a domain name, using a CA-signed certificate is generally preferable.
For more information on setting up a free trusted certificate, see How To Secure Nginx with Let’s Encrypt on CentOS 7.
Prerequisites
The following prerequisites are required for installing an NGINX proxy over a secure connection:
Super user privileges
A domain name to create a certificate for
Installing NGINX and Adjusting the Firewall
After verifying that you have the above prerequisites, you must verify that the NGINX web server has been installed on your machine.
Though NGINX is not available in the default CentOS repositories, it is available from the EPEL (Extra Packages for Enterprise Linux) repository.
To install NGINX and adjust the firewall:
Enable the EPEL repository to enable server access to the NGINX package:
$ sudo yum install epel-release
Install NGINX:
$ sudo yum install nginx
Start the NGINX service:
$ sudo systemctl start nginx
Verify that the service is running:
$ systemctl status nginx
The following is an example of the correct output:
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-01-06 17:27:50 UTC; 28s ago
. . .
Jan 06 17:27:50 centos-512mb-nyc3-01 systemd[1]: Started The nginx HTTP and reverse proxy server.
Enable NGINX to start when your server boots up:
$ sudo systemctl enable nginx
Verify that access to ports 80 and 443 are not blocked by a firewall.
Do one of the following:
If you are not using a firewall, skip to Creating Your SSL Certificate.
If you have a running firewall, open ports 80 and 443:
$ sudo firewall-cmd --add-service=http
$ sudo firewall-cmd --add-service=https
$ sudo firewall-cmd --runtime-to-permanent
If you have a running iptables firewall, for a basic rule set, add HTTP and HTTPS access:
$ sudo iptables -I INPUT -p tcp -m tcp --dport 80 -j ACCEPT
$ sudo iptables -I INPUT -p tcp -m tcp --dport 443 -j ACCEPT
Note
The iptables commands above are highly dependent on your current rule set.
Verify that you can access the default NGINX page from a web browser.
Creating Your SSL Certificate
After installing NGINX and adjusting your firewall, you must create your SSL certificate.
TLS/SSL combines public certificates with private keys. The SSL key, kept private on your server, is used to encrypt content sent to clients, while the SSL certificate is publicly shared with anyone requesting content. In addition, the SSL certificate can be used to decrypt the content signed by the associated SSL key. Your public certificate is located in the /etc/ssl/certs directory on your server.
This section describes how to create your /etc/ssl/private directory, used for storing your private key file. Because the privacy of this key is essential for security, the permissions must be locked down to prevent unauthorized access:
To create your SSL certificate:
Create the /etc/ssl/private directory and set its permissions to private:

$ sudo mkdir /etc/ssl/private
$ sudo chmod 700 /etc/ssl/private
Create a self-signed key and certificate pair with OpenSSL using the following command:
$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/private/nginx-selfsigned.key -out /etc/ssl/certs/nginx-selfsigned.crt
The following list describes the elements in the command above:
openssl - The basic command line tool used for creating and managing OpenSSL certificates, keys, and other files.
req - A subcommand for X.509 Certificate Signing Request (CSR) management. X.509 is a public key infrastructure standard that SSL and TLS adhere to for key and certificate management.
-x509 - Modifies the previous subcommand by creating a self-signed certificate instead of generating a certificate signing request.
-nodes - Sets OpenSSL to skip the option of securing our certificate with a passphrase, letting NGINX read the file without user intervention when the server is activated. If you don’t use -nodes you must enter your passphrase after every restart.
-days 365 - Sets the certificate’s validation duration to one year.
-newkey rsa:2048 - Simultaneously generates a new certificate and new key. Because the key required to sign the certificate was not created in the previous step, it must be created along with the certificate. The rsa:2048 option generates an RSA key 2048 bits long.
-keyout - Determines the location of the generated private key file.
-out - Determines the location of the certificate.
After you create a self-signed key and certificate pair with OpenSSL, a series of prompts about your server is presented so that the information you provide can be correctly embedded in the certificate.
Provide the information requested by the prompts.
The most important piece of information is the Common Name, which is either the server FQDN or your name. You must enter the domain name associated with your server or your server’s public IP address.
The following is an example of a filled out set of prompts:
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:New York
Locality Name (eg, city) []:New York City
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Bouncy Castles, Inc.
Organizational Unit Name (eg, section) []:Ministry of Water Slides
Common Name (e.g. server FQDN or YOUR name) []:server_IP_address
Email Address []:admin@your_domain.com
Both files you create are stored in their own subdirectories of the /etc/ssl directory.
Although SQream uses OpenSSL, we also recommend creating a strong Diffie-Hellman group, used for negotiating Perfect Forward Secrecy with clients.
Create a strong Diffie-Hellman group:
$ sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048
Creating a Diffie-Hellman group takes a few minutes. The group is stored as the dhparam.pem file in the /etc/ssl/certs directory and can be used in the configuration.
Configuring NGINX to use SSL
After creating your SSL certificate, you must configure NGINX to use SSL.
The default CentOS NGINX configuration is fairly unstructured, with the default HTTP server block located in the main configuration file. NGINX checks for files ending in .conf in the /etc/nginx/conf.d directory for additional configuration.
SQream creates a new file in the /etc/nginx/conf.d directory to configure a server block. This block serves content using the certificate files we generated. In addition, the default server block can be optionally configured to redirect HTTP requests to HTTPS.
Note
The example on this page uses the IP address 127.0.0.1, which you should replace with your machine’s IP address.
To configure NGINX to use SSL:
Create and open a file called ssl.conf in the /etc/nginx/conf.d directory:
$ sudo vi /etc/nginx/conf.d/ssl.conf
In the file you created in Step 1 above, open a server block that does the following:

Listens on port 443, which is the TLS/SSL default port.

Sets the server_name to the server's domain name or IP address you used as the Common Name when generating your certificate.

Uses the ssl_certificate, ssl_certificate_key, and ssl_dhparam directives to set the location of the SSL files you generated, as shown in the /etc/nginx/conf.d/ssl.conf file below:
upstream ui {
    server 127.0.0.1:8080;
}

server {
    listen 443 http2 ssl;
    listen [::]:443 http2 ssl;

    server_name nginx.sq.l;

    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;

    root /usr/share/nginx/html;

    # location / {
    # }

    location / {
        proxy_pass http://ui;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        add_header Front-End-Https on;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_cache off;
        proxy_cache_revalidate off;
        proxy_cache_min_uses 1;
        proxy_cache_valid 200 302 1h;
        proxy_cache_valid 404 3s;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
        proxy_no_cache $cookie_nocache $arg_nocache $arg_comment $http_pragma $http_authorization;
        proxy_redirect default;
        proxy_max_temp_file_size 0;
        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;
        proxy_buffer_size 4k;
        proxy_buffering on;
        proxy_buffers 4 32k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;
        proxy_intercept_errors on;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    error_page 404 /404.html;
    location = /404.html {
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }
}
Open and modify the nginx.conf file located in the /etc/nginx/conf.d directory as follows:
$ sudo vi /etc/nginx/conf.d/nginx.conf
server {
    listen 80;
    listen [::]:80;
    server_name _;
    root /usr/share/nginx/html;

    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;

    error_page 404 /404.html;
    location = /404.html {
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }
}
Redirecting Studio Access from HTTP to HTTPS
After configuring NGINX to use SSL, you must redirect Studio access from HTTP to HTTPS.
According to your current configuration, NGINX responds with encrypted content for requests on port 443, but with unencrypted content for requests on port 80. This means that our site offers encryption, but does not enforce its usage. This may be fine for some use cases, but it is usually better to require encryption. This is especially important when confidential data like passwords may be transferred between the browser and the server.
The default NGINX configuration file allows us to easily add directives to the default port 80 server block by adding files in the /etc/nginx/default.d directory.
To create a redirect from HTTP to HTTPS:
Create a new file called ssl-redirect.conf and open it for editing:
$ sudo vi /etc/nginx/default.d/ssl-redirect.conf
Copy and paste this line:
return 301 https://$host$request_uri:8080/;
Activating Your NGINX Configuration
After redirecting from HTTP to HTTPS, you must restart NGINX to activate your new configuration.
To activate your NGINX configuration:
Verify that your files contain no syntax errors:
$ sudo nginx -t
The following output is generated if your files contain no syntax errors:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Restart NGINX to activate your configuration:
$ sudo systemctl restart nginx
Verifying that NGINX is Running
After activating your NGINX configuration, you must verify that NGINX is running correctly.
To verify that NGINX is running correctly:
Check that the service is up and running:
$ systemctl status nginx
The following is an example of the correct output:
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-01-06 17:27:50 UTC; 28s ago
. . .
Jan 06 17:27:50 centos-512mb-nyc3-01 systemd[1]: Started The nginx HTTP and reverse proxy server.
Run the following command:
$ sudo netstat -nltp |grep nginx
The following is an example of the correct output:
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      15486/nginx: master
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      15486/nginx: master
tcp6       0      0 :::80                   :::*                    LISTEN      15486/nginx: master
tcp6       0      0 :::443                  :::*                    LISTEN      15486/nginx: master
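You can also confirm that plain HTTP requests are redirected to HTTPS. With the redirect rule above in place, an HTTP request should return a 301 response (the curl -I flag fetches only the response headers):

$ curl -I http://127.0.0.1/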
Data Ingestion Sources
The Data Ingestion Sources page provides information about the following:
Ingesting Data Overview
The Ingesting Data Overview page provides basic information useful when ingesting data into SQream from a variety of sources and locations, and describes the following:
Getting Started
SQream supports ingesting data using the following methods:
Executing the INSERT statement using a client driver.

Executing the COPY FROM statement or ingesting data from foreign tables:

Local filesystem and locally mounted network filesystems

Ingesting data using the Amazon S3 object storage service

Ingesting data using an HDFS data storage system
SQream supports loading files from the following formats:
Text - CSV, TSV, and PSV
Parquet
ORC
Avro
JSON
For more information, see the following:
Using the INSERT statement - INSERT

Using client drivers - Client drivers

Using the COPY FROM statement - COPY FROM

Using the Amazon S3 object storage service - Amazon Web Services

Using the HDFS data storage system - HDFS Environment

Loading data from foreign tables - Foreign Tables
Data Loading Considerations
The Data Loading Considerations section describes the following:
Verifying Data and Performance after Loading
Like many RDBMSs, SQream recommends its own set of best practices for table design and query optimization. When using SQream, verify the following:
That your data is structured as you expect (row counts, data types, formatting, content); see the example after this list.
That your query performance is adequate.
That you followed the table design best practices (Optimization and Best Practices).
That you’ve tested and verified that your applications work (such as Tableau).
That your data types have not been over-provisioned.
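For example, a quick sanity check of row counts and content, using the nba table loaded elsewhere in this documentation, might be:

SELECT COUNT(*) FROM nba;
SELECT * FROM nba LIMIT 5;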
File Source Location when Loading
While you are loading data, the COPY FROM command can run on any worker. If you are running multiple nodes, verify that all nodes have the same view of the source. Loading data from a local file that exists on only one node and not on shared storage may cause the load to fail. If required, you can also control which node a statement runs on using the Workload Manager.
For more information, see the following:
Supported Load Methods
You can use the COPY FROM syntax to load CSV files.
Note
The COPY FROM statement cannot be used for loading data from Parquet and ORC files.
You can use foreign tables to load text files, Parquet, and ORC files, and to transform your data before generating a full table, as described in the following table:
| Method/File Type | Text (CSV) | Parquet | ORC | Streaming Data |
|---|---|---|---|---|
| COPY FROM | Supported | Not supported | Not supported | Not supported |
| Foreign tables | Supported | Supported | Supported | Not supported |
| INSERT | Not supported | Not supported | Not supported | Supported (Python, JDBC, Node.JS) |
For more information, see the following:
Unsupported Data Types
SQream does not support certain features that are supported by other databases, such as ARRAY, BLOB, ENUM, and SET. You must convert these data types before loading them. For example, you can store ENUM as TEXT.
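For example, a minimal sketch of storing a source ENUM column as TEXT (the table and column names here are illustrative, not part of the original examples):

CREATE TABLE orders
(
  order_id BIGINT,
  status TEXT -- source column was an ENUM; store its labels as text
);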
Handling Extended Errors
While you can use foreign tables to load CSVs, the COPY FROM statement provides more fine-grained error handling options and extended support for non-standard CSVs with multi-character delimiters, alternate timestamp formats, and more.
For more information, see foreign tables.
Best Practices for CSV
Text files, such as CSV, rarely conform to RFC 4180, so you may need to make the following modifications:

Use OFFSET 2 for files containing header rows.

You can capture failed rows in a log file for later analysis, or skip them. See Unsupported Field Delimiters for information on skipping rejected rows.

You can modify record delimiters (new lines) using the RECORD DELIMITER syntax.

If the date formats deviate from ISO 8601, refer to the Supported Date Formats section for overriding the default parsing.

(Optional) You can quote fields in a CSV using double-quotes (").

Note

You must quote any field containing a new line or another double-quote character.

If a field is quoted, you must double quote any double quote, similar to the string literals quoting rules. For example, to encode What are "birds"?, the field should appear as "What are ""birds""?". For more information, see string literals quoting rules.

Field delimiters do not have to be a displayable ASCII character. For all supported field delimiters, see Supported Field Delimiters.
Best Practices for Parquet
The following list shows the best practices when ingesting data from Parquet files:
You must load Parquet files through Foreign Tables. Note that the number of columns in the destination table must match the number of columns in the source files.
Parquet files support predicate pushdown. When a query is issued over Parquet files, SQream uses row-group metadata to determine which row-groups in a file must be read for a particular query and the row indexes can narrow the search to a particular set of rows.
Supported Types and Behavior Notes
Unlike the ORC format, the column types should match the data types exactly, as shown in the table below:
(Compatibility matrix: SQream DB type → Parquet source type. Each SQream DB column type is supported only by its exactly matching Parquet source type.)
If a Parquet file has an unsupported type, such as enum, uuid, time, json, bson, lists, or maps, but the table does not reference this data (i.e., the data does not appear in the SELECT query), the statement will succeed. If the table does reference such a column, an error is displayed explaining that the type is not supported, but the column may be omitted.
Best Practices for ORC
The following list shows the best practices when ingesting data from ORC files:
You must load ORC files through Foreign Tables. Note that the number of columns in the destination table must match the number of columns in the source files.
ORC files support predicate pushdown. When a query is issued over ORC files, SQream uses ORC metadata to determine which stripes in a file need to be read for a particular query and the row indexes can narrow the search to a particular set of 10,000 rows.
Type Support and Behavior Notes
You must load ORC files through foreign tables. Note that the number of columns in the destination table must match the number of columns in the source files.
For more information, see Foreign Tables.
The types should match to some extent within the same “class”, as shown in the following table:
(Compatibility matrix: SQream DB type → ORC source type. Matching and closely related types within the same class are supported.)
If an ORC file has an unsupported type like binary, list, map, or union, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error is thrown to the user explaining that the type is not supported, but the column may be omitted.
Further Reading and Migration Guides
For more information, see the following:
Ingesting Data from Avro
The Ingesting Data from Avro page describes ingesting data from Avro into SQream and includes the following:
Overview
Avro is a well-known data serialization system that relies on schemas. Due to its flexibility as an efficient data storage method, SQream supports the Avro binary data format as an alternative to JSON. Avro files are represented using the Object Container File format, in which the Avro schema is encoded alongside binary data. Multiple files loaded in the same transaction are serialized using the same schema. If they are not serialized using the same schema, an error message is displayed. SQream uses the .avro extension for ingested Avro files.
Making Avro Files Accessible to Workers
To give workers access to files, every node must have the same view of the storage being used.
The following apply for Avro files to be accessible to workers:
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream servers have access to the HDFS name node with the correct user-id. For more information, see HDFS Environment.
For S3, ensure network access to the S3 endpoint. For more information, see Amazon Web Services.
For more information about restricted worker access, see Workload Manager.
Preparing Your Table
You can build your table structure on both local and foreign tables:
Creating a Table
Before loading data, you must build the CREATE TABLE statement to correspond with the file structure of the inserted table.

The example in this section is based on the source nba.avro table shown below:
| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|
| Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337 |
| Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117 |
| John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University | |
| R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640 |
| Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000 |
| Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000 |
| Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960 |
| Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160 |
| Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360 |
The following example shows the correct file structure used to create the CREATE TABLE statement based on the nba.avro table:
CREATE TABLE ext_nba
(
Name TEXT(40),
Team TEXT(40),
Number BIGINT,
Position TEXT(2),
Age BIGINT,
Height TEXT(4),
Weight BIGINT,
College TEXT(40),
Salary FLOAT
)
WRAPPER avro_fdw
OPTIONS
(
LOCATION = 's3://sqream-demo-data/nba.avro'
);
Tip
An exact match must exist between the SQream and Avro types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.
Note
The nba.avro file is stored on S3 at s3://sqream-demo-data/nba.avro.
Creating a Foreign Table
Before loading data, you must build the CREATE FOREIGN TABLE statement to correspond with the file structure of the inserted table.

The example in this section is based on the source nba.avro table shown below:
| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|
| Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337 |
| Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117 |
| John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University | |
| R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640 |
| Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000 |
| Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000 |
| Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960 |
| Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160 |
| Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360 |
The following example shows the correct file structure used to create the CREATE FOREIGN TABLE statement based on the nba.avro table:
CREATE FOREIGN TABLE ext_nba
(
Name TEXT(40),
Team TEXT(40),
Number BIGINT,
Position TEXT(2),
Age BIGINT,
Height TEXT(4),
Weight BIGINT,
College TEXT(40),
Salary FLOAT
)
WRAPPER avro_fdw
OPTIONS
(
LOCATION = 's3://sqream-demo-data/nba.avro'
);
Tip
An exact match must exist between the SQream and Avro types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.
Note
The nba.avro file is stored on S3 at s3://sqream-demo-data/nba.avro.
Note
The examples in the sections above are identical except for the syntax used to create the tables.
Mapping Between SQream and Avro Data Types
Mapping between SQream and Avro data types depends on the Avro data type:
Primitive Data Types
The following table shows the supported Primitive data types:
(Compatibility matrix: Avro primitive types → SQream type categories (Number, Date/Datetime, String, Boolean).)
Complex Data Types
The following table shows the supported Complex data types:
(Compatibility matrix: Avro complex types → SQream type categories (Number, Date/Datetime, String, Boolean).)
Logical Data Types
The following table shows the supported Logical data types:
(Compatibility matrix: Avro logical types → SQream type categories (Number, Date/Datetime, String, Boolean).)
Note
Number types include tinyint, smallint, int, bigint, real, float, and numeric. String types include text.
Mapping Objects to Rows
When mapping objects to rows, each Avro object or message must contain one record type object corresponding to a single row in SQream. The record fields are associated by name to their target table columns. Additional unmapped fields are ignored. Note that using the JSONPath option overrides this behavior.
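For example, the following Avro schema sketch illustrates the mapping; it is not taken from the nba.avro file itself, and the field names are assumed to match the ext_nba table above. Each record field, such as Name, is matched by name to the corresponding table column, and any extra field would simply be ignored:

{
  "type": "record",
  "name": "nba_row_sketch",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Team", "type": "string"},
    {"name": "Number", "type": "long"},
    {"name": "Salary", "type": ["null", "float"]}
  ]
}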
Ingesting Data into SQream
This section includes the following:
Syntax
You can ingest data from an Avro file into SQream using the following syntax:

COPY [schema name.]table_name
FROM WRAPPER fdw_name
OPTIONS
(
  [ copy_from_option [, ...] ]
)
;

For Avro files, the foreign data wrapper (fdw_name) is avro_fdw.

Example

The following is an example of loading data from an Avro file into SQream:

COPY t
FROM WRAPPER avro_fdw
OPTIONS
(
  LOCATION = 's3://sqream-demo-data/nba.avro'
);
For more examples, see Additional Examples.
Parameters
The following table shows the Avro parameter:
| Parameter | Description |
|---|---|
| schema_name | The schema name for the table. Defaults to public. |
Best Practices
Because external tables do not automatically verify the file integrity or structure, SQream recommends manually verifying your table output when ingesting Avro files into SQream. This lets you determine if your table output is identical to your originally inserted table.
The following is an example of the output based on the nba.avro table:
t=> SELECT * FROM ext_nba LIMIT 10;
Name | Team | Number | Position | Age | Height | Weight | College | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
Note
If your table output has errors, verify that the structure of the Avro files correctly corresponds to the external table structure that you created.
Additional Examples
This section includes the following additional examples of loading data into SQream:
Omitting Unsupported Column Types
When loading data, you can omit columns using the NULL as argument. You can use this argument to omit unsupported columns from queries that access external tables. By omitting them, these columns are not called and a "type mismatch" error is avoided.

In the example below, the Position column is not supported due to its type.
CREATE TABLE nba AS
SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary FROM ext_nba;
Modifying Data Before Loading
One of the main reasons for staging data using the EXTERNAL TABLE argument is to examine and modify table contents before loading it into SQream.

For example, we can replace pounds with kilograms using the CREATE TABLE AS statement.

In the example below, the Position column is set to the default NULL.
CREATE TABLE nba AS
SELECT name, team, number, NULL as Position, age, height, (weight / 2.205) as weight, college, salary
FROM ext_nba
ORDER BY weight;
Loading a Table from a Directory of Avro Files on HDFS
The following is an example of loading a table from a directory of Avro files on HDFS:
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER avro_fdw
OPTIONS
(
LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.avro'
);
CREATE TABLE users AS SELECT * FROM ext_users;
For more configuration option examples, navigate to the CREATE FOREIGN TABLE page and see the Parameters table.
Loading a Table from a Directory of Avro Files on S3
The following is an example of loading a table from a directory of Avro files on S3:
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER avro_fdw
OPTIONS
( LOCATION = 's3://pp-secret-bucket/users/*.avro',
AWS_ID = 'our_aws_id',
AWS_SECRET = 'our_aws_secret'
);
CREATE TABLE users AS SELECT * FROM ext_users;
Ingesting Data from a CSV File
This guide covers ingesting data from CSV files into SQream DB using the COPY FROM method.
Prepare CSVs
Prepare the source CSVs, with the following requirements:
Files should be a valid CSV. By default, SQream DB’s CSV parser can handle RFC 4180 standard CSVs , but can also be modified to support non-standard CSVs (with multi-character delimiters, unquoted fields, etc).
Files are UTF-8 or ASCII encoded
Field delimiter is an ASCII character or characters
Record delimiter, also known as a new line separator, is a Unix-style newline (\n), DOS-style newline (\r\n), or Mac-style newline (\r).

Fields are optionally enclosed by double-quotes, and must be quoted if they contain one of the following characters:
The record delimiter or field delimiter
A double quote character
A newline
If a field is quoted, any double quote that appears must be double-quoted (similar to the string literals quoting rules). For example, to encode What are "birds"?, the field should appear as "What are ""birds""?".

Other modes of escaping are not supported (e.g. 1,"What are \"birds\"?" is not a valid way of escaping CSV values).

NULL values can be marked in two ways in the CSV:

An explicit null marker. For example, col1,\N,col3

An empty field delimited by the field delimiter. For example, col1,,col3
Note
If a text field is quoted but contains no content ("") it is considered an empty text field. It is not considered NULL.
Place CSVs where SQream DB workers can access
During data load, the COPY FROM command can run on any worker (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used, meaning every SQream DB worker should have access to the files.
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our HDFS Environment guide for more information.
For S3, ensure network access to the S3 endpoint. See our Amazon Web Services guide for more information.
Figure out the table structure
Prior to loading data, you will need to write out the table structure, so that it matches the file structure.
For example, to import the data from nba.csv, we will first look at the file:
| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|
| Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337 |
| Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117 |
| John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University | |
| R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640 |
| Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000 |
| Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000 |
| Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960 |
| Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160 |
| Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360 |
The file format in this case is CSV, and it is stored as an S3 object.
The first row of the file is a header containing column names.
The record delimiter was a DOS newline (\r\n).

The file is stored on S3, at s3://sqream-demo-data/nba.csv.
We will make note of the file structure to create a matching CREATE TABLE statement.
CREATE TABLE nba
(
Name text(40),
Team text(40),
Number tinyint,
Position text(2),
Age tinyint,
Height text(4),
Weight real,
College text(40),
Salary float
);
Bulk load the data with COPY FROM
The CSV is a standard CSV, but with two differences from SQream DB defaults:
The record delimiter is not a Unix newline (\n), but a Windows newline (\r\n).

The first row of the file is a header containing column names, which we'll want to skip.
COPY nba
FROM 's3://sqream-demo-data/nba.csv'
WITH RECORD DELIMITER '\r\n'
OFFSET 2;
Repeat steps 3 and 4 for every CSV file you want to import.
Loading different types of CSV files
COPY FROM contains several configuration options. See more in the COPY FROM elements section.
Loading a standard CSV file from a local filesystem
COPY table_name FROM '/home/rhendricks/file.csv';
Loading a PSV (pipe separated value) file
COPY table_name FROM '/home/rhendricks/file.psv' WITH DELIMITER '|';
Loading a TSV (tab separated value) file
COPY table_name FROM '/home/rhendricks/file.tsv' WITH DELIMITER '\t';
Loading a text file with non-printable delimiter
In the file below, the separator is DC1, which is represented by ASCII 17 decimal or 021 octal.
COPY table_name FROM 'file.txt' WITH DELIMITER E'\021';
Loading a text file with multi-character delimiters
In the file below, the separator is '| (a single quote followed by a pipe). The quote character in the delimiter is escaped by doubling it:
COPY table_name FROM 'file.txt' WITH DELIMITER '''|';
Loading files with a header row
Use OFFSET to skip rows.
Note
When loading multiple files (e.g. with wildcards), this setting affects each file separately.
COPY table_name FROM 'filename.psv' WITH DELIMITER '|' OFFSET 2;
Loading files formatted for Windows (\r\n)
COPY table_name FROM 'filename.psv' WITH DELIMITER '|' RECORD DELIMITER '\r\n';
Loading a file from a public S3 bucket
Note
The bucket must be publicly accessible and listing its objects must be permitted.
COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
Loading files from an authenticated S3 bucket
COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n' AWS_ID '12345678' AWS_SECRET 'super_secretive_secret';
Loading files from an HDFS storage
COPY nba FROM 'hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
Saving rejected rows to a file
See Unsupported Field Delimiters for more information about the error handling capabilities of COPY FROM.
COPY table_name FROM WRAPPER csv_fdw OPTIONS (location = '/tmp/file.psv'
,delimiter = '|'
,continue_on_error = True
,error_log = '/temp/load_error.log' -- Save error log
,rejected_data = '/temp/load_rejected.log' -- Only save rejected rows
);
Stopping the load if a certain amount of rows were rejected
COPY table_name FROM 'filename.csv' WITH delimiter '|'
ERROR_LOG '/temp/load_err.log' -- Save error log
OFFSET 2 -- skip header row
LIMIT 100 -- Only load 100 rows
STOP AFTER 5 ERRORS; -- Stop the load if 5 errors reached
Load CSV files from a set of directories
Use glob patterns (wildcards) to load multiple files to one table.
COPY table_name from '/path/to/files/2019_08_*/*.csv';
Rearrange destination columns
When the source of the files does not match the table structure, tell the COPY command what the order of columns should be:
COPY table_name (fifth, first, third) FROM '/path/to/files/*.csv';
Note
Any column not specified will revert to its default value, or to NULL if the column is nullable.
Loading non-standard dates
If files contain dates not formatted as ISO8601, tell COPY how to parse the column. After parsing, the date will appear as ISO8601 inside SQream DB.
In this example, date_col1 and date_col2 in the table are non-standard. date_col3 is mentioned explicitly, but can be left out. Any column that is not specified is assumed to be ISO8601.
COPY table_name FROM '/path/to/files/*.csv' WITH PARSERS 'date_col1=YMD,date_col2=MDY,date_col3=default';
Tip
The full list of supported date formats can be found under the Supported date formats section of the COPY FROM reference.
Ingesting Data from a Parquet File
This guide covers ingesting data from Parquet files into SQream using FOREIGN TABLE, and describes the following:
Overview
SQream supports ingesting data from Parquet files. However, because Parquet is an open-source column-oriented storage format, you may want to retain your data in external Parquet files instead of ingesting it into SQream; SQream also supports executing queries directly on external Parquet files.
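For example, once a foreign table is defined over a Parquet file (like the ext_nba table created later on this page), it can be queried in place without ingesting the data. A minimal sketch:

SELECT AVG(Salary) FROM ext_nba;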
Preparing Your Parquet Files
Prepare your source Parquet files according to the requirements described in the following table:
SQream Type → Parquet Source ↓ | BOOL | TINYINT | SMALLINT | INT | BIGINT | REAL | DOUBLE | TEXT [1] | DATE | DATETIME
---|---|---|---|---|---|---|---|---|---|---
BOOLEAN | Supported | | | | | | | | |
INT16 | | | Supported | | | | | | |
INT32 | | | | Supported | | | | | |
INT64 | | | | | Supported | | | | |
FLOAT | | | | | | Supported | | | |
DOUBLE | | | | | | | Supported | | |
BYTE_ARRAY [2] | | | | | | | | Supported | |
INT96 [3] | | | | | | | | | | Supported [4]
Your statements will succeed even if your Parquet file contains an unsupported type, such as enum, uuid, time, json, bson, lists, or maps, as long as the data is not referenced in the table (it does not appear in the SELECT query). If a column containing an unsupported type is referenced, an error message is displayed explaining that the type is not supported and that the column may be omitted. For solutions to this error message, see the Managing Unsupported Column Types example in the Examples section.
Footnotes
[1] Text values include TEXT
[2] With UTF8 annotation
[3] With TIMESTAMP_NANOS or TIMESTAMP_MILLIS annotation
[4] Any microseconds will be rounded down to milliseconds.
Making Parquet Files Accessible to Workers
To give workers access to files, every node must have the same view of the storage being used.
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream servers can access the HDFS name node with the correct user-id. For more information, see the HDFS Environment guide.
For S3, ensure network access to the S3 endpoint. For more information, see the Amazon Web Services guide.
Creating a Table
Before loading data, you must write a CREATE TABLE statement that corresponds to the structure of the source file.
The example in this section is based on the source nba.parquet table shown below:
Name | Team | Number | Position | Age | Height | Weight | College | Salary
---|---|---|---|---|---|---|---|---
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
The following example shows the CREATE FOREIGN TABLE statement matching the file structure of the nba.parquet table:
CREATE FOREIGN TABLE ext_nba
(
Name TEXT(40),
Team TEXT(40),
Number BIGINT,
Position TEXT(2),
Age BIGINT,
Height TEXT(4),
Weight BIGINT,
College TEXT(40),
Salary FLOAT
)
WRAPPER parquet_fdw
OPTIONS
(
LOCATION = 's3://sqream-demo-data/nba.parquet'
);
Tip
An exact match must exist between the SQream and Parquet types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.
Note
The nba.parquet file is stored on S3 at s3://sqream-demo-data/nba.parquet
.
Ingesting Data into SQream
This section describes the following:
Syntax
You can use the CREATE TABLE AS statement to load the data into SQream, as shown below:
CREATE TABLE nba AS
SELECT * FROM ext_nba;
Examples
This section describes the following examples:
Omitting Unsupported Column Types
When loading data, you can omit columns by selecting NULL as <column name> in place of the column itself. You can use this technique to omit unsupported columns from queries that access foreign tables; because the omitted columns are never read, they do not generate a "type mismatch" error.
In the example below, the Position column is not supported due to its type.
CREATE TABLE nba AS
SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary FROM ext_nba;
Modifying Data Before Loading
One of the main reasons for staging data in a foreign table is to examine and modify the table contents before loading them into SQream.
For example, we can replace pounds with kilograms using the CREATE TABLE AS statement.
In the example below, the Position column is set to the default NULL.
CREATE TABLE nba AS
SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
FROM ext_nba
ORDER BY weight;
Loading a Table from a Directory of Parquet Files on HDFS
The following is an example of loading a table from a directory of Parquet files on HDFS:
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
(
LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.parquet'
);
CREATE TABLE users AS SELECT * FROM ext_users;
Loading a Table from a Directory of Parquet Files on S3
The following is an example of loading a table from a directory of Parquet files on S3:
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER parquet_fdw
OPTIONS
( LOCATION = 's3://pp-secret-bucket/users/*.parquet',
AWS_ID = 'our_aws_id',
AWS_SECRET = 'our_aws_secret'
);
CREATE TABLE users AS SELECT * FROM ext_users;
For more configuration option examples, navigate to the CREATE FOREIGN TABLE page and see the Parameters table.
Best Practices
Because foreign tables do not automatically verify file integrity or structure, SQream recommends manually verifying your table output when ingesting Parquet files into SQream. This lets you confirm that your table output matches your source data.
The following is an example of the output based on the nba.parquet table:
t=> SELECT * FROM ext_nba LIMIT 10;
Name | Team | Number | Position | Age | Height | Weight | College | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
Note
If your table output has errors, verify that the structure of the Parquet files correctly corresponds to the external table structure that you created.
Ingesting Data from an ORC File
This guide covers ingesting data from ORC files into SQream DB using FOREIGN TABLE.
Prepare the files
Prepare the source ORC files, with the following requirements:
SQream DB Type → ORC Source ↓ | BOOL | TINYINT | SMALLINT | INT | BIGINT | REAL | DOUBLE | TEXT [1] | DATE | DATETIME
---|---|---|---|---|---|---|---|---|---|---
boolean | Supported | Supported [2] | Supported [2] | Supported [2] | Supported [2] | | | | |
tinyint | ○ [3] | Supported | Supported | Supported | Supported | | | | |
smallint | ○ [3] | ○ [4] | Supported | Supported | Supported | | | | |
int | ○ [3] | ○ [4] | ○ [4] | Supported | Supported | | | | |
bigint | ○ [3] | ○ [4] | ○ [4] | ○ [4] | Supported | | | | |
float | | | | | | Supported | Supported | | |
double | | | | | | Supported | Supported | | |
string | | | | | | | | Supported | |
date | | | | | | | | | Supported | Supported
timestamp | | | | | | | | | | Supported
○ - supported if the data meets the condition in the corresponding footnote.
If an ORC file has an unsupported type like binary, list, map, or union, but the data is not referenced in the table (it does not appear in the SELECT query), the statement will succeed. If the column is referenced, an error will be thrown explaining that the type is not supported, but that the column may be omitted. This can be worked around - see more information in the examples below.
Footnotes
[1] Text values include TEXT
[2] Boolean values are cast to 0, 1
[3] Will succeed if all values are 0, 1
[4] Will succeed if all values fit the destination type
Place ORC files where SQream DB workers can access them
Any worker may try to access the files (unless explicitly specified with the Workload Manager). It is important that every node has the same view of the storage being used, meaning every SQream DB worker should have access to the files.
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream DB servers can access the HDFS name node with the correct user-id. See our HDFS Environment guide for more information.
For S3, ensure network access to the S3 endpoint. See our Amazon Web Services guide for more information.
Figure out the table structure
Prior to loading data, you will need to write out the table structure, so that it matches the file structure.
For example, to import the data from nba.orc, we will first look at the source table:
Name | Team | Number | Position | Age | Height | Weight | College | Salary
---|---|---|---|---|---|---|---|---
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
The file is stored on S3, at s3://sqream-demo-data/nba.orc.
We will make note of the file structure to create a matching CREATE FOREIGN TABLE statement.
CREATE FOREIGN TABLE ext_nba
(
Name TEXT(40),
Team TEXT(40),
Number BIGINT,
Position TEXT(2),
Age BIGINT,
Height TEXT(4),
Weight BIGINT,
College TEXT(40),
Salary FLOAT
)
WRAPPER orc_fdw
OPTIONS
(
LOCATION = 's3://sqream-demo-data/nba.orc'
);
Tip
Types in SQream DB must match ORC types according to the table above.
If the column type isn’t supported, a possible workaround is to set it to any arbitrary type and then exclude it from subsequent queries.
Verify table contents
External tables do not verify file integrity or structure, so verify that the table definition matches up and contains the correct data.
t=> SELECT * FROM ext_nba LIMIT 10;
Name | Team | Number | Position | Age | Height | Weight | College | Salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
If any errors show up at this stage, verify the structure of the ORC files and match them to the external table structure you created.
Copying data into SQream DB
To load the data into SQream DB, use the CREATE TABLE AS statement:
CREATE TABLE nba AS
SELECT * FROM ext_nba;
Working around unsupported column types
Suppose you only want to load some of the columns - for example, if one of the columns isn't supported.
By omitting unsupported columns from queries that access the foreign table, they are never read, and will not cause a "type mismatch" error.
For this example, assume that the Position column isn't supported because of its type.
CREATE TABLE nba AS
SELECT Name, Team, Number, NULL as Position, Age, Height, Weight, College, Salary FROM ext_nba;
-- We omitted the unsupported column `Position` from this query, and replaced it with a default NULL value, to maintain the same table structure.
Modifying data during the copy process
One of the main reasons for staging data with EXTERNAL TABLE
is to examine the contents and modify them before loading them.
Assume we are unhappy with weight being in pounds, because we want to use kilograms instead. We can apply the transformation as part of the CREATE TABLE AS statement.
Similar to the previous example, we will also set the Position column to a default NULL.
CREATE TABLE nba AS
SELECT name, team, number, NULL as position, age, height, (weight / 2.205) as weight, college, salary
FROM ext_nba
ORDER BY weight;
Further ORC loading examples
CREATE FOREIGN TABLE contains several configuration options. See more in the CREATE FOREIGN TABLE parameters section.
Loading a table from a directory of ORC files on HDFS
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
(
LOCATION = 'hdfs://hadoop-nn.piedpiper.com/rhendricks/users/*.ORC'
);
CREATE TABLE users AS SELECT * FROM ext_users;
Loading a table from a bucket of files on S3
CREATE FOREIGN TABLE ext_users
(id INT NOT NULL, name TEXT(30) NOT NULL, email TEXT(50) NOT NULL)
WRAPPER orc_fdw
OPTIONS
( LOCATION = 's3://pp-secret-bucket/users/*.ORC',
AWS_ID = 'our_aws_id',
AWS_SECRET = 'our_aws_secret'
)
;
CREATE TABLE users AS SELECT * FROM ext_users;
Ingesting Data from JSON
Overview
JSON (JavaScript Object Notation) is used both as a file format and as a serialization method. The JSON file format is flexible and is commonly used for dynamic, nested, and semi-structured data representations.
The SQream DB JSON parser supports the RFC 8259 data interchange format and supports both JSON objects and JSON object arrays.
Only the JSON Lines data format is supported by SQream.
Making JSON Files Accessible to Workers
To give workers access to files, every node in your system must have access to the storage being used.
The following are required for JSON files to be accessible to workers:
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream servers have access to the HDFS NameNode with the correct user-id. For more information, see HDFS Environment.
For S3, ensure network access to the S3 endpoint. For more information, see Amazon Web Services.
For more information about configuring worker access, see Workload Manager.
Mapping between JSON and SQream
A JSON field consists of a key name and a value.
Key names, which are case sensitive, are mapped to SQream columns. Key names which do not have corresponding SQream table columns are treated as errors by default, unless the IGNORE_EXTRA_FIELDS parameter is set to true, in which case these key names are ignored during the mapping process.
SQream table columns which do not have corresponding JSON fields are automatically set to null as a value.
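For example, the following sketch loads a hypothetical people.json whose objects carry an extra nickname key with no matching column; with the parameter enabled (assuming the lowercase option spelling used by the other examples on this page), the extra key is ignored instead of raising an error:

COPY people
FROM WRAPPER json_fdw
OPTIONS
(
  location = 'people.json',
  ignore_extra_fields = true
);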
Values may be one of the following reserved words (lower-case): false, true, or null, or any of the following data types:
JSON Data Type | Representation in SQream
---|---
Number |
String |
JSON Literal |
JSON Array |
JSON Object |
Character Escaping
The ASCII 10 character (LF) marks the end of a JSON object. Use \\n to escape the \n sequence when you do not intend it to mark a new line.
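For instance, in this hypothetical object the doubled backslash keeps the note on one logical line instead of ending the object early:

{ "note":"first line\\nsecond line" }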
Ingesting JSON Data into SQream
Syntax
To access JSON files, use json_fdw with a COPY FROM, COPY TO, or CREATE FOREIGN TABLE statement.
The Foreign Data Wrapper (FDW) syntax is:
json_fdw [OPTIONS(option=value[,...])]
Parameters
The following parameters are supported by json_fdw:
Parameter | Description
---|---
format | Default format is
 | Default value is
 | Supported values are
location | A path on the local filesystem, on S3, or an HDFS URI. The local path must be an absolute path that SQream DB can access.
limit | When specified, tells SQream DB to stop ingesting after the specified number of rows. Unlimited if unset.
offset | The row number from which to start ingesting.
 | If when using the
continue_on_error | Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with error_count.
error_count | Specifies the maximum number of faulty records that will be ignored. This setting must be used in conjunction with continue_on_error.
max_file_size | Sets the maximum file size (bytes).
 | Permitted values are
aws_id, aws_secret | Specifies the authentication details for secured S3 buckets.
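The following sketch combines several of these options on the table and file used in the examples below; the option names continue_on_error and error_count are assumptions based on the descriptions above and on the naming used by SQream's CSV foreign data wrapper earlier in this guide:

COPY t
FROM WRAPPER json_fdw
OPTIONS
(
  location = 'somefile.json',
  continue_on_error = true,
  error_count = 5
);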
Automatic Schema Inference
You may let SQream DB automatically infer the schema of a foreign table when using json_fdw.
For more information, see the Automatic Foreign Table DDL Resolution page.
Automatic Schema Inference example:
CREATE FOREIGN TABLE t
WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
Examples
JSON objects (one object per line):
{ "name":"Avery Bradley", "age":25, "position":"PG" }
{ "name":"Jae Crowder", "age":25, "position":"PG" }
{ "name":"John Holland", "age":27, "position":"SG" }
JSON object array:
[
{ "name":"Avery Bradley", "age":25, "position":"PG" },
{ "name":"Jae Crowder", "age":25, "position":"SF" },
{ "name":"John Holland", "age":27, "position":"SG" }
]
Using the COPY FROM statement:
COPY t
FROM WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
Note that JSON files generated using the COPY TO statement will store objects, not object arrays.
COPY t
TO WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
When using the CREATE FOREIGN TABLE statement, make sure that the table schema corresponds with the JSON file structure.
CREATE FOREIGN TABLE t
(
id int not null
)
WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
The following is an example of loading data from a JSON file into SQream:
COPY t
FROM WRAPPER json_fdw
OPTIONS
(
LOCATION = 'somefile.json'
);
Tip
An exact match must exist between the SQream and JSON types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.
For information about database tools and interfaces that SQream supports, see Third Party Tools.
Connecting to SQream
SQream supports the most common database tools and interfaces, giving you direct access through a variety of drivers, connectors, and visualization tools and utilities. The tools described on this page have been tested and approved for use with SQream.
Client Platforms
These topics explain how to install and connect a variety of third party tools.
Browse the articles below, in the sidebar, or use the search to find the information you need.
Overview
SQream DB is designed to work with most common database tools and interfaces, allowing you direct access through a variety of drivers, connectors, tools, visualizers, and utilities.
The tools listed have been tested and approved for use with SQream DB. Most 3rd party tools that work through JDBC, ODBC, and Python should work.
If you are looking for a tool that is not listed, SQream and our partners can help. Go to SQream Support or contact your SQream account manager for more information.
Connect to SQream Using Informatica Cloud Services
Overview
The Connecting to SQream Using Informatica Cloud Services page is a quick start guide for connecting to SQream using Informatica Cloud Services.
It describes the following:
Establishing a Connection between SQream and Informatica
The Establishing a Connection between SQream and Informatica page describes how to establish a connection between SQream and the Informatica data integration Cloud.
To establish a connection between SQream and the Informatica data integration Cloud:
Go to the Informatica Cloud homepage.
Do one of the following:
Log in using your credentials.
Log in using your SAML Identity Provider.
From the Services window, select Administrator or click Show all services to show all services.
The SQream dashboard is displayed.
In the menu on the left, click Runtime Environments.
The Runtime Environments panel is displayed.
Click Download Secure Agent.
When the Download the Secure Agent panel is displayed, do the following:
Select a platform (Windows 64 or Linux 64).
Click Copy and save the token on your local hard drive.
The token is used in combination with your user name to authorize the agent to access your account.
Click Download.
The installation begins.
When the Informatica Cloud Secure Agent Setup panel is displayed, click Next.
Provide your User Name and Install Token and click Register.
From the Runtime Environments panel, click New Runtime Environment.
The New Secure Agent Group window is displayed.
On the New Secure Agent Group window, click OK to connect your Runtime Environment with the running agent.
Note
If you do not download Secure Agent, you will not be able to connect your Runtime Environment with the running agent and continue establishing a connection between SQream and the Informatica data integration Cloud.
Establishing a Connection In Your Environment
The Establishing a Connection In Your Environment section describes the following:
Establishing an ODBC DSN Connection In Your Environment
After establishing a connection between SQream and Informatica you can establish an ODBC DSN connection in your environment.
To establish an ODBC connection in your environment:
Click Add.
Click Configure.
Note
Verify that Use Server Picker is selected.
Click Test.
Verify that the connection has tested successfully.
Click Save.
Click Actions > Publish.
Establishing a JDBC Connection In Your Environment
After establishing a connection between SQream and Informatica you can establish a JDBC connection in your environment.
To establish a JDBC connection in your environment:
Create a new DB connection by clicking Connections > New Connection.
The New Connection window is displayed.
In the JDBC_IC Connection Properties section, in the JDBC Connection URL field, establish a JDBC connection by providing the correct connection string.
For connection string examples, see Connection Strings.
Click Test.
Verify that the connection has tested successfully.
Click Save.
Click Actions > Publish.
Supported SQream Driver Versions
SQream supports the following SQream driver versions:
JDBC - Version 4.3.4 and above.
ODBC - Version 4.0.0 and above.
MicroStrategy
Overview
This document is a Quick Start Guide that describes how to install MicroStrategy and connect a data source to the MicroStrategy dashboard for analysis.
The Connecting to SQream Using MicroStrategy page describes the following:
What is MicroStrategy?
MicroStrategy is a Business Intelligence software offering a wide variety of data analytics capabilities. SQream uses the MicroStrategy connector for reading and loading data into SQream.
MicroStrategy provides the following:
Data discovery
Advanced analytics
Data visualization
Embedded BI
Banded reports and statements
For more information about MicroStrategy, see MicroStrategy.
Connecting a Data Source
Activate the MicroStrategy Desktop app. The app displays the Dossiers panel to the right.
Download the most current version of the SQream JDBC driver.
Click Dossiers and New Dossier. The Untitled Dossier panel is displayed.
Click New Data.
From the Data Sources panel, select Databases to access data from tables. The Select Import Options panel is displayed.
Select one of the following:
Build a Query
Type a Query
Select Tables
Click Next.
In the Data Source panel, do the following:
From the Database dropdown menu, select Generic. The Host Name, Port Number, and Database Name fields are removed from the panel.
In the Version dropdown menu, verify that Generic DBMS is selected.
Click Show Connection String.
Select the Edit connection string checkbox.
From the Driver dropdown menu, select a driver for one of the following connectors:
JDBC - The SQream driver is not integrated with MicroStrategy and does not appear in the dropdown menu. However, to proceed, you must select an item, and in the next step you must specify the path to the SQream driver that you installed on your machine.
ODBC - SQreamDB ODBC
In the Connection String text box, type the relevant connection string and path to the JDBC jar file using the following syntax:
jdbc:Sqream://<host and port>/<database name>;user=<username>;password=<password>;[<optional parameters>; ...]
The following example shows the correct syntax for the JDBC connector:
jdbc;MSTR_JDBC_JAR_FOLDER=C:\path\to\jdbc\folder;DRIVER=<driver>;URL={jdbc:Sqream://<host and port>/<database name>;user=<username>;password=<password>;[<optional parameters>; ...];}
The following example shows the correct syntax for the ODBC connector:
odbc:Driver={SqreamODBCDriver};DSN={SQreamDB ODBC};Server=<Host>;Port=<Port>;Database=<database name>;User=<username>;Password=<password>;Cluster=<boolean>;
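For instance, a filled-in ODBC connection string might look like the following (hypothetical host and credentials, reusing example values from elsewhere in this documentation):

odbc:Driver={SqreamODBCDriver};DSN={SQreamDB ODBC};Server=192.168.0.5;Port=3108;Database=master;User=rhendricks;Password=Tr0ub4dor&3;Cluster=true;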
For more information about the available connection parameters and other examples, see Connection Parameters.
In the User and Password fields, fill out your user name and password.
In the Data Source Name field, type SQreamDB.
Click Save. The SQreamDB that you picked in the Data Source panel is displayed.
In the Namespace menu, select a namespace. The tables files are displayed.
Drag and drop the tables into the panel on the right in your required order.
Recommended - Click Prepare Data to customize your data for analysis.
Click Finish.
From the Data Access Mode dialog box, select one of the following:
Connect Live
Import as an In-memory Dataset
Your populated dashboard is displayed and is ready for data discovery and analytics.
Supported SQream Drivers
The following list shows the supported SQream drivers and versions:
JDBC - Version 4.3.3 and higher.
ODBC - Version 4.0.0.
Pentaho Data Integration
Overview
This document is a Quick Start Guide that describes how to install Pentaho, create a transformation, and define your output.
The Connecting to SQream Using Pentaho page describes the following:
Installing Pentaho
To install PDI, see the Pentaho Community Edition (CE) Installation Guide.
The Pentaho Community Edition (CE) Installation Guide describes how to do the following:
Downloading the PDI software.
Installing the JRE (Java Runtime Environment) and JDK (Java Development Kit).
Setting up the JRE and JDK environment variables for PDI.
Installing and Setting Up the JDBC Driver
After installing Pentaho you must install and set up the JDBC driver. This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite.
You can install the driver by copying and pasting the SQream JDBC .jar file into your <directory>/design-tools/data-integration/lib directory.
Creating a Transformation
After installing Pentaho you can create a transformation.
To create a transformation:
Use the CLI to open the PDI client for your operating system (Windows):
$ spoon.bat
Open the spoon.bat file from its folder location.
In the View tab, right-click Transformations and click New.
A new transformation tab is created.
In the Design tab, click Input to show its file contents.
Drag and drop the CSV file input item to the new transformation tab that you created.
Double-click CSV file input. The CSV file input panel is displayed.
In the Step name field, type a name.
To the right of the Filename field, click Browse.
Select the file that you want to read from and click OK.
In the CSV file input window, click Get Fields.
In the Sample data window, enter the number of lines you want to sample and click OK. The default setting is 100.
The tool reads the file and suggests the field name and type.
In the CSV file input window, click Preview.
In the Preview size window, enter the number of rows you want to preview and click OK. The default setting is 1000.
Verify that the preview data is correct and click Close.
Click OK in the CSV file input window.
Defining Your Output
After creating your transformation you must define your output.
To define your output:
In the Design tab, click Output.
The Output folder is opened.
Drag and drop Table output item to the Transformation window.
Double-click Table output to open the Table output dialog box.
From the Table output dialog box, type a Step name and click New to create a new connection. Your steps are the building blocks of a transformation, such as file input or a table output.
The Database Connection window is displayed with the General tab selected by default.
Enter or select the following information in the Database Connection window and click Test.
The following table shows and describes the information that you need to fill out in the Database Connection window:
No. | Element Name | Description
---|---|---
1 | Connection name | Enter a name that uniquely describes your connection, such as sampledata.
2 | Connection type | Select Generic database.
3 | Access | Select Native (JDBC).
4 | Custom connection URL | Insert jdbc:Sqream://<host:port>/<database name>;user=<username>;password=<password>;[<optional parameters>; ...]. The host is a node in your SQream cluster, and <database name> is the name of the database you want to connect to. Verify that you have not used any leading or trailing spaces (see the filled-in example after this table).
5 | Custom driver class name | Insert com.sqream.jdbc.SQDriver. Verify that you have not used any leading or trailing spaces.
6 | Username | Your SQream DB username. If you leave this blank, you will be prompted to provide it when you connect.
7 | Password | Your password. If you leave this blank, you will be prompted to provide it when you connect.
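For reference, a filled-in custom connection URL might look like the following (a sketch with a hypothetical host, reusing credential examples from elsewhere in this documentation):

jdbc:Sqream://192.168.0.5:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true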
Click OK in the window above, in the Database Connection window, and Table Output window.
Importing Data
After defining your output you can begin importing your data.
For more information about backing up users, permissions, or schedules, see Backup and Restore Pentaho Repositories
To import data:
Double-click the Table output connection that you just created.
To the right of the Target schema field, click Browse and select a schema name.
Click OK. The selected schema name is displayed in the Target schema field.
Create a new hop connection between the CSV file input and Table output steps:
On the CSV file input step item, click the new hop connection icon.
Drag an arrow from the CSV file input step item to the Table output step item.
Release the mouse button. The following options are displayed.
Select Main output of step.
Double-click Table output to open the Table output dialog box.
In the Target table field, define a target table name.
Click SQL to open the Simple SQL editor.
In the Simple SQL editor, click Execute.
The system processes and displays the results of the SQL statements.
Close all open dialog boxes.
Click the play button to execute the transformation.
The Run Options dialog box is displayed.
Click Run.
The Execution Results are displayed.
Connect to SQream Using PHP
Overview
PHP is an open source scripting language that executes scripts on servers. The Connect to PHP page explains how to connect to a SQream cluster, and describes the following:
Installing PHP
To install PHP:
Download the JDBC driver installer from the SQream Drivers page.
Create a DSN.
Install the uODBC extension for your PHP installation.
For more information, navigate to PHP Documentation and see the topic menu on the right side of the page.
Configuring PHP
You can configure PHP in one of the following ways:
When compiling, configure PHP to enable uODBC using
./configure --with-pdo-odbc=unixODBC,/usr/local
.Install
php-odbc
andphp-pdo
along with PHP using your distribution package manager. SQream recommends a minimum of version 7.1 for the best results.
Note
PHP's string size limit truncates fetched text, which you can override by doing one of the following:
Increasing the php.ini default setting, such as setting odbc.defaultlrl to 10000.
Setting the size limit in your code before making your connection, using ini_set("odbc.defaultlrl", "10000");.
Setting the size limit in your code before fetching your result, using odbc_longreadlen($result, "10000");.
Operating PHP
After configuring PHP, you can test your connection.
To test your connection:
Create a test connection file using the correct parameters for your SQream installation, as shown below:
<?php
// Construct a DSN connection string
$dsn = "SqreamODBC";
// Create a connection
$conn = odbc_connect($dsn, '', '');
if (!($conn)) {
    echo "Connection to SQream DB via ODBC failed: " . odbc_errormsg($conn);
}
$sql = "SELECT show_version()";
// Execute the query
$rs = odbc_exec($conn, $sql);
while (odbc_fetch_row($rs)) {
    for ($i = 1; $i <= odbc_num_fields($rs); $i++) {
        echo "Result is " . odbc_result($rs, $i);
    }
}
echo "\n";
// Finally, close the connection
odbc_close($conn);
?>
For more information, download the sample PHP example connection file shown above.
The following is an example of a valid DSN line:
$dsn = "odbc:Driver={SqreamODBCDriver};Server=192.168.0.5;Port=5000;Database=master;User=rhendricks;Password=super_secret;Service=sqream";
Run the PHP file either directly with PHP (php test.php) or through a browser.
For more information about supported DSN parameters, see ODBC DSN Parameters.
Power BI Desktop
Power BI Desktop lets you connect to SQream and use underlying data as with other data sources in Power BI Desktop.
SQream integrates with Power BI Desktop to do the following:
Extract and transform your datasets into usable visual models in approximately one minute.
Use DAX functions (Data Analysis Expressions) to analyze your datasets.
Refresh datasets as needed or by using scheduled jobs.
SQream uses Power BI for extracting data sets using the following methods:
Direct query - Direct queries let you connect easily, and refresh Power BI artifacts, such as graphs and reports, in a reasonable amount of time relative to the time taken for the same queries to run through the SQream SQL CLI.
Import - Lets you extract datasets from remote databases.
The Connect to SQream Using Power BI page describes the following:
Prerequisites
To connect to SQream, the following must be installed:
ODBC data source administrator - 32 or 64, depending on your operating system. For Windows users, the ODBC data source administrator is embedded within the operating system.
SQream driver - The SQream application required for interacting with the ODBC according to the configuration specified in the ODBC administrator tool.
Installing Power BI Desktop
To install Power BI Desktop:
Download Power BI Desktop 64x.
Download and configure your ODBC driver.
For information about downloading and configuring your ODBC driver, see ODBC or contact SQream Support.
Navigate to Windows > Documents and create a folder named Power BI Desktop with a subfolder named Custom Connectors.
From the Client Drivers page, download the PowerQuery.mez file.
Save the PowerQuery.mez file in the Custom Connectors folder you created in Step 3.
Open the Power BI application.
Navigate to File > Options and Settings > Option > Security > Data Extensions, and select (Not Recommended) Allow any extension to load without validation or warning.
Restart the Power BI Desktop application.
From the Get Data menu, select SQream.
Click Connect and provide the information shown in the following table:
Element Name | Description
---|---
Server | Provide the network address to your database server. You can use a hostname or an IP address.
Port | Provide the port that the database is responding to at the network address.
Database | Provide the name of your database or the schema on your database server.
User | Provide a SQream DB username.
Password | Provide a password for your user.
Under Data Connectivity mode, select DirectQuery mode.
Click Connect.
Provide your user name and password and click Connect.
Best Practices for Power BI
SQream recommends using Power BI in the following ways for acquiring the best performance metrics:
Creating bar, pie, line, or plot charts when illustrating one or more columns.
Displaying trends and statuses using visual models.
Creating a unified view using PowerQuery to connect different data sources into a single dashboard.
Connect to SQream Using R
You can use R to interact with a SQream DB cluster.
This tutorial is a guide that will show you how to connect R to SQream DB.
JDBC
Get the SQream DB JDBC driver.
In R, install RJDBC
> install.packages("RJDBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)
package 'RJDBC' successfully unpacked and MD5 sums checked
Import the RJDBC library
> library(RJDBC)
Set the classpath and initialize the JDBC driver which was previously installed. For example, on Windows:
> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
Open a connection with a JDBC connection string and run your first statement
> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")
> dbGetQuery(con,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
Close the connection
> dbDisconnect(con)
A full example
> library(RJDBC)
> cp = c("C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> .jinit(classpath=cp)
> drv <- JDBC("com.sqream.jdbc.SQDriver","C:\\Program Files\\SQream Technologies\\JDBC Driver\\2020.1-3.2.0\\sqream-jdbc-3.2.jar")
> con <- dbConnect(drv,"jdbc:Sqream://127.0.0.1:3108/master;user=rhendricks;password=Tr0ub4dor&3;cluster=true")
> dbGetQuery(con,"select top 5 * from t")
xint xtinyint xsmallint xbigint
1 1 82 5067 1
2 2 14 1756 2
3 3 91 22356 3
4 4 84 17232 4
5 5 13 14315 5
> dbDisconnect(con)
ODBC
Install the SQream DB ODBC driver for your operating system, and create a DSN.
In R, install RODBC
> install.packages("RODBC")
Installing package into 'C:/Users/r/...'
(as 'lib' is unspecified)
package 'RODBC' successfully unpacked and MD5 sums checked
Import the RODBC library
> library(RODBC)
Open a connection handle to an existing DSN (my_cool_dsn in this example)
> ch <- odbcConnect("my_cool_dsn",believeNRows=F)
Run your first statement
> sqlQuery(ch,"select top 5 * from t")
  xint xtinyint xsmallint xbigint
1    1       82      5067       1
2    2       14      1756       2
3    3       91     22356       3
4    4       84     17232       4
5    5       13     14315       5
Close the connection
> close(ch)
A full example
> library(RODBC)
> ch <- odbcConnect("my_cool_dsn",believeNRows=F)
> sqlQuery(ch,"select top 5 * from t")
xint xtinyint xsmallint xbigint
1 1 82 5067 1
2 2 14 1756 2
3 3 91 22356 3
4 4 84 17232 4
5 5 13 14315 5
> close(ch)
Connecting to SQream Using SAP BusinessObjects
The Connecting to SQream Using SAP BusinessObjects guide includes the following sections:
Overview
The Connecting to SQream Using SAP BusinessObjects guide describes the best practices for configuring a connection between SQream and the SAP BusinessObjects BI platform. SAP BO's multi-tier architecture includes both client and server components, and this guide describes integrating SQream with SAP BO's client tools using a generic JDBC connector. The instructions in this guide are relevant to both the Universe Design Tool (UDT) and the Information Design Tool (IDT). This document only covers how to establish a connection using the generic out-of-the-box JDBC connectors, and does not cover related BusinessObjects products, such as the Business Objects Data Integrator.
The Define a new connection window below shows the generic JDBC driver, which you can use to establish a new connection to a database.

SAP BO also lets you customize the interface to include a SQream data source.
Establishing a New Connection Using a Generic JDBC Connector
This section shows an example of using a generic JDBC connector to establish a new connection.
To establish a new connection using a generic JDBC connector:
In the fields, provide a user name, password, database URL, and JDBC class.
The following is the correct format for the database URL:
jdbc:Sqream://<ipaddress>:3108/<nameofdatabase>
SQream recommends quickly testing your connection to SQream by selecting the Generic JDBC data source in the Define a new connection window. When you connect using a generic JDBC data source you do not need to modify your configuration files, but you are limited to the out-of-the-box settings defined in the default jdbc.prm file.
Note
Modifying the jdbc.prm file for the generic driver impacts all other databases using the same driver.
For more information, see Connection String Examples.
(Optional) If you are using the generic JDBC driver specific to SQream, modify the jdbc.sbo file to include the SQream JDBC driver location by adding the following lines under the Database section of the file:
<Database Active="Yes" Name="SQream JDBC data source">
  <JDBCDriver>
    <ClassPath>
      <Path>C:\Program Files\SQream Technologies\JDBC Driver\2021.2.0-4.5.3\sqream-jdbc-4.5.3.jar</Path>
    </ClassPath>
    <Parameter Name="JDBC Class">com.sqream.jdbc.SQDriver</Parameter>
  </JDBCDriver>
</Database>
Restart the BusinessObjects server.
When the connection is established, SQream is listed as a driver selection.
SAS Viya
Overview
SAS Viya is a cloud-enabled analytics engine used for producing useful insights. The Connect to SQream Using SAS Viya page describes how to connect to SAS Viya, and describes the following:
Installing SAS Viya
The Installing SAS Viya section describes the following:
Downloading SAS Viya
Integrating with SQream has been tested with SAS Viya v.03.05 and newer.
To download SAS Viya, see SAS Viya.
Installing the JDBC Driver
The SQream JDBC driver is required for establishing a connection between SAS Viya and SQream.
To install the JDBC driver:
Download the JDBC driver.
Unzip the JDBC driver into a location on the SAS Viya server.
SQream recommends creating the directory
/opt/sqream
on the SAS Viya server.
Configuring SAS Viya
After installing the JDBC driver, you must configure the JDBC driver from the SAS Studio so that it can be used with SQream Studio.
To configure the JDBC driver from the SAS Studio:
Sign in to the SAS Studio.
From the New menu, click SAS Program.
Configure the SQream JDBC connector by adding the following rows:
options sastrace='d,d,d,d' sastraceloc=saslog nostsuffix msglevel=i sql_ip_trace=(note,source) DEBUG=DBMS_SELECT;
options validvarname=any;
libname sqlib jdbc driver="com.sqream.jdbc.SQDriver"
  classpath="/opt/sqream/sqream-jdbc-4.0.0.jar"
  URL="jdbc:Sqream://sqream-cluster.piedpiper.com:3108/raviga;cluster=true"
  user="rhendricks" password="Tr0ub4dor3" schema="public"
  PRESERVE_TAB_NAMES=YES PRESERVE_COL_NAMES=YES;
For more information about writing a connection string, see Connect to SQream DB with a JDBC Application and navigate to Connection String.
Operating SAS Viya
The Operating SAS Viya section describes the following:
Using SAS Viya Visual Analytics
This section describes how to use SAS Viya Visual Analytics.
To use SAS Viya Visual Analytics:
Log in to SAS Viya Visual Analytics using your credentials:
Click New Report.
Click Data.
Click Data Sources.
Click the Connect icon.
From the Type menu, select Database.
Provide the required information and select Persist this connection beyond the current session.
Click Advanced and provide the required information.
Add the following additional parameters by clicking Add Parameters:
Name | Value
---|---
class | com.sqream.jdbc.SQDriver
classPath | <path_to_jar_file>
url | jdbc:Sqream://<IP>:<port>/<database>;cluster=true
username | <username>
password | <password>
Click Test Connection.
If the connection is successful, click Save.
If your connection is not successful, see Troubleshooting SAS Viya below.
Troubleshooting SAS Viya
The Best Practices and Troubleshooting section describes the following best practices and troubleshooting procedures when connecting to SQream using SAS Viya:
Inserting Only Required Data
When using SAS Viya, SQream recommends using only data that you need, as described below:
Insert only the data sources you need into SAS Viya, excluding tables that don’t require analysis.
To increase query performance, add filters before analyzing. Every modification you make while analyzing data queries the SQream database, sometimes several times. Adding filters to the datasource before exploring limits the amount of data analyzed and increases query performance.
Creating a Separate Service for SAS Viya
SQream recommends creating a separate service for SAS Viya with the DWLM. This reduces the impact that SAS Viya has on other applications and processes, such as ETL. In addition, this works in conjunction with the load balancer to ensure good performance.
Locating the SQream JDBC Driver
In some cases, SAS Viya cannot locate the SQream JDBC driver, generating the following error message:
java.lang.ClassNotFoundException: com.sqream.jdbc.SQDriver
To locate the SQream JDBC driver:
Verify that you have placed the JDBC driver in a directory that SAS Viya can access.
Verify that the classpath in your SAS program is correct, and that SAS Viya can access the file that it references.
Restart SAS Viya.
For more troubleshooting assistance, see the SQream Support Portal.
Supporting TEXT
In SAS Viya versions lower than 4.0, casting TEXT to CHAR changes the size to 1,024, such as when creating a table including a TEXT column. This is resolved by casting TEXT into CHAR when using the JDBC driver.
Connect to SQream Using SQL Workbench
You can use SQL Workbench to interact with a SQream DB cluster. SQL Workbench/J is a free SQL query tool, and is designed to run on any JRE-enabled environment.
This tutorial is a guide that will show you how to connect SQL Workbench to SQream DB.
Installing SQL Workbench with the SQream Installer
This section applies to Windows only.
SQream DB’s driver installer for Windows can install the Java prerequisites and SQL Workbench for you.
Get the JDBC driver installer available for download from the SQream Drivers page. The Windows installer takes care of the Java prerequisites and subsequent configuration.
Install the driver by following the on-screen instructions in the installer. By default, the installer does not install SQL Workbench, so make sure to select that item during installation.
Note
The installer will install SQL Workbench in C:\Program Files\SQream Technologies\SQLWorkbench by default. You can change this path during the installation.
Once finished, SQL Workbench is installed and contains the necessary configuration for connecting to SQream DB clusters.
Start SQL Workbench from the Windows start menu. Be sure to select SQL Workbench (64) if you’re on 64-bit Windows.
You are now ready to create a profile for your cluster. Continue to Creating a new connection profile.
Installing SQL Workbench Manually
This section applies to Linux and MacOS only.
Install Java Runtime
Both SQL Workbench and the SQream DB JDBC driver require Java 1.8 or newer. You can install either Oracle Java or OpenJDK.
Oracle Java
Download and install Java 8 from Oracle for your platform - https://www.java.com/en/download/manual.jsp
OpenJDK
For Linux and BSD, see https://openjdk.java.net/install/
For Windows, SQream recommends Zulu 8 https://www.azul.com/downloads/zulu-community/?&version=java-8-lts&architecture=x86-64-bit&package=jdk
Get the SQream DB JDBC Driver
SQream DB’s JDBC driver is provided as a zipped JAR file, available for download from the SQream Drivers page.
Download and extract the JAR file from the zip archive.
Install SQL Workbench
Download the latest stable release from https://www.sql-workbench.eu/downloads.html . The Generic package for all systems is recommended.
Extract the downloaded ZIP archive into a directory of your choice.
Start SQL Workbench. If you are using 64-bit Windows, run SQLWorkbench64.exe instead of SQLWorkbench.exe.
Setting up the SQream DB JDBC Driver Profile
Define a connection profile:
Open the drivers management window.
Create the SQream DB driver profile:
Click the Add new driver button (the "New" icon).
Name the driver as you see fit. We recommend calling it SQream DB <version>, where <version> is the version you have installed.
Add the JDBC drivers from the location where you extracted the SQream DB JDBC JAR.
If you used the SQream installer, the file will be in
C:\Program Files\SQream Technologies\JDBC Driver\
Click the magnifying glass button to detect the classname automatically. Other details are purely optional
Click OK to save and return to the new connection screen.
Create a New Connection Profile for Your Cluster
Create new connection by clicking the New icon (top left)
Give your connection a descriptive name
Select the SQream Driver that was created in the previous screen
Type in your connection string. To find out more about your connection string (URL), see the Connection string documentation.
Test the connection details
Click OK to save the connection profile and connect to SQream DB
Suggested Optional Configuration
If you installed SQL Workbench manually, you can set a customization to help SQL Workbench show information correctly in the DB Explorer panel.
Locate your workbench.settings file. On Windows, it is typically C:\Users\<user name>\.sqlworkbench\workbench.settings. On Linux, it is under $HOME/.sqlworkbench.
Add the following line at the end of the file:
workbench.db.sqreamdb.schema.retrieve.change.catalog=true
Save the file and restart SQL Workbench
Connecting to SQream Using Tableau
SQream’s Tableau connector, based on standard JDBC, enables storing and fast querying large volumes of data. This connector is useful for users who want to integrate and analyze data from various sources within the Tableau platform. With the Tableau connector, users can easily connect to databases and cloud applications and perform high-speed queries on large datasets. Additionally, the connector allows for seamless integration with Tableau, enabling users to visualize their data.
SQream supports both Tableau Desktop and Tableau Server on Windows, MacOS, and Linux distributions.
For more information on SQream’s integration with Tableau, see Tableau Connectors.
Prerequisites
It is essential that you have the following installed:
Tableau version 9.2 or newer
Setting Up JDBC
Download the SQream JDBC Connector .jar file.
Place the JDBC .jar file in the Tableau driver directory.
Based on your operating system, you may find the Tableau driver directory in one of the following locations:
Tableau Desktop on MacOS:
~/Library/Tableau/Drivers
Tableau Desktop on Windows:
C:\Program Files\Tableau\Drivers
Tableau on Linux:
/opt/tableau/tableau_driver/jdbc
Installing the Tableau Connector
Download the Tableau Connector
SQreamDB.taco
file.Based on the installation method that you used for installing Tableau, place the Tableau Connector
SQreamDB.taco
file in the Tableau connector directory:
Product / Platform | Path
---|---
Tableau Desktop for Windows |
Tableau Desktop for Mac |
Tableau Prep for Windows |
Tableau Prep for Mac |
Flow web authoring on Tableau Server |
Tableau Prep Conductor on Tableau Server |
Tableau Server |
Restart Tableau Desktop or Tableau server.
Connecting to SQream
Start Tableau Desktop.
In the Connect menu, under the To a Server option, click More.
Additional connection options are displayed.
Select SQream DB by SQream Technologies.
The connection dialog box is displayed.
In the connection dialog box, fill in the following fields:

Field name | Description | Example
---|---|---
Server | Defines the SQreamDB worker machine IP. Avoid using the loopback address (127.0.0.1) or "localhost" as the server address, since it typically refers to the local machine where Tableau is installed and may create issues and limitations. | 192.162.4.182 or sqream.mynetwork.com
Port | Defines the TCP port of the SQream worker. | 3108 when using a load balancer, or 5100 when connecting directly to a worker with SSL
Database | Defines the database to establish a connection with. | master
Cluster | Enables (true) or disables (false) the load balancer. After enabling or disabling the load balancer, verify the connection. |
Username | Specifies the username of a role to use when connecting. | rhendricks
Password | Specifies the password of the selected role. | Tr0ub4dor&3
Require SSL | Sets SSL as a requirement for establishing this connection. |
Click Sign In.
The connection is established, and the data source page is displayed.
Connecting to SQream Using Talend
Overview
This page describes how to use Talend to interact with a SQream cluster. The Talend connector is used for reading data from a SQream cluster and loading data into SQream. In addition, this page provides a viability report on Talend's compatibility with SQream for stakeholders.
The Connecting to SQream Using Talend describes the following:
Creating a New Metadata JDBC DB Connection
To create a new metadata JDBC DB connection:
In the Repository panel, navigate to Metadata and right-click Db connections.
Select Create connection.
In the Name field, type a name.
Note that the name cannot contain spaces.
In the Purpose field, type a purpose and click Next.
Note that you cannot continue to the next step until you define both a Name and a Purpose.
In the DB Type field, select JDBC.
In the JDBC URL field, type the relevant connection string.
For connection string examples, see Connection Strings; a sample string is also shown at the end of this procedure.
In the Drivers field, click the Add button.
The “newLine” entry is added.
On the “newLine” entry, click the ellipsis.
The Module window is displayed.
From the Module window, select Artifact repository (local m2/nexus) and select Install a new module.
Click the ellipsis.
Your hard drive is displayed.
Navigate to a JDBC jar file (such as sqream-jdbc-4.5.3.jar) and click Open.
Click Detect the module install status.
Click OK.
The JDBC that you selected is displayed in the Driver field.
Click Select class name.
Click Test connection.
If a driver class is not found (for example, you didn’t select a JDBC jar file), the following error message is displayed:
After creating a new metadata JDBC DB connection, you can do the following:
Use your new metadata connection.
Drag it to the job screen.
Build Talend components.
For more information on loading data from JSON files to the Talend Open Studio, see How to Load Data from JSON Files in Talend.
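For reference, the JDBC URL used in the procedure above typically follows SQreamDB's standard connection string format. The following minimal example uses illustrative host, port, and credential placeholders:
jdbc:Sqream://192.168.0.100:5000/master;user=rhendricks;password=Tr0ub4dor&3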
Supported SQream Drivers
The following list shows the supported SQream drivers and versions:
JDBC - Version 4.3.3 and higher.
ODBC - Version 4.0.0. This version requires a Bridge to connect. For more information on the required Bridge, see Connecting Talend on Windows to an ODBC Database.
Supported Data Sources
Talend Cloud connectors let you create reusable connections with a wide variety of systems and environments, such as those shown below. This lets you access and read records from a range of diverse data sources.
Connections: Connections are environments or systems for storing datasets, including databases, file systems, distributed systems and platforms. Because these systems are reusable, you only need to establish connectivity with them once.
Datasets: Datasets include database tables, file names, topics (Kafka), queues (JMS) and file paths (HDFS). For more information on the complete list of connectors and datasets that Talend supports, see Introducing Talend Connectors.
Known Issues
As of 6/1/2021, schemas were not displayed for tables with identical names.
If you experience issues using Talend, see the SQream support portal.
Connecting to SQream Using TIBCO Spotfire
Overview
The TIBCO Spotfire software is an analytics solution that enables visualizing and exploring data through dashboards and advanced analytics.
This document is a Quick Start Guide that describes the following:
Establishing a Connection between TIBCO Spotfire and SQream
TIBCO Spotfire supports the following versions:
JDBC driver - Version 4.5.2
ODBC driver - Version 4.1.1
SQream supports TIBCO Spotfire version 7.12.0.
The Establishing a Connection between TIBCO Spotfire and SQream section describes the following:
Creating a JDBC Connection
For TIBCO Spotfire to recognize SQream, you must add the correct JDBC jar file to Spotfire’s loaded binary folder. The following is an example of a path to the Spotfire loaded binaries folder: C:\tibco\tss\7.12.0\tomcat\bin.
For the complete TIBCO Spotfire documentation, see TIBCO Spotfire® JDBC Data Access Connectivity Details.
Creating an ODBC Connection
To create an ODBC connection:
Install and configure ODBC on Windows.
For more information, see Install and Configure ODBC on Windows.
Launch the TIBCO Spotfire application.
From the File menu click Add Data Tables.
The Add Database Tables window is displayed.
Click Add and select Database.
The Open Database window is displayed.
In the Data source type area, select ODBC SQream (Odbc Data Provider) and click Configure.
The Configure Data Source and Connection window is displayed.
Select System or user data source and from the drop-down menu select the DSN of your data source (SQreamDB).
Provide your database username and password and click OK.
In the Open Database window, click OK.
The Specify Tables and Columns window is displayed.
In the Specify Tables and Columns window, select the checkboxes corresponding to the tables and columns that you want to include in your SQL statement.
In the Data source name field, set your data source name and click OK.
Your data source is displayed in the Data tables area.
In the Add Data Tables dialog, click OK to load the data from your ODBC data source into Spotfire.
Note
Verify that you have checked the SQL statement.
Creating the SQream Data Source Template
After creating a connection, you can create your SQream data source template.
To create your SQream data source template:
Log in to the TIBCO Spotfire Server Configuration Tool.
From the Configuration tab, in the Configuration Start menu, click Data Source Templates.
The Data Source Templates list is displayed.
From the Data Source Templates list do one of the following:
Override an existing template:
In the template text field, select an existing template.
Copy and paste your data source template text.
Create a new template:
Click New.
The Add Data Source Template window is displayed.
In the Name field, define your template name.
In the Data Source Template text field, copy and paste your data source template text.
The following is an example of a data source template:
<jdbc-type-settings>
  <type-name>SQream</type-name>
  <driver>com.sqream.jdbc.SQDriver</driver>
  <connection-url-pattern>jdbc:Sqream://<host>:<port>/database;user=sqream;password=sqream;cluster=true</connection-url-pattern>
  <supports-catalogs>true</supports-catalogs>
  <supports-schemas>true</supports-schemas>
  <supports-procedures>false</supports-procedures>
  <table-types>TABLE,EXTERNAL_TABLE</table-types>
  <java-to-sql-type-conversions>
    <type-mapping>
      <from>Bool</from>
      <to>Integer</to>
    </type-mapping>
    <type-mapping>
      <from>VARCHAR(2048)</from>
      <to>String</to>
    </type-mapping>
    <type-mapping>
      <from>INT</from>
      <to>Integer</to>
    </type-mapping>
    <type-mapping>
      <from>BIGINT</from>
      <to>LongInteger</to>
    </type-mapping>
    <type-mapping>
      <from>Real</from>
      <to>Real</to>
    </type-mapping>
    <type-mapping>
      <from>Decimal</from>
      <to>Float</to>
    </type-mapping>
    <type-mapping>
      <from>Numeric</from>
      <to>Float</to>
    </type-mapping>
    <type-mapping>
      <from>Date</from>
      <to>DATE</to>
    </type-mapping>
    <type-mapping>
      <from>DateTime</from>
      <to>DateTime</to>
    </type-mapping>
  </java-to-sql-type-conversions>
  <ping-command></ping-command>
</jdbc-type-settings>
Click Save configuration.
Close and restart your Spotfire server.
Creating a Data Source
After creating the SQream data source template, you can create a data source.
To create a data source:
Launch the TIBCO Spotfire application.
From the Tools menu, select Information Designer.
The Information Designer window is displayed.
From the New menu, click Data Source.
The Data Source tab is displayed.
Provide the following information:
Name - define a unique name.
Type - use the same type template name you used while configuring your template. See Step 3 in Creating the SQream Data Source Template.
Connection URL - use the standard JDBC connection string, <ip>:<port>/database.
No. of connections - define a number between 1 and 100. SQream recommends setting your number of connections to 100.
Username and Password - define your SQream username and password.
Creating an Information Link
After creating a data source, you can create an information link.
To create an information link:
From the Tools menu, select Information Designer.
The Information Designer window is displayed.
From the New menu, click Information Link.
The Information link tab is displayed.
From the Elements tab, select a column type and click Add.
The column type is added to the Elements region as a filter.
Note the following:
You can select procedures from the Elements region.
You can remove an element by selecting an element and clicking Remove.
Tip
If the Elements menu is not displayed, you can display it by clicking the Elements tab. You can simultaneously select multiple elements by pressing Ctrl and making additional selections, and select a range of elements by holding Shift and clicking two elements.
If the elements you select originate from more than one data source table, specify a Join path.
Optional - In the Description region, type the description of the information link.
Optional - To filter your data, expand the Filters section and do the following:
From the Information Link region, select the element you added in Step 3 above.
Click Add.
The Add Column window is displayed.
From the drop-down list, select a column to add a hard filter to and click OK.
The selected column is added to the Filters list.
Repeat steps 2 and 3 to add filters to additional columns.
For each column, from the Filter Type drop-down list, select range or values.
Note
Filtering by range means entering the upper and lower limits of the desired range. Filtering by values means entering the exact values that you want to include in the returned data, separated by semicolon.
In the Values field, type the desired values separated by semicolons, or set the upper and lower limits in the Min Value and Max Value fields. Alternatively, you can type ?param_name in the Values field to use a parameter as the filter for the selected column, where param_name is the name used to identify the parameter.
Note
Because limits are inclusive, setting the lower limit to 1000 includes the value 1000 in the data table.
Note
When setting upper and lower limits on String type columns, A precedes AA, and a lone letter precedes words beginning with that letter. For example, S precedes Smith, indicating that the name Smith will not be present when you select names from D to S. The order of characters is standard ASCII.
For more information on adding filters, see Adding Hard Filters.
Optional - To add runtime filtering prompts, expand the Prompts section and do the following:
Click Add.
The Add Column window is displayed.
From the Select column list, select a column to add a prompt to and click OK.
The selected column is added to the Prompts list.
Repeat Step 1 to add prompts to additional columns.
Do the following for each column:
Make a selection from the Prompt Type drop-down list.
Select or clear Mandatory.
Optional - Set your Max Selections.
For more information on adding prompts, see Adding Prompts.
Optional - Expand the Conditioning section and specify one of the following conditions:
None
Distinct
Pivot
Note that you can edit the Pivot conditioning by selecting Pivot and clicking Edit.
Optional - Expand the Parameters section and define your parameters.
Optional - Expand the Properties section and define your properties.
Optional - Expand the Caching section and enable or disable whether your information link can be cached.
Click Save.
The Save As window is displayed.
In the tree, select where you want to save the information link.
In the Name field, type a name and description for the information link.
Click Save.
The new information link is added to the library and can be accessed by other users.
Tip
You can test the information link directly by clicking Open Data. You can also view and edit the SQL belonging to the information link by clicking SQL.
For more information on the Information Link attributes, see Information Link Tab.
Troubleshooting
The Troubleshooting section describes the following scenarios:
The JDBC Driver does not Support Boolean, Decimal, or Numeric Types
When attempting to load data, the Boolean, Decimal, or Numeric column types are not supported and generate the following error:
Failed to execute query: Unsupported JDBC data type in query result: Bool (HRESULT: 80131500)
The error above is resolved by casting the columns as follows:
Bool columns to INT.
Decimal and Numeric columns to REAL.
For more information, see the following:
Resolving this error - Details on Change Data Types.
Supported data types - Data Types.
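As an illustrative sketch of the casting described above, the following SQL selects a Bool column as INT and a Numeric column as REAL; the table and column names (my_table, is_active, total_price) are hypothetical:
-- Hypothetical columns: is_active (BOOL), total_price (NUMERIC)
SELECT CAST(is_active AS INT) AS is_active,
       CAST(total_price AS REAL) AS total_price
FROM my_table;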
Information Services do not Support Live Queries
TIBCO Spotfire data connectors support live queries, but no APIs currently exist for creating custom data connectors. This is resolved by creating a customized SQream adapter using TIBCO’s Data Virtualization (TDV) or the Spotfire Advanced Services (ADS). These can be used from the built-in TDV connector to enable live queries.
This resolution applies to JDBC and ODBC drivers.

Client Drivers
The guides on this page describe how to use the SQream DB client drivers and client applications with SQream.
Client Driver Downloads
All Operating Systems
The following are applicable to all operating systems:
JDBC - recommended installation via mvn:
JDBC .jar file - sqream-jdbc-4.5.9 (.jar)
Python - recommended installation via pip:
Python .tar file - pysqream v3.2.5 (.tar.gz)
Node.JS - recommended installation via npm:
Node.JS - sqream-v4.2.4 (.tar.gz)
Tableau:
Tableau connector - SQream (.taco)
Power BI:
Power BI PowerQuery Connector - SQream (.mez)
Windows
The following are applicable to Windows:
For the ODBC installer, please contact your SQream representative.
For more information on installing and configuring ODBC on Windows, see Install and configure ODBC on Windows.
Net driver - SQream .Net driver v3.0.2
Connecting to SQream Using Trino
If you are using Trino for distributed SQL query processing and wish to use it to connect to a SQream cluster, follow these instructions.
Prerequisites
To use Trino with SQream, you must have the following installed:
SQream version 4.1 or later
Trino version 403 or later
Trino Connector xxxx
JDBC version 4.5.6 or later
Installation
JDBC
If JDBC is not yet configured, follow the JDBC Client Drivers page for registration and configuration guidance.
Trino Connector
The Trino Connector must be installed on each cluster node dedicated to Trino.
Create a dedicated directory for the Trino Connector.
Download the Trino Connector<…> and extract the content of the ZIP file to the dedicated directory, as shown in the example:
trino-server/
└── plugin
└── sqream
├── sqream-jdbc.jar
├── trino-sqream-services.jar
├── trino-sqream-SNAPSHOT.jar
└── all dependencies
Connecting to SQream
Trino uses catalogs for referencing stored objects such as tables, databases, and functions. Each Trino catalog may be configured with access to a single SQream database. If you wish Trino to have access to more than one SQream database or server, you must create additional catalogs.
Catalogs may be created using properties files. Start by creating a sqream.properties file and placing it under trino-server/etc/catalog.
The following is an example of a properties file:
connector.name=<name>
connection-url=jdbc:Sqream://<host and port>/<database name>;[<optional parameters>; ...]
connection-user=<user>
connection-password=<password>
Syntax examples
The following is an example of the SHOW SCHEMAS FROM statement:
SHOW SCHEMAS FROM sqream;
The following is an example of the SHOW TABLES FROM statement:
SHOW TABLES FROM sqream.public;
The following is an example of the DESCRIBE sqream.public.t statement:
DESCRIBE sqream.public.t;
Supported Data Types and Mapping
Use the appropriate Trino data type for executing queries. Upon execution, incompatible data types will be converted by Trino to SQream data types.
Trino type | SQream type
---|---
Note
VARCHAR is soon to be deprecated and may not be used in SQream DB.
Limitations
The Trino Connector does not support the following SQL statements:
GRANT
REVOKE
SHOW GRANTS
SHOW ROLES
SHOW ROLE GRANTS
JDBC
The SQream JDBC driver lets you connect to SQream using many Java applications and tools. This page describes how to write a Java application using the JDBC interface. The JDBC driver requires Java 1.8 or newer.
Installing the JDBC Driver
The Installing the JDBC Driver section describes the following:
Prerequisites
The SQream JDBC driver requires Java 1.8 or newer, and SQream recommends using Oracle Java or OpenJDK.
Getting the JAR file
The SQream JDBC driver is available for download from the client drivers download page. This JAR file can be integrated into your Java-based applications or projects.
Setting Up the Class Path
To use the driver, you must include the JAR named sqream-jdbc-<version>.jar in the class path, either by inserting it in the CLASSPATH environment variable, or by using flags on the relevant Java command line.
For example, if the JDBC driver has been unzipped to /home/sqream/sqream-jdbc-4.3.0.jar, the following commands are used to run your application:
$ export CLASSPATH=/home/sqream/sqream-jdbc-4.3.0.jar:$CLASSPATH
$ java my_java_app
Alternatively, you can pass -classpath to the Java executable file:
$ java -classpath .:/home/sqream/sqream-jdbc-4.3.0.jar my_java_app
Connecting to SQream Using a JDBC Application
You can connect to SQream using one of the following JDBC applications:
Driver Class
Use com.sqream.jdbc.SQDriver as the driver class in the JDBC application.
Connection String
JDBC drivers rely on a connection string.
The following is the syntax for SQream:
jdbc:Sqream://<host and port>/<database name>;user=<username>;password=<password>;[<optional parameters>; ...]
Connection Parameters
The following table shows the connection string parameters:
Item |
State |
Default |
Description |
---|---|---|---|
 | Mandatory | None | Hostname and port of the SQream DB worker. For example,
 | Mandatory | None | Database name to connect to. For example,
 | Optional | None | Username of a role to use for connection. For example,
 | Optional | None | Specifies the password of the selected role. For example,
 | Optional | | Specifies service queue to use. For example,
 | Optional | | Specifies SSL for this connection. For example,
 | Optional | | Connect via load balancer (use only if exists, and check port).
 | Optional | | Enables on-demand loading, and defines double buffer size for the result. The
 | Optional | | Defines the bytes size for inserting a buffer before flushing data to the server. Clients running a parameterized insert (network insert) can define the amount of data to collect before flushing the buffer.
 | Optional | | Defines the logger level as either
 | Optional | | Enables the file appender and defines the file name. The file name can be set as either the file name or the file path.
 | Optional | 0 | Sets the duration, in seconds, for which a database connection can remain idle before it is terminated. If the parameter is set to its default value, idle connections will not be terminated. The idle connection timer begins counting after the completion of query execution.
Connection String Examples
The following is an example of a SQream cluster with a load balancer and no service queues (with SSL):
jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true
The following is a minimal example of a local standalone SQream database:
jdbc:Sqream://127.0.0.1:5000/master;user=rhendricks;password=Tr0ub4dor&3
The following is an example of a SQream cluster with a load balancer and a specific service queue named etl, to the database named raviga:
jdbc:Sqream://sqream.mynetwork.co:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;cluster=true;service=etl
Java Program Sample
You can download the JDBC Application Sample File
below by right-clicking and saving it to your computer.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;

import java.io.IOException;
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.sql.SQLException;


public class SampleTest {

    // Replace with your connection string
    static final String url = "jdbc:Sqream://sqream.mynetwork.co:3108/master;user=rhendricks;password=Tr0ub4dor&3;ssl=true;cluster=true";

    // Allocate objects for result set and metadata
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    DatabaseMetaData dbmeta = null;

    int res = 0;

    public void testJDBC() throws SQLException, IOException {

        // Create a connection
        conn = DriverManager.getConnection(url, "rhendricks", "Tr0ub4dor&3");

        // Create a table with a single integer column
        String sql = "CREATE TABLE test (x INT)";
        stmt = conn.createStatement(); // Prepare the statement
        stmt.execute(sql);             // Execute the statement
        stmt.close();                  // Close the statement handle

        // Insert some values into the newly created table
        sql = "INSERT INTO test VALUES (5),(6)";
        stmt = conn.createStatement();
        stmt.execute(sql);
        stmt.close();

        // Get values from the table
        sql = "SELECT * FROM test";
        stmt = conn.createStatement();
        rs = stmt.executeQuery(sql);
        // Fetch all results one-by-one
        while (rs.next()) {
            res = rs.getInt(1);
            System.out.println(res); // Print results to screen
        }
        rs.close();   // Close the result set
        stmt.close(); // Close the statement handle
        conn.close();
    }


    public static void main(String[] args) throws SQLException, KeyManagementException, NoSuchAlgorithmException, IOException, ClassNotFoundException {

        // Load SQream DB JDBC driver
        Class.forName("com.sqream.jdbc.SQDriver");

        // Create test object and run
        SampleTest test = new SampleTest();
        test.testJDBC();
    }
}
Prepared Statements
Prepared statements, also known as parameterized queries, are a feature of JDBC that enable the use of parameters to optimize query execution, enhance security, and enable query template reuse with different parameter values in Java applications.
Prepared Statement Sample
The following is a Java code snippet employing a JDBC prepared statement object to ingest a batch of one million records into SQreamDB.
You may download the Prepared statement sample by right-clicking and saving it to your computer.
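A minimal sketch of such a snippet follows, using the standard JDBC batch API; the connection string, table layout (perf_batch), and flush interval are illustrative assumptions rather than the contents of the downloadable file:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class PreparedStatementSample {

    public static void main(String[] args) throws SQLException {

        // Illustrative connection string - replace with your own
        String url = "jdbc:Sqream://127.0.0.1:5000/master;user=rhendricks;password=Tr0ub4dor&3";

        try (Connection conn = DriverManager.getConnection(url)) {

            // Create a hypothetical target table with two columns
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE OR REPLACE TABLE perf_batch (i INT, s TEXT)");
            }

            // Reuse one parameterized INSERT for all rows
            try (PreparedStatement ps = conn.prepareStatement("INSERT INTO perf_batch VALUES (?, ?)")) {
                for (int i = 0; i < 1_000_000; i++) {
                    ps.setInt(1, i);             // bind the first parameter
                    ps.setString(2, "row-" + i); // bind the second parameter
                    ps.addBatch();               // queue the bound row in the batch
                    if (i % 10_000 == 0) {
                        ps.executeBatch();       // flush every 10,000 rows (illustrative interval)
                    }
                }
                ps.executeBatch();               // flush any remaining rows
            }
        }
    }
}
Binding and batching rows this way lets the driver send data in bulk instead of executing one INSERT statement per record.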
Connecting to SQream Using Python (pysqream)
The current Pysqream connector supports Python version 3.9 and newer. It includes a set of packages that allows Python programs to connect to SQream DB. The base pysqream package conforms to Python DB-API specifications PEP-249.
pysqream is a pure Python connector that can be installed with pip on any operating system, including Linux, Windows, and macOS. pysqream-sqlalchemy is a SQLAlchemy dialect for pysqream.
Installing the Python Connector
Prerequisites
It is essential that you have the following installed:
Python
The connector requires Python version 3.9 or newer.
To see your current Python version, run the following command:
$ python --version
PIP
The Python connector is installed via pip, the standard package manager for Python, which is used to install, upgrade, and manage Python packages (libraries) and their dependencies.
We recommend upgrading to the latest version of pip before installing.
To verify that you have the latest version, run the following command:
$ python3 -m pip install --upgrade pip
Collecting pip
Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
|████████████████████████████████| 1.4MB 1.6MB/s
Installing collected packages: pip
Found existing installation: pip 19.1.1
Uninstalling pip-19.1.1:
Successfully uninstalled pip-19.1.1
Successfully installed pip-19.3.1
Note
On macOS, you may want to use virtualenv to install Python and the connector, to ensure compatibility with the built-in Python environment.
If you encounter an error including SSLError or WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available, please be sure to reinstall Python with SSL enabled, or use virtualenv or Anaconda.
OpenSSL for Linux
The Python connector relies on OpenSSL for secure connections to SQream DB. Some distributions of Python do not include OpenSSL.
To install OpenSSL on RHEL/CentOS, run the following command:
$ sudo yum install -y libffi-devel openssl-devel
To install OpenSSL on Ubuntu, run the following command:
$ sudo apt-get install libssl-dev libffi-dev -y
Installing via PIP with an internet connection
The Python connector is available via PyPi.
To install the connector using pip, it is advisable to use the -U or --user flags instead of sudo, as this ensures packages are installed per user. Note, however, that the connector can only be accessed under the same user.
To install pysqream and pysqream-sqlalchemy with the --user flag, run the following command:
$ pip3.9 install pysqream pysqream-sqlalchemy --user
pip3 will automatically install all necessary libraries and modules.
Installing via PIP without an internet connection
To get the .whl package file, contact your SQream support representative.
Run the following commands:
tar -xf pysqream_connector_3.2.5.tar.gz
cd pysqream_connector_3.2.5
#Install all packages with --no-index --find-links .
python3 -m pip install *.whl -U --no-index --find-links .
python3.9 -m pip install pysqream-3.2.5.zip -U --no-index --find-links .
python3.9 -m pip install pysqream-sqlalchemy-0.8.zip -U --no-index --find-links .
Upgrading an Existing Installation
The Python drivers are updated periodically. To upgrade an existing pysqream installation, use pip’s -U flag:
$ pip3.9 install pysqream pysqream-sqlalchemy -U
SQLAlchemy Examples
SQLAlchemy is an Object-Relational Mapper (ORM) for Python. When you install the SQream dialect (pysqream-sqlalchemy), you can use frameworks such as Pandas, TensorFlow, and Alembic to query SQream directly.
This section includes the following examples:
Standard Connection Example
import sqlalchemy as sa
engine_url = "sqream://rhendricks:secret_password@localhost:5000/raviga"
engine = sa.create_engine(engine_url)
res = engine.execute('create or replace table test (ints int, ints2 int)')
res = engine.execute('insert into test (ints,ints2) values (5,1), (6,2)')
res = engine.execute('select * from test')
for item in res:
print(item)
Multi Cluster Connection Example
The following example is for using a ServerPicker:
import sqlalchemy as sa
engine_url = "sqream://rhendricks:secret_password@localhost:5000/raviga"
engine = sa.create_engine(engine_url, connect_args={"clustered": True})
res = engine.execute("create or replace table tab1 (x int);")
res = engine.execute('insert into tab1 values (5), (6);')
res = engine.execute('select * from tab1')
for item in res:
print(item)
Pulling a Table into Pandas
The following example shows how to pull a table into Pandas. This example uses the URL method to create the connection string:
import sqlalchemy as sa
import pandas as pd
engine_url = "sqream://rhendricks:secret_password@localhost:5000/raviga"
engine = sa.create_engine(engine_url)
table_df = pd.read_sql("select * from nba", con=engine)
API Examples
This section includes the following examples:
Using the Cursor
The DB-API specification includes several methods for fetching results from the cursor. This section shows an example using the nba table, which looks as follows:
Name | Team | Number | Position | Age | Height | Weight | College | Salary
---|---|---|---|---|---|---|---|---
Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0
Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0
John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University |
R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0
Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | | 5000000.0
Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | | 12000000.0
Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0
Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0
Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0
As before, you must import the library and create a Connection(), followed by execute() on a simple SELECT * query:
import pysqream
con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
, username='rhendricks', password='Tr0ub4dor&3'
, clustered=True)
cur = con.cursor() # Create a new cursor
# The select statement:
statement = 'SELECT * FROM nba'
cur.execute(statement)
When the statement has finished executing, you have a Connection cursor object waiting. A cursor is iterable, meaning that it advances to the next row when fetched.
You can use fetchone() to fetch one record at a time:
first_row = cur.fetchone() # Fetch one row at a time (first row)
second_row = cur.fetchone() # Fetch one row at a time (second row)
To fetch several rows at a time, use fetchmany():
# executing `fetchone` twice is equivalent to this form:
third_and_fourth_rows = cur.fetchmany(2)
To fetch all rows at once, use fetchall():
# To get all rows at once, use `fetchall`
remaining_rows = cur.fetchall()
cur.close()
# Close the connection when done
con.close()
The following is an example of the contents of the row variables used in our examples:
>>> print(first_row)
('Avery Bradley', 'Boston Celtics', 0, 'PG', 25, '6-2', 180, 'Texas', 7730337)
>>> print(second_row)
('Jae Crowder', 'Boston Celtics', 99, 'SF', 25, '6-6', 235, 'Marquette', 6796117)
>>> print(third_and_fourth_rows)
[('John Holland', 'Boston Celtics', 30, 'SG', 27, '6-5', 205, 'Boston University', None), ('R.J. Hunter', 'Boston Celtics', 28, 'SG', 22, '6-5', 185, 'Georgia State', 1148640)]
>>> print(remaining_rows)
[('Jonas Jerebko', 'Boston Celtics', 8, 'PF', 29, '6-10', 231, None, 5000000), ('Amir Johnson', 'Boston Celtics', 90, 'PF', 29, '6-9', 240, None, 12000000), ('Jordan Mickey', 'Boston Celtics', 55, 'PF', 21, '6-8', 235, 'LSU', 1170960), ('Kelly Olynyk', 'Boston Celtics', 41, 'C', 25, '7-0', 238, 'Gonzaga', 2165160),
[...]
Note
Calling a fetch command after all rows have been fetched will return an empty array ([]).
Reading Result Metadata
When you execute a statement, the connection object also contains metadata about the result set (such as column names and types).
The metadata is stored in the Connection.description object of the cursor:
>>> import pysqream
>>> con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
... , username='rhendricks', password='Tr0ub4dor&3'
... , clustered=True)
>>> cur = con.cursor()
>>> statement = 'SELECT * FROM nba'
>>> cur.execute(statement)
<pysqream.dbapi.Connection object at 0x000002EA952139B0>
>>> print(cur.description)
[('Name', 'STRING', 24, 24, None, None, True), ('Team', 'STRING', 22, 22, None, None, True), ('Number', 'NUMBER', 1, 1, None, None, True), ('Position', 'STRING', 2, 2, None, None, True), ('Age (as of 2018)', 'NUMBER', 1, 1, None, None, True), ('Height', 'STRING', 4, 4, None, None, True), ('Weight', 'NUMBER', 2, 2, None, None, True), ('College', 'STRING', 21, 21, None, None, True), ('Salary', 'NUMBER', 4, 4, None, None, True)]
You can fetch a list of column names by iterating over the description list:
>>> [ i[0] for i in cur.description ]
['Name', 'Team', 'Number', 'Position', 'Age (as of 2018)', 'Height', 'Weight', 'College', 'Salary']
Loading Data into a Table
This example shows how to load 10,000 rows of dummy data into an instance of SQream.
To load 10,000 rows of dummy data into an instance of SQream:
Run the following:
import pysqream
from datetime import date, datetime
from time import time

con = pysqream.connect(host='127.0.0.1', port=3108, database='master'
                       , username='rhendricks', password='Tr0ub4dor&3'
                       , clustered=True)
cur = con.cursor()
Create a table for loading:
create = 'create or replace table perf (b bool, t tinyint, sm smallint, i int, bi bigint, f real, d double, s varchar(12), ss text, dt date, dtt datetime)'
cur.execute(create)
Load your data into the table using the INSERT command.
Create dummy data matching the table you created:
data = (False, 2, 12, 145, 84124234, 3.141, -4.3, "Marty McFly"
       , u"キウイは楽しい鳥です"
       , date(2019, 12, 17), datetime(1955, 11, 4, 1, 23, 0, 0))
row_count = 10**4
Get a new cursor:
insert = 'insert into perf values (?,?,?,?,?,?,?,?,?,?,?)'
start = time()
cur.executemany(insert, [data] * row_count)
print (f"Total insert time for {row_count} rows: {time() - start} seconds")
Close this cursor:
cur.close()
Verify that the data was inserted correctly:
cur = con.cursor()
cur.execute('select count(*) from perf')
result = cur.fetchall() # `fetchall` collects the entire data set
print (f"Count of inserted rows: {result[0][0]}")
Close the cursor:
cur.close()
Close the connection:
con.close()
Using SQLAlchemy ORM to Create and Populate Tables
This section shows how to use the ORM to create and populate tables from Python objects.
To use SQLAlchemy ORM to create and populate tables:
Run the following:
import sqlalchemy as sa
import pandas as pd

engine_url = "sqream://rhendricks:secret_password@localhost:5000/raviga"
engine = sa.create_engine(engine_url)
Build a metadata object and bind it:
metadata = sa.MetaData()
metadata.bind = engine
Create a table in the local metadata:
employees = sa.Table(
    'employees'
    , metadata
    , sa.Column('id', sa.Integer)
    , sa.Column('name', sa.VARCHAR(32))
    , sa.Column('lastname', sa.VARCHAR(32))
    , sa.Column('salary', sa.Float)
)
The create_all() function uses the SQream engine object.
Create all the defined table objects:
metadata.create_all(engine)
Populate your table.
Build the data rows:
insert_data = [ {'id': 1, 'name': 'Richard', 'lastname': 'Hendricks', 'salary': 12000.75}
              , {'id': 3, 'name': 'Bertram', 'lastname': 'Gilfoyle', 'salary': 8400.0}
              , {'id': 8, 'name': 'Donald', 'lastname': 'Dunn', 'salary': 6500.40} ]
Build the INSERT command:
ins = employees.insert(insert_data)
Execute the command:
result = engine.execute(ins)
For more information, see the python_api_reference_guide.
Connecting to SQream Using Node.JS
The SQream DB Node.JS driver allows JavaScript applications and tools to connect to SQream DB. This tutorial shows you how to write a Node application using the Node.JS interface.
The driver requires Node 10 or newer.
Installing the Node.JS driver
Prerequisites
Node.JS 10 or newer. Follow instructions at nodejs.org .
Install with NPM
Installing with npm is the easiest and most reliable method. If you need to install the driver in an offline system, see the offline method below.
$ npm install @sqream/sqreamdb
Install from an offline package
The Node driver is provided as a tarball for download from the SQream Drivers page .
After downloading the tarball, use npm
to install the offline package.
$ sudo npm install sqreamdb-4.0.0.tgz
Connect to SQream DB with a Node.JS application
Create a simple test
Replace the connection parameters with real parameters for a SQream DB installation.
const Connection = require('@sqream/sqreamdb');
const config = {
host: 'localhost',
port: 3109,
username: 'rhendricks',
password: 'super_secret_password',
connectDatabase: 'raviga',
cluster: true,
is_ssl: true,
service: 'sqream'
};
const query1 = 'SELECT 1 AS test, 2*6 AS "dozen"';
const sqream = new Connection(config);
sqream.execute(query1).then((data) => {
console.log(data);
}, (err) => {
console.error(err);
});
Run the test
A successful run should look like this:
$ node sqreamdb-test.js
[ { test: 1, dozen: 12 } ]
API reference
Connection parameters
Item |
Optional |
Default |
Description |
---|---|---|---|
 | ✗ | None | Hostname for SQream DB worker. For example,
 | ✗ | None | Port for SQream DB end-point. For example,
 | ✗ | None | Username of a role to use for connection. For example,
 | ✗ | None | Specifies the password of the selected role. For example,
 | ✗ | None | Database name to connect to. For example,
 | ✓ | | Specifies service queue to use. For example,
 | ✓ | | Specifies SSL for this connection. For example,
 | ✓ | | Connect via load balancer (use only if exists, and check port). For example,
Events
The connector handles event returns with an event emitter.
- getConnectionId
The getConnectionId event returns the executing connection ID.
- getStatementId
The getStatementId event returns the executing statement ID.
- getTypes
The getTypes event returns the result columns’ types.
Example
const myConnection = new Connection(config);
myConnection.runQuery(query1, function (err, data){
myConnection.events.on('getConnectionId', function(data){
console.log('getConnectionId', data);
});
myConnection.events.on('getStatementId', function(data){
console.log('getStatementId', data);
});
myConnection.events.on('getTypes', function(data){
console.log('getTypes', data);
});
});
Input placeholders
The Node.JS driver can replace parameters in a statement.
Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping.
The valid placeholder formats are provided in the table below.
Placeholder |
Type |
---|---|
%i | Identifier (e.g. table name, column name)
%s | A text string
%d | A number value
%b | A boolean value
See the input placeholders example below.
Examples
Setting configuration flags
SQream DB configuration flags can be set per statement, as a parameter to runQuery.
For example:
const setFlag = 'SET showfullexceptioninfo = true;';
const query_string = 'SELECT 1';
const myConnection = new Connection(config);
myConnection.runQuery(query_string, function (err, data){
console.log(err, data);
}, setFlag);
Lazyloading
To process rows without keeping them in memory, you can lazy-load the rows with an async iterator:
const Connection = require('@sqream/sqreamdb');
const config = {
host: 'localhost',
port: 3109,
username: 'rhendricks',
password: 'super_secret_password',
connectDatabase: 'raviga',
cluster: true,
is_ssl: true,
service: 'sqream'
};
const sqream = new Connection(config);
const query = "SELECT * FROM public.a_very_large_table";
(async () => {
const cursor = await sqream.executeCursor(query);
let count = 0;
for await (let rows of cursor.fetchIterator(100)) {
// fetch rows in chunks of 100
count += rows.length;
}
await cursor.close();
return count;
})().then((total) => {
console.log('Total rows', total);
}, (err) => {
console.error(err);
});
Reusing a connection
It is possible to execute multiple queries with the same connection (although only one query can be executed at a time).
const Connection = require('@sqream/sqreamdb');
const config = {
host: 'localhost',
port: 3109,
username: 'rhendricks',
password: 'super_secret_password',
connectDatabase: 'raviga',
cluster: true,
is_ssl: true,
service: 'sqream'
};
const sqream = new Connection(config);
(async () => {
const conn = await sqream.connect();
try {
const res1 = await conn.execute("SELECT 1");
const res2 = await conn.execute("SELECT 2");
const res3 = await conn.execute("SELECT 3");
conn.disconnect();
return {res1, res2, res3};
} catch (err) {
conn.disconnect();
throw err;
}
})().then((res) => {
console.log('Results', res)
}, (err) => {
console.error(err);
});
Using placeholders in queries
Input placeholders allow values like user input to be passed as parameters into queries, with proper escaping.
const Connection = require('@sqream/sqreamdb');
const config = {
host: 'localhost',
port: 3109,
username: 'rhendricks',
password: 'super_secret_password',
connectDatabase: 'raviga',
cluster: true,
is_ssl: true,
service: 'sqream'
};
const sqream = new Connection(config);
const sql = "SELECT %i FROM public.%i WHERE name = %s AND num > %d AND active = %b";
sqream.execute(sql, "col1", "table2", "john's", 50, true);
The query that will run is SELECT col1 FROM public.table2 WHERE name = 'john''s' AND num > 50 AND active = true
Troubleshooting and recommended configuration
Preventing heap out of memory errors
Some workloads may cause Node.JS to fail with the error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
To prevent this error, modify the heap size configuration by setting the --max-old-space-size run flag.
For example, set the space size to 2GB:
$ node --max-old-space-size=2048 my-application.js
BIGINT support
The Node.JS connector supports fetching BIGINT values from SQream DB. However, some applications may encounter an error when trying to serialize those values.
The error that appears is:
TypeError: Do not know how to serialize a BigInt
This is because the JSON specification does not support BIGINT values, even when supported by JavaScript engines.
To resolve this issue, objects with BIGINT values should be converted to strings before serializing, and converted back after deserializing.
For example:
const rows = [{test: 1n}];
const json = JSON.stringify(rows, (key, value) =>
  typeof value === 'bigint'
    ? value.toString()
    : value // return everything else unchanged
);
console.log(json); // [{"test": "1"}]
ODBC
Install and Configure ODBC on Windows
The ODBC driver for Windows is provided as a self-contained installer.
This tutorial shows you how to install and configure ODBC on Windows.
Installing the ODBC Driver
Prerequisites
Visual Studio 2015 Redistributables
To install the ODBC driver you must first install Microsoft’s Visual C++ Redistributable for Visual Studio 2015. To install Visual C++ Redistributable for Visual Studio 2015, see the Install Instructions.
Administrator Privileges
The SQream DB ODBC driver requires administrator privileges on your computer to add the DSNs (data source names).
1. Run the Windows installer
Install the driver by following the on-screen instructions in the easy-to-follow installer.

Note
The installer will install the driver in C:\Program Files\SQream Technologies\ODBC Driver by default. This path is changeable during the installation.
2. Selecting Components
The installer includes additional components, like JDBC and Tableau customizations.

You can deselect items you don’t want to install, but the items named ODBC Driver DLL and ODBC Driver Registry Keys must remain selected for a complete installation of the ODBC driver.
Once the installer finishes, you will be ready to configure the DSN for connection.
3. Configuring the ODBC Driver DSN
ODBC driver configurations are done via DSNs. Each DSN represents one SQream DB database.
Open up the Windows menu by clicking the Windows button on your keyboard (⊞ Win) or pressing the Windows button with your mouse.
Type ODBC and select ODBC Data Sources (64-bit). Click the item to open up the setup window.
The installer has created a sample User DSN named SQreamDB.
You can modify this DSN, or create a new one.
Enter your connection parameters. See the reference below for a description of the parameters.
When completed, save the DSN by selecting
Tip
Test the connection by clicking
before saving. A successful test looks like this:
You can now use this DSN in ODBC applications like Tableau.
Connection Parameters
Item | Description
---|---
Data Source Name | An easily recognizable name that you’ll use to reference this DSN. Once you set this, it can not be changed.
Description | A description of this DSN for your convenience. You can leave this blank.
User | Username of a role to use for connection. For example,
Password | Specifies the password of the selected role. For example,
Database | Specifies the database name to connect to. For example,
Service | Specifies service queue to use. For example,
Server | Hostname of the SQream DB worker. For example,
Port | TCP port of the SQream DB worker. For example,
User server picker | Connect via load balancer (use only if exists, and check port)
SSL | Specifies SSL for this connection
Logging options | Use this screen to alter logging options when tracing the ODBC connection for possible connection issues.
Troubleshooting
Solving “Code 126” ODBC errors
After installing the ODBC driver, you may experience the following error:
The setup routines for the SQreamDriver64 ODBC driver could not be loaded due to system error
code 126: The specified module could not be found.
(c:\Program Files\SQream Technologies\ODBC Driver\sqreamOdbc64.dll)
This is an issue with the Visual Studio Redistributable packages. Verify you’ve correctly installed them, as described in the Visual Studio 2015 Redistributables section above.
Install and configure ODBC on Linux
The ODBC driver for Linux is provided as a shared library.
This tutorial shows how to install and configure ODBC on Linux.
Prerequisites
unixODBC
The ODBC driver requires a driver manager to manage the DSNs. SQream DB’s driver is built for unixODBC.
Verify unixODBC is installed by running:
$ odbcinst -j
unixODBC 2.3.4
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/rhendricks/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
Take note of the location of .odbc.ini and .odbcinst.ini. In this case, /etc. If odbcinst is not installed, follow the instructions for your platform below:
Install unixODBC on RHEL 7 / CentOS 7
$ yum install -y unixODBC unixODBC-devel
Install unixODBC on Ubuntu
$ sudo apt-get install unixodbc unixodbc-dev
Install the ODBC driver with a script
Use this method if you have never used ODBC on your machine before. If you have existing DSNs, see the manual install process below.
Unpack the tarball. Copy the downloaded file to any directory, and untar it to a new directory:
$ mkdir -p sqream_odbc64
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64
Run the first-time installer. The installer will create an editable DSN.
$ cd sqream_odbc64
$ ./odbc_install.sh --install
Edit the newly created DSN by editing /etc/.odbc.ini. See the parameter explanation in the section ODBC DSN Parameters.
Install the ODBC driver manually
Use this method when you have existing ODBC DSNs on your machine.
Unpack the tarball. Copy the file you downloaded to the directory where you want to install it, and untar it:
$ tar xf sqream_2019.2.1_odbc_3.0.0_x86_64_linux.tar.gz -C sqream_odbc64
Take note of the directory where the driver was unpacked. For example, /home/rhendricks/sqream_odbc64
Locate the .odbc.ini and .odbcinst.ini files, using odbcinst -j.
In .odbcinst.ini, add the following lines to register the driver (change the highlighted paths to match your specific driver):
[ODBC Drivers]
SqreamODBCDriver=Installed
[SqreamODBCDriver]
Description=Driver DSII SqreamODBC 64bit
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Setup=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
APILevel=1
ConnectFunctions=YYY
DriverODBCVer=03.80
SQLLevel=1
IconvEncoding=UCS-4LE
In .odbc.ini, add the following lines to configure the DSN (change the highlighted parameters to match your installation):
[ODBC Data Sources]
MyTest=SqreamODBCDriver
[MyTest]
Description=64-bit Sqream ODBC
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Server="127.0.0.1"
Port="5000"
Database="raviga"
Service=""
User="rhendricks"
Password="Tr0ub4dor&3"
Cluster=false
Ssl=false
Parameters are in the form of parameter = value. For details about the parameters that can be set for each DSN, see the section ODBC DSN Parameters.
Create a file called .sqream_odbc.ini for managing the driver settings and logging. Create this file alongside the other files, and add the following lines (change the highlighted parameters to match your installation):
# Note that this default DriverManagerEncoding of UTF-32 is for iODBC. unixODBC uses UTF-16 by default.
# If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
# Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
[Driver]
DriverManagerEncoding=UTF-16
DriverLocale=en-US
ErrorMessagesPath=/home/rhendricks/sqream_odbc64/ErrorMessages
LogLevel=0
LogNamespace=
LogPath=/tmp/
ODBCInstLib=libodbcinst.so
Install the driver dependencies
Add the ODBC driver path to LD_LIBRARY_PATH
:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/rhendricks/sqream_odbc64/lib
You can also add this command to your ~/.bashrc file to keep this installation working between reboots without re-entering the command manually.
Testing the connection
Test the driver using isql.
If the DSN created is called MyTest, as in the example, run isql in this format:
$ isql MyTest
ODBC DSN Parameters
Item | Default | Description
---|---|---
Data Source Name | None | An easily recognizable name that you’ll use to reference this DSN.
Description | None | A description of this DSN for your convenience. This field can be left blank
User | None | Username of a role to use for connection. For example,
Password | None | Specifies the password of the selected role. For example,
Database | None | Specifies the database name to connect to. For example,
Service | | Specifies service queue to use. For example,
Server | None | Hostname of the SQream DB worker. For example,
Port | None | TCP port of the SQream DB worker. For example,
Cluster | | Connect via load balancer (use only if exists, and check port). For example,
Ssl | | Specifies SSL for this connection. For example,
DriverManagerEncoding | | Depending on how unixODBC is installed, you may need to change this to
ErrorMessagesPath | None | Location where the driver was installed. For example,
LogLevel | 0 | Set to 0-6 for logging. Use this setting when instructed to by SQream Support. For example,
SQream has an ODBC driver to connect to SQream DB. This tutorial shows how to install the ODBC driver for Linux or Windows for use with applications like Tableau, PHP, and others that use ODBC.
Platform | Versions supported
---|---
Windows |
Linux |
Other distributions may also work, but are not officially supported by SQream.
Getting the ODBC driver
The SQream ODBC driver is distributed by your SQream account manager. Before contacting your account manager, verify which platform the ODBC driver will be used on. Go to SQream Support or contact your SQream account manager to get the driver.
The driver is provided as an executable installer for Windows, or a compressed tarball for Linux platforms. After downloading the driver, follow the relevant instructions to install and configure the driver for your platform:
Install and configure the ODBC driver
Continue based on your platform:
Connecting to SQream Using .NET
The SqreamNet ADO.NET Data Provider lets you connect to SQream through your .NET environment.
The .NET page includes the following sections:
Integrating SQreamNet
The Integrating SQreamNet section describes the following:
Prerequisites
The SqreamNet provider requires .NET version 6 or newer.
Getting the DLL file
The .NET driver is available for download from the client drivers download page.
Integrating SQreamNet
After downloading the .NET driver, save the archive file to a known location. Next, in your IDE, add a Sqreamnet.dll reference to your project.
If you wish to upgrade SQreamNet within an existing project, you may replace the existing .dll file with an updated one or change the project’s reference location to a new one.
Known Driver Limitations
Unicode characters are not supported when using INSERT INTO AS SELECT.
To avoid possible casting issues, use getDouble when using FLOAT, as shown in the sketch below.
Connecting to SQream For the First Time
An initial connection to SQream must be established by creating a SqreamConnection object using a connection string.
Connection String
To connect to SQream, instantiate a SqreamConnection object using this connection string.
The following is the syntax for SQream:
"Data Source=<hostname or ip>,<port>;User=<username>;Password=<password>;Initial Catalog=<database name>;Integrated Security=true";
Connection Parameters
Item |
State |
Default |
Description |
---|---|---|---|
 | Mandatory | None | Hostname/IP/FQDN and port of the SQream DB worker. For example,
 | Mandatory | None | Database name to connect to. For example,
 | Mandatory | None | Username of a role to use for connection. For example,
 | Mandatory | None | Specifies the password of the selected role. For example,
 | Optional | | Specifies service queue to use. For example,
 | Optional | | Specifies SSL for this connection. For example,
 | Optional | | Connect via load balancer (use only if exists, and check port).
Connection String Examples
The following is an example of a SQream cluster with load balancer and no service queues (with SSL):
Data Source=sqream.mynetwork.co,3108;User=rhendricks;Password=Tr0ub4dor&3;Initial Catalog=master;Integrated Security=true;ssl=true;cluster=true;
The following is a minimal example for a local standalone SQream database:
Data Source=127.0.0.1,5000;User=rhendricks;Password=Tr0ub4dor&3;Initial Catalog=master;
The following is an example of a SQream cluster with load balancer and a specific service queue named etl, to the database named raviga:
Data Source=sqream.mynetwork.co,3108;User=rhendricks;Password=Tr0ub4dor&3;Initial Catalog=raviga;Integrated Security=true;service=etl;cluster=true;
Sample C# Program
You can download the .NET Application Sample File
below by right-clicking and saving it to your computer.
public void Test()
{
    var connection = OpenConnection("192.168.4.62", 5000, "sqream", "sqream", "master");

    ExecuteSQLCommand(connection, "create or replace table tbl_example as select 1 as x , 'a' as y;");

    var tableData = ReadExampleData(connection, "select * from tbl_example;");
}

/// <summary>
/// Builds a connection string to sqream server and opens a connection
/// </summary>
/// <param name="ipAddress">host to connect</param>
/// <param name="port">port sqreamd is running on</param>
/// <param name="username">role username</param>
/// <param name="password">role password</param>
/// <param name="databaseName">database name</param>
/// <param name="isCluster">optional - set to true when the ip,port endpoint is a server picker process</param>
/// <returns>
/// SQream connection object
/// Throws SqreamException if it fails to open a connection
/// </returns>
public SqreamConnection OpenConnection(string ipAddress, int port, string username, string password, string databaseName, bool isCluster = false)
{
    // create the connection string according to the format
    var connectionString = string.Format(
        "Data Source={0},{1};User={2};Password={3};Initial Catalog={4};Cluster={5}",
        ipAddress,
        port,
        username,
        password,
        databaseName,
        isCluster
    );

    // create a sqream connection object
    var connection = new SqreamConnection(connectionString);

    // open a connection
    connection.Open();

    // returns the connection object
    return connection;
}

/// <summary>
/// Executes a SQL command to sqream server
/// </summary>
/// <param name="connection">connection to sqream server</param>
/// <param name="sql">sql command</param>
/// <exception cref="InvalidOperationException">thrown when the connection is not open</exception>
public void ExecuteSQLCommand(SqreamConnection connection, string sql)
{
    // validates the connection is open and throws an exception if not
    if (connection.State != System.Data.ConnectionState.Open)
        throw new InvalidOperationException(string.Format("connection to sqream is not open. connection.State: {0}", connection.State));

    // creates a new command object utilizing the sql and the connection
    var command = new SqreamCommand(sql, connection);

    // executes the command
    command.ExecuteNonQuery();
}

/// <summary>
/// Executes a SQL command to sqream server, and reads the result set using a DataReader
/// </summary>
/// <param name="connection">connection to sqream server</param>
/// <param name="sql">sql command</param>
/// <exception cref="InvalidOperationException">thrown when the connection is not open</exception>
public List<Tuple<int, string>> ReadExampleData(SqreamConnection connection, string sql)
{
    // validates the connection is open and throws an exception if not
    if (connection.State != System.Data.ConnectionState.Open)
        throw new InvalidOperationException(string.Format("connection to sqream is not open. connection.State: {0}", connection.State));

    // creates a new command object utilizing the sql and the connection
    var command = new SqreamCommand(sql, connection);

    // creates a reader object to iterate over the result set
    var reader = (SqreamDataReader)command.ExecuteReader();

    // list of results
    var result = new List<Tuple<int, string>>();

    // iterate the reader and read the table int,string values into a result tuple object
    while (reader.Read())
        result.Add(new Tuple<int, string>(reader.GetInt32(0), reader.GetString(1)));

    // return the result set
    return result;
}
Need help?
If you couldn’t find what you’re looking for, contact SQream Support
Looking for older drivers?
If you’re looking for an older version of SQreamDB drivers, visit here.
If you need a tool that SQream does not support, contact SQream Support or your SQream account manager for more information.
External Storage Platforms
SQream supports the following external storage platforms:
Google Cloud Platform
Ingesting data using Google Cloud Platform (GCP) requires configuring Google Cloud Storage (GCS) bucket access. You may configure SQreamDB to separate source and destination by granting read access to one bucket and write access to a different bucket. Such separation requires that each bucket be individually configured.
Google Cloud Platform URI Format
Specify a location for a file (or files) when using COPY FROM or Foreign Tables.
The following is an example of the general GCP syntax:
gs://<gcs_bucket>/<gcs_path>/
Granting GCP access
Before You Begin
It is essential that you have a GCP service account string.
String example:
sample_service_account@sample_project.iam.gserviceaccount.com
In your Google Cloud console, go to Select a project and select the desired project.
From the PRODUCTS menu, select Cloud Storage > Buckets.
Select the bucket you wish to configure; or create a new bucket by selecting CREATE and following the Create a bucket procedure, and select the newly created bucket.
Select UPLOAD FILES and upload the data files you wish SQreamDB to ingest.
Go to PERMISSIONS and select GRANT ACCESS.
Under Add principals, in the New principals box, paste your service account string.
Under Assign roles, in the Select a role box, select Storage Admin.
Select ADD ANOTHER ROLE and in the newly created Select a role box, select Storage Object Admin.
Select SAVE.
Note
Optimize access time to your data by configuring the location of your bucket according to Google Cloud location considerations.
Example
COPY table_name FROM WRAPPER csv_fdw OPTIONS(location = 'gs://mybucket/sqream-demo-data/file.csv');
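Foreign tables may also be used to query files in a GCS bucket directly, instead of copying the data. The following is a minimal sketch, assuming a CSV file with matching columns (the bucket, file, and column names are hypothetical):
CREATE FOREIGN TABLE gcs_demo
(
  id BIGINT,
  name TEXT(40)
)
WRAPPER csv_fdw
OPTIONS (location = 'gs://mybucket/sqream-demo-data/file.csv');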
HDFS Environment
Configuring an HDFS Environment for the User sqream
This section describes how to configure an HDFS environment for the user sqream and is only relevant for users with an HDFS environment.
To configure an HDFS environment for the user sqream:
Open your bash_profile configuration file for editing:
$ vim /home/sqream/.bash_profile
Append the Hadoop-related environment variables to the file (the same variables shown in the sqream_env.sh example below), and then apply the edits:
$ source /home/sqream/.bash_profile
Check if you can access Hadoop from your machine:
$ hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
Verify that an HDFS environment exists for SQream services:
$ ls -l /etc/sqream/sqream_env.sh
If an HDFS environment does not exist for SQream services, create one (sqream_env.sh):
#!/bin/bash

SQREAM_HOME=/usr/local/sqream
export SQREAM_HOME
export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR
PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
export PATH
Authenticating Hadoop Servers that Require Kerberos
If your Hadoop server requires Kerberos authentication, do the following:
Create a principal for the user sqream.
$ kadmin -p root/admin@SQ.COM
$ addprinc sqream@SQ.COM
If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user over ssh and run kadmin.local:
$ kadmin.local
Running kadmin.local does not require a password.
Change the password for the sqream@SQ.COM principal:
$ change_password sqream@SQ.COM
Connect to the Hadoop name node using ssh and navigate to the Cloudera agent process directory:
$ cd /var/run/cloudera-scm-agent/process
Check the most recently modified content of the directory above:
$ ls -lrt
Look for a recently updated folder containing the text hdfs.
The following is an example of the correct folder name:
cd <number>-hdfs-<something>
This folder should contain a file named hdfs.keytab or another similar .keytab file.
Copy the .keytab file to user sqream’s Home directory on the remote machines that you are planning to use Hadoop on.
Copy the following files to the sqream@server:<sqream folder>/hdfs/hadoop/etc/hadoop directory:
core-site.xml
hdfs-site.xml
Connect to the sqream server and verify that the .keytab file is owned by the user sqream and granted the correct permissions:
$ sudo chown sqream:sqream /home/sqream/hdfs.keytab
$ sudo chmod 600 /home/sqream/hdfs.keytab
Log into the sqream server.
Log in as the user sqream.
Navigate to the Home directory and check the name of a Kerberos principal represented by the following .keytab file:
$ klist -kt hdfs.keytab
The following is an example of the correct output:
sqream@Host-121 ~ $ klist -kt hdfs.keytab
Keytab name: FILE:hdfs.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
   5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
Verify that the hdfs service named hdfs/nn1@SQ.COM is shown in the generated output above.
Run the following:
$ kinit -kt hdfs.keytab hdfs/nn1@SQ.COM
Check the output:
$ klist
The following is an example of the correct output:
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: sqream@SQ.COM

Valid starting       Expires              Service principal
09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/SQ.COM@SQ.COM
List the files located at the defined server name or IP address:
$ hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
Do one of the following:
If the list below is output, continue with Step 16.
If the list is not output, verify that your environment has been set up correctly.
If any of the following are empty, verify that you followed Step 6 in the Configuring an HDFS Environment for the User sqream section above correctly:
$ echo $JAVA_HOME
$ echo $SQREAM_HOME
$ echo $CLASSPATH
$ echo $HADOOP_COMMON_LIB_NATIVE_DIR
$ echo $LD_LIBRARY_PATH
$ echo $PATH
Verify that you copied the correct keytab file.
Review this procedure to verify that you have followed each step.
Amazon Web Services
SQreamDB uses a native Amazon Simple Storage Service (S3) connector for inserting data. The s3:// URI specifies an external file path to an S3 bucket. File names may contain wildcard characters, and the files can be in CSV or columnar format, such as Parquet and ORC.
S3 URI Format
With S3, specify a location for a file (or files) when using COPY FROM or Foreign Tables.
The following is an example of the general S3 syntax:
s3://bucket_name/path
Granting Access to S3
A best practice for granting access to AWS S3 is by creating an Identity and Access Management (IAM) user account. If creating an IAM user account is not possible, you may follow AWS guidelines for using the global configuration object and setting an AWS region.
Connecting to S3 Using SQreamDB Legacy Configuration File
You may use the following parameters within your SQreamDB legacy configuration file:
Parameter | Description | Parameter Value | Example
---|---|---|---
AwsEndpointOverride | Overrides the AWS S3 HTTP endpoint when using Virtual Private Cloud (VPC) | An endpoint URL | sqream_config_legacy.json: { ..., "AwsEndpointOverride": "https://my.endpoint.local" }
AwsObjectAccessStyle | Enables configuration of S3 object access styles, which determine how you can access and interact with the objects stored in an S3 bucket | An S3 access style, such as "path" | sqream_config_legacy.json: { ..., "AwsObjectAccessStyle": "path" }
Authentication
SQreamDB supports AWS ID
and AWS SECRET
authentication. These should be specified when executing a statement.
Examples
Use a foreign table to stage data from S3 before loading from CSV, Parquet, or ORC files.
Creating a Foreign Table
Based on the source file’s structure, you can create a foreign table with the appropriate structure, and point it to your file as shown in the following example:
CREATE FOREIGN TABLE nba
(
Name text(40),
Team text(40),
Number tinyint,
Position text(2),
Age tinyint,
Height text(4),
Weight real,
College text(40),
Salary float
)
WRAPPER csv_fdw
OPTIONS
(
LOCATION = 's3://sqream-demo-data/nba_players.csv',
RECORD_DELIMITER = '\r\n' -- DOS delimited file
)
;
In the example above the file format is CSV, and it is stored as an S3 object. If the path is on HDFS, you must change the URI accordingly. Note that the record delimiter is a DOS newline (\r\n
).
For more information, see the following:
Querying Foreign Tables
The following shows the data in the foreign table:
t=> SELECT * FROM nba LIMIT 10;
name | team | number | position | age | height | weight | college | salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
Bulk Loading a File from a Public S3 Bucket
The COPY FROM
command can also be used to load data without staging it first.
The bucket must be publicly accessible and its objects must be listable.
COPY nba FROM 's3://sqream-demo-data/nba.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n';
Loading Files from an Authenticated S3 Bucket
COPY nba FROM 's3://secret-bucket/*.csv' WITH OFFSET 2 RECORD DELIMITER '\r\n'
AWS_ID '12345678'
AWS_SECRET 'super_secretive_secret';
For more information, see the following:
Loading and Unloading Data
The Loading Data section describes concepts and operations related to importing data into your SQream database:
Overview of loading data - Describes best practices and considerations for loading data into SQream from a variety of sources and locations.
Alternatives to loading data (foreign tables) - Useful for running queries directly on external data without importing into your SQream database.
Supported data types - Overview of supported data types, including descriptions, examples, and relevant aliases.
Ingesting data from external sources - List of data ingestion sources that SQream supports.
Inserting data from external tables - Inserts one or more rows into a table.
Ingesting data from third party client platforms - Gives you direct access to a variety of drivers, connectors, tools, visualizers, and utilities.
Using the COPY FROM statement - Used for loading data from files located on a filesystem into SQream tables.
Importing data using Studio - SQream’s web-based client providing users with all functionality available from the command line in an intuitive and easy-to-use format.
Loading data using Amazon S3 - Used for loading data from Amazon S3.
Troubleshooting - Describes troubleshooting solutions related to importing data.
The Unloading Data section describes concepts and operations related to exporting data from your SQream database:
Overview of unloading data - Describes best practices and considerations for unloading data from SQream to a variety of sources and locations.
The COPY TO statement - Used for unloading data from a SQream database table or query to a file on a filesystem.
Feature Guides
The Feature Guides section describes background processes that SQream uses to manage several areas of operation, such as data ingestion, load balancing, and access control.
This section describes the following features:
Automatic Foreign Table DDL Resolution
The Automatic Foreign Table DDL Resolution page describes the following:
Overview
SQream must be able to access a schema when reading and mapping external files to a foreign table. To facilitate this, you must specify the correct schema in the statement that creates the foreign table, which must also include the correct list of columns. To avoid the human error involved in this complex process, SQream can automatically identify the corresponding schema, saving you the time and effort required to build your schema manually. This is especially useful for particular file formats, such as Parquet, which include a built-in schema declaration.
Usage Notes
The automatic foreign table DDL resolution feature supports Parquet, ORC, JSON, and Avro files, while using it with CSV files generates an error. You can activate this feature when you create a foreign table by omitting the column list, described in the Syntax section below.
When using this feature, the path you specify in the LOCATION option must point to at least one existing file; if no files exist for the schema to read, an error is generated. In that case, you can still specify the schema manually.
Note
When using this feature, SQream assumes that all files in the path use the same schema.
Syntax
The following is the syntax for using the automatic foreign table DDL resolution feature:
CREATE FOREIGN TABLE table_name
[FOREIGN DATA] WRAPPER fdw_name
[OPTIONS (...)];
Example
The following is an example of using the automatic foreign table DDL resolution feature:
create foreign table parquet_table
wrapper parquet_fdw
options (location = '/tmp/file.parquet');
Permissions
The automatic foreign table DDL resolution feature requires Read permissions.
Query Healer
The Query Healer page describes the following:
Overview
The Query Healer periodically examines the progress of running statements, creating a log entry for all statements exceeding a defined time period.
Configuration
The following Administration Worker flags are required to configure the Query Healer:
Flag | Description
---|---
isHealerOn | Enables and disables the Query Healer.
healerMaxStatementInactivitySeconds | A worker level flag that defines the threshold for creating a log recording a slow statement. The log includes information about memory, CPU, and GPU. The default setting is five hours.
healerDetectionFrequencySeconds | A worker level flag that triggers the healer to examine the progress of running statements. The default setting is one hour.
Query Log
The following is an example of a log record for a query stuck in the query detection phase for more than five hours:
|INFO|0x00007f9a497fe700:Healer|192.168.4.65|5001|-1|master|sqream|-1|sqream|0|"[ERROR]|cpp/SqrmRT/healer.cpp:140 |"Stuck query found. Statement ID: 72, Last chunk producer updated: 1.
Once you identify the stuck worker, you can execute the shutdown_server
utility function from this specific worker, as described in the next section.
Activating a Graceful Shutdown
You can activate a graceful shutdown if your log entry says Stuck query found, as shown in the example above. You do this by executing the shutdown_server utility function (SELECT shutdown_server();) on the stuck worker.
To activate a graceful shutdown:
Locate the IP and the Port of the stuck worker from the logs.
Note
The log in the previous section identifies the IP (192.168.4.65) and port (5001) referring to the stuck query.
From the machine of the stuck query (IP: 192.168.4.65, port: 5001), connect to SQream SQL client:
./sqream sql --port=$STUCK_WORKER_PORT --username=$SQREAM_USER --password=$SQREAM_PASSWORD --databasename=$SQREAM_DATABASE
Execute shutdown_server.
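As noted above, the utility function is executed as a regular SQL statement on the stuck worker:
SELECT shutdown_server();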
For more information, see the following:
Activating the SHUTDOWN SERVER utility function, which describes all shutdown_server options.
Configuring the shutdown_server flag.
Data Encryption
The Data Encryption page describes the following:
Overview
Data Encryption helps protect sensitive data at rest by concealing it from unauthorized users in the event of a breach. This is achieved by scrambling the content into an unreadable format based on encryption and decryption keys. Typically speaking, this data pertains to PII (Personally Identifiable Information), which is sensitive information such as credit card numbers and other information related to an identifiable person.
Users encrypt their data on a column basis by specifying column_name
in the encryption syntax.
The demand for confidentiality has steadily increased to protect the growing volumes of private data stored on computer systems and transmitted over the internet. To this end, regulatory bodies such as the General Data Protection Regulation (GDPR) have produced requirements to standardize and enforce compliance aimed at protecting customer data.
Encryption can be used for the following:
Creating tables with up to three encrypted columns.
Joining encrypted columns with other tables.
Selecting data from an encrypted column.
Warning
The SELECT
statement decrypts information by default. When executing CREATE TABLE AS SELECT
or INSERT INTO TABLE AS SELECT
, encrypted information will appear as clear text in the newly created table.
For more information on the encryption syntax, see Syntax.
For more information on GDPR compliance requirements, see the GDPR checklist.
Encryption Methods
Data exists in one of following states and determines the encryption method:
Encrypting Data in Transit
Data in transit refers to data you use on a regular basis, usually stored on a database and accessed through applications or programs. This data is typically transferred between several physical or remote locations through email or uploading documents to the cloud. This type of data must therefore be protected while in transit. SQream encrypts data in transit using SSL when, for example, users insert data files from external repositories over a JDBC or ODBC connection.
For more information, see Use TLS/SSL When Possible.
Encrypting Data at Rest
Data at rest refers to data stored on your hard drive or on the cloud. Because this data can be potentially intercepted physically, it requires a form of encryption that protects your data wherever you store it. SQream facilitates encryption by letting you encrypt any columns in your database that you want to keep private.
Data Types
Sensitive data typically pertains to PII (Personally Identifiable Information), such as credit card numbers and other information related to an identifiable person.
SQream’s data encryption feature supports encrypting column-based data belonging to the following data types:
INT
BIGINT
TEXT
For more information on the above data types, see Supported Data Types.
Syntax
The following is the syntax for encrypting a new table:
CREATE TABLE <table_name> (
  <column_name> <type_name> NOT NULL ENCRYPT,
  <column_name> <type_name> ENCRYPT,
  <column_name> <type_name>,
  <column_name> <type_name> ENCRYPT);
The following is an example of encrypting a new table:
CREATE TABLE client_name (
id BIGINT NOT NULL ENCRYPT,
first_name TEXT ENCRYPT,
last_name TEXT,
salary INT ENCRYPT);
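Querying the table decrypts the encrypted columns by default (see the warning above). For example, a minimal sketch using the table just created:
SELECT id, first_name, salary FROM client_name;
The encrypted columns are returned as clear text to any role with Read permissions on the table.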
Note
Because encryption is not associated with any role, users with Read or Insert permissions can read tables containing encrypted data.
You cannot encrypt more than three columns. Attempting to encrypt more than three columns displays the following error message:
Error preparing statement: Cannot create a table with more than three encrypted columns.
Permissions
Because the Data Encryption feature does not require a role, users with Read and Insert permissions can read tables containing encrypted data.
Compression
SQreamDB uses a variety of compression and encoding methods to optimize query performance and to save disk space.
Encoding
Encoding is an automatic operation that converts data into efficient, typed formats. For example, data stored in columnar format uses compact binary representations, in contrast with data stored in a CSV file, which stores all data as text.
Encoding enhances performance and reduces data size by using specific data formats and encoding methods. SQream encodes data in a number of ways according to the data type. For example, a date is stored as an integer, starting with March 1st 1CE, which is significantly more efficient than encoding the date as a string, and offers a wider range than storing it relative to the Unix Epoch.
Lossless Compression
Compression transforms data into a smaller format without sacrificing accuracy, known as lossless compression.
After encoding a set of column values, SQream packs and compresses the data, and decompresses it on access to make it available to users. Depending on the compression scheme used, these operations can be performed on the CPU or the GPU. Some users find that GPU compression provides better performance.
Automatic Compression
By default, SQream automatically compresses every column (see Specifying Compression Strategies below for overriding default compression). This feature is called automatic adaptive compression strategy.
When loading data, SQreamDB automatically decides on the compression schemes for specific chunks of data by trying several compression schemes and selecting the one that performs best. SQreamDB tries to balance more aggressive compression with the time and CPU/GPU time required to compress and decompress the data.
Compression Methods
The following table shows the supported compression methods:
Compression Method | Supported Data Types | Description | Location
---|---|---|---
flat | All types | No compression (forced) | NA
default | All types | Automatic scheme selection | NA
dict | All types | Dictionary compression with RLE. For each chunk, SQreamDB creates a dictionary of distinct values and stores only their indexes. Works best for integers and texts shorter than 120 characters, with <10% unique values. Useful for storing ENUMs or keys, stock tickers, and dimensions. If the data is optionally sorted, this compression will perform even better. | GPU
p4d | Integer types, dates and timestamps | Patched frame-of-reference + Delta. Based on the delta between consecutive values. Works best for monotonously increasing or decreasing numbers and timestamps | GPU
lz4 | Text types | Lempel-Ziv general purpose compression, used for texts | CPU
zlib | Text types | General purpose compression, used for texts | CPU
rle | Integer types, dates and timestamps | Run-Length Encoding. This replaces sequences of values with a single pair. It is best for low cardinality columns that are used to sort data | GPU
sequence | Integer types | Optimized RLE + Delta type for built-in identity columns. | GPU
Specifying Compression Strategies
When you create a table without defining any compression specifications, SQream defaults to automatic adaptive compression ("default"
). However, you can override this by specifying a compression strategy when creating a table.
This section describes the following compression strategies:
Explicitly Specifying Automatic Compression
When you explicitly specify automatic compression, the following two statements are equivalent:
CREATE TABLE t (
x INT,
y TEXT(50)
);
In this version, the default compression is specified explicitly:
CREATE TABLE t (
x INT CHECK('CS "default"'),
y TEXT(50) CHECK('CS "default"')
);
Forcing No Compression
Forcing no compression is also known as “flat”, and can be used in the event that you want to remove compression entirely on some columns. This may be useful for reducing CPU or GPU resource utilization at the expense of increased I/O.
The following is an example of removing compression:
CREATE TABLE t (
x INT NOT NULL CHECK('CS "flat"'), -- This column won't be compressed
y TEXT(50) -- This column will still be compressed automatically
);
Forcing Compression
In other cases, you may want to force SQream to use a specific compression scheme based on your knowledge of the data, as shown in the following example:
CREATE TABLE t (
id BIGINT NOT NULL CHECK('CS "sequence"'),
y TEXT(110) CHECK('CS "lz4"'), -- General purpose text compression
   z TEXT(80) CHECK('CS "dict"') -- Low cardinality column
);
However, if SQream finds that the given compression method cannot effectively compress the data, it will return to the default compression type.
Examining Compression Effectiveness
Queries made on the internal metadata catalog can expose how effective the compression is, as well as what compression schemes were selected.
This section describes the following:
Querying the Catalog
The following is a sample query that can be used to query the catalog:
SELECT c.column_name AS "Column",
cc.compression_type AS "Actual compression",
AVG(cc.compressed_size) "Compressed",
AVG(cc.uncompressed_size) "Uncompressed",
AVG(cc.uncompressed_size::FLOAT/ cc.compressed_size) -1 AS "Compression effectiveness",
MIN(c.compression_strategy) AS "Compression strategy"
FROM sqream_catalog.chunk_columns cc
INNER JOIN sqream_catalog.columns c
ON cc.table_id = c.table_id
AND cc.database_name = c.database_name
AND cc.column_id = c.column_id
WHERE c.table_name = 'some_table' -- This is the table name which we want to inspect
GROUP BY 1,
2;
Example Subset from “Ontime” Table
The following is an example (subset) from the ontime
table:
stats=> SELECT c.column_name AS "Column",
. cc.compression_type AS "Actual compression",
. AVG(cc.compressed_size) "Compressed",
. AVG(cc.uncompressed_size) "Uncompressed",
. AVG(cc.uncompressed_size::FLOAT/ cc.compressed_size) -1 AS "Compression effectiveness",
. MIN(c.compression_strategy) AS "Compression strategy"
. FROM sqream_catalog.chunk_columns cc
. INNER JOIN sqream_catalog.columns c
. ON cc.table_id = c.table_id
. AND cc.database_name = c.database_name
. AND cc.column_id = c.column_id
.
. WHERE c.table_name = 'ontime'
.
. GROUP BY 1,
. 2;
Column | Actual compression | Compressed | Uncompressed | Compression effectiveness | Compression strategy
--------------------------+--------------------+------------+--------------+---------------------------+---------------------
actualelapsedtime@null | dict | 129177 | 1032957 | 7 | default
actualelapsedtime@val | dict | 1379797 | 4131831 | 2 | default
airlineid | dict | 578150 | 2065915 | 2.7 | default
airtime@null | dict | 130011 | 1039625 | 7 | default
airtime@null | rle | 93404 | 1019833 | 116575.61 | default
airtime@val | dict | 1142045 | 4131831 | 7.57 | default
arrdel15@null | dict | 129177 | 1032957 | 7 | default
arrdel15@val | dict | 129183 | 4131831 | 30.98 | default
arrdelay@null | dict | 129177 | 1032957 | 7 | default
arrdelay@val | dict | 1389660 | 4131831 | 2 | default
arrdelayminutes@null | dict | 129177 | 1032957 | 7 | default
arrdelayminutes@val | dict | 1356034 | 4131831 | 2.08 | default
arrivaldelaygroups@null | dict | 129177 | 1032957 | 7 | default
arrivaldelaygroups@val | p4d | 516539 | 2065915 | 3 | default
arrtime@null | dict | 129177 | 1032957 | 7 | default
arrtime@val | p4d | 1652799 | 2065915 | 0.25 | default
arrtimeblk | dict | 688870 | 9296621 | 12.49 | default
cancellationcode@null | dict | 129516 | 1035666 | 7 | default
cancellationcode@null | rle | 54392 | 1031646 | 131944.62 | default
cancellationcode@val | dict | 263149 | 1032957 | 4.12 | default
cancelled | dict | 129183 | 4131831 | 30.98 | default
carrier | dict | 578150 | 2065915 | 2.7 | default
carrierdelay@null | dict | 129516 | 1035666 | 7 | default
carrierdelay@null | flat | 1041250 | 1041250 | 0 | default
carrierdelay@null | rle | 4869 | 1026493 | 202740.2 | default
carrierdelay@val | dict | 834559 | 4131831 | 14.57 | default
crsarrtime | p4d | 1652799 | 2065915 | 0.25 | default
crsdeptime | p4d | 1652799 | 2065915 | 0.25 | default
crselapsedtime@null | dict | 130449 | 1043140 | 7 | default
crselapsedtime@null | rle | 3200 | 1013388 | 118975.75 | default
crselapsedtime@val | dict | 1182286 | 4131831 | 2.5 | default
dayofmonth | dict | 688730 | 1032957 | 0.5 | default
dayofweek | dict | 393577 | 1032957 | 1.62 | default
departuredelaygroups@null | dict | 129177 | 1032957 | 7 | default
departuredelaygroups@val | p4d | 516539 | 2065915 | 3 | default
depdel15@null | dict | 129177 | 1032957 | 7 | default
depdel15@val | dict | 129183 | 4131831 | 30.98 | default
depdelay@null | dict | 129177 | 1032957 | 7 | default
depdelay@val | dict | 1384453 | 4131831 | 2.01 | default
depdelayminutes@null | dict | 129177 | 1032957 | 7 | default
depdelayminutes@val | dict | 1362893 | 4131831 | 2.06 | default
deptime@null | dict | 129177 | 1032957 | 7 | default
deptime@val | p4d | 1652799 | 2065915 | 0.25 | default
deptimeblk | dict | 688870 | 9296621 | 12.49 | default
month | dict | 247852 | 1035246 | 3.38 | default
month | rle | 5 | 607346 | 121468.2 | default
origin | dict | 1119457 | 3098873 | 1.78 | default
quarter | rle | 8 | 1032957 | 136498.61 | default
securitydelay@null | dict | 129516 | 1035666 | 7 | default
securitydelay@null | flat | 1041250 | 1041250 | 0 | default
securitydelay@null | rle | 4869 | 1026493 | 202740.2 | default
securitydelay@val | dict | 581893 | 4131831 | 15.39 | default
tailnum@null | dict | 129516 | 1035666 | 7 | default
tailnum@null | rle | 38643 | 1031646 | 121128.68 | default
tailnum@val | dict | 1659918 | 12395495 | 22.46 | default
taxiin@null | dict | 130011 | 1039625 | 7 | default
taxiin@null | rle | 93404 | 1019833 | 116575.61 | default
taxiin@val | dict | 839917 | 4131831 | 8.49 | default
taxiout@null | dict | 130011 | 1039625 | 7 | default
taxiout@null | rle | 84327 | 1019833 | 116575.86 | default
taxiout@val | dict | 891539 | 4131831 | 8.28 | default
totaladdgtime@null | dict | 129516 | 1035666 | 7 | default
totaladdgtime@null | rle | 3308 | 1031646 | 191894.18 | default
totaladdgtime@val | dict | 465839 | 4131831 | 20.51 | default
uniquecarrier | dict | 578221 | 7230705 | 11.96 | default
year | rle | 6 | 2065915 | 317216.08 | default
Notes on Reading the “Ontime” Table
The following are some useful notes on reading the “Ontime” table shown above:
Higher numbers in the Compression effectiveness column represent better compressions. 0 represents a column that has not been compressed.
Column names are an internal representation. Names with
@null
and@val
suffixes represent a nullable column’s null (boolean) and values respectively, but are treated as one logical column.The query lists all actual compressions for a column, so it may appear several times if the compression has changed mid-way through the loading (as with the
carrierdelay
column).When your compression strategy is
default
, the system automatically selects the best compression, including no compression at all (flat
).
Best Practices
This section describes the best compression practices:
Letting SQream Determine the Best Compression Strategy
In general, SQream determines the best compression strategy for most cases. If you decide to override SQream’s selected compression strategies, we recommend benchmarking your query and load performance in addition to your storage size.
Maximizing the Advantage of Each Compression Scheme
Some compression schemes perform better when data is organized in a specific way. For example, to take advantage of RLE, sorting a column may result in better performance and reduced disk-space and I/O usage. Sorting a column partially may also be beneficial. As a rule of thumb, aim for run-lengths of more than 10 consecutive values.
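For example, a minimal sketch of forcing RLE on a column that is loaded in sorted order (the table and column names are hypothetical; the CHECK syntax is described under Forcing Compression above):
CREATE TABLE sorted_events (
   event_date DATE CHECK('CS "rle"'), -- load rows sorted by this column to produce long runs
   payload TEXT(100)
);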
Choosing Data Types that Fit Your Data
Choosing the narrowest data type that fits your data improves query performance and reduces disk space usage. In addition, smaller data types may compress better than larger types.
For example, SQream recommends using the smallest numeric data type that will accommodate your data. Using BIGINT
for data that fits in INT
or SMALLINT
can use more disk space and memory for query execution. Using FLOAT
to store integers will reduce compression’s effectiveness significantly.
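The following is a minimal sketch of this guideline (the table and column names are hypothetical):
CREATE TABLE orders (
   status_code TINYINT NOT NULL, -- values 0-255 fit in TINYINT; BIGINT would use 8 bytes per value before compression
   quantity SMALLINT NOT NULL,
   total_cents BIGINT NOT NULL -- genuinely large values still warrant BIGINT
);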
Python User-Defined Functions
User-Defined Functions (UDFs) offer streamlined statements, enabling the creation of a function once, storing it in the database, and calling it multiple times within a statement. Additionally, UDFs can be shared among roles, created by a database administrator and utilized by others. Furthermore, they contribute to code simplicity by allowing independent modifications in SQream DB without altering program source code.
To enable UDFs, in your legacy configuration file, set the enablePythonUdfs
configuration flag to true
.
Before You Begin
Ensure you have Python 3.6.7 or newer installed
Enable UDFs by setting the
enablePythonUdfs
configuration flag totrue
in your legacy configuration file
SQreamDB’s UDF Support
Scalar Functions
SQreamDB’s UDFs are scalar functions. This means that the UDF returns a single data value of the type defined in the RETURNS
clause. For an inline scalar function, the returned scalar value is the result of a single statement.
Python
Python is installed alongside SQreamDB, for use exclusively by SQreamDB. You may have a different version of Python installed on your server.
To find which version of Python is installed for use by SQreamDB, create and run this UDF:
master=> CREATE OR REPLACE FUNCTION py_version()
. RETURNS text
. AS $$
. import sys
. return ("Python version: " + sys.version + ". Path: " + sys.base_exec_prefix)
. $$ LANGUAGE PYTHON;
executed
master=> SELECT py_version();
py_version
-------------------------------------------------------------------------------------
Python version: 3.6.7 (default, Jul 22 2019, 11:03:54) [GCC 5.4.0].
Path: /opt/sqream/python-3.6.7-5.4.0
Using Modules
To import a Python module, use the standard import
syntax in the first lines of the user-defined function.
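For example, a minimal sketch of a UDF that imports Python's standard re module to strip non-digit characters (the function name is hypothetical):
CREATE OR REPLACE FUNCTION digits_only (x1 text)
RETURNS text
AS $$
import re
return re.sub(r'\D', '', x1)
$$ LANGUAGE PYTHON;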
Working with Existing UDFs
Finding Existing UDFs in the Catalog
The user_defined_functions
catalog view contains function information.
Here’s how you’d list all UDFs in the system:
master=> SELECT * FROM sqream_catalog.user_defined_functions;
database_name | function_id | function_name
--------------+-------------+--------------
master | 1 | my_upper
Getting Function DDL
master=> SELECT GET_FUNCTION_DDL('my_upper');
ddl
----------------------------------------------------
create function "my_upper" (x1 text) returns text as
$$
return x1.upper()
$$
language python volatile;
See GET FUNCTION DDL for more information.
Handling Errors
In UDFs, any error that occurs causes the execution of the function to stop. This in turn causes the statement that invoked the function to be canceled.
Permissions and Sharing
To create a UDF, the creator needs the CREATE FUNCTION
permission at the database level.
For example, to grant CREATE FUNCTION
to a non-superuser role:
GRANT CREATE FUNCTION ON DATABASE master TO role1;
To execute a UDF, the role needs the EXECUTE FUNCTION
permission for every function.
For example, to grant the permission to the r_bi_users
role group, run:
GRANT EXECUTE ON FUNCTION my_upper TO r_bi_users;
Note
Functions are stored for each database, outside of any schema.
See more information about permissions in the Access control guide.
Example
Most databases have an UPPER function, including SQream DB. However, assume that this function is missing for the sake of this example.
You can write a function in Python to uppercase a text value using the CREATE FUNCTION syntax.
CREATE FUNCTION my_upper (x1 text)
RETURNS text
AS $$
return x1.upper()
$$ LANGUAGE PYTHON;
Let’s break down this example:
CREATE FUNCTION my_upper - create a function called my_upper. This name must be unique in the current database.

(x1 text) - the function accepts one argument named x1, which is of the SQL type TEXT. All data types are supported.

RETURNS text - the function returns the same type - TEXT. All data types are supported.

AS $$ - what follows is some code that we don't want to quote, so we use dollar-quoting ($$) instead of single quotes (').

return x1.upper() - the Python function's body returns the argument named x1, uppercased.

$$ LANGUAGE PYTHON - this is the end of the function, and it's in the Python language.
Running this example
After creating the function, you can use it in any SQL query.
For example:
master=> CREATE TABLE jabberwocky(line text);
executed
master=> INSERT INTO jabberwocky VALUES
. ('''Twas brillig, and the slithy toves '), (' Did gyre and gimble in the wabe: ')
. ,('All mimsy were the borogoves, '), (' And the mome raths outgrabe. ')
. ,('"Beware the Jabberwock, my son! '), (' The jaws that bite, the claws that catch! ')
. ,('Beware the Jubjub bird, and shun '), (' The frumious Bandersnatch!" ');
executed
master=> SELECT line, my_upper(line) FROM jabberwocky;
line | my_upper
-------------------------------------------------+-------------------------------------------------
'Twas brillig, and the slithy toves | 'TWAS BRILLIG, AND THE SLITHY TOVES
Did gyre and gimble in the wabe: | DID GYRE AND GIMBLE IN THE WABE:
All mimsy were the borogoves, | ALL MIMSY WERE THE BOROGOVES,
And the mome raths outgrabe. | AND THE MOME RATHS OUTGRABE.
"Beware the Jabberwock, my son! | "BEWARE THE JABBERWOCK, MY SON!
The jaws that bite, the claws that catch! | THE JAWS THAT BITE, THE CLAWS THAT CATCH!
Beware the Jubjub bird, and shun | BEWARE THE JUBJUB BIRD, AND SHUN
The frumious Bandersnatch!" | THE FRUMIOUS BANDERSNATCH!"
Best Practices
Although user-defined functions add flexibility, they may have some performance drawbacks. They are not usually a replacement for subqueries or views.
In some cases, a user-defined function provides benefits, such as sharing extended functionality, that make it very appealing.
Use user-defined functions sparingly in the WHERE
clause. SQream DB can’t optimize the function’s usage, and it will be called once for every value. If possible, you should narrow down the number of results before the UDF is called by using a subquery.
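For example, a sketch that narrows the rows in a subquery before the UDF runs, using the my_upper function and jabberwocky table from the example above:
SELECT my_upper(line)
FROM (SELECT line FROM jabberwocky
      WHERE line LIKE '%Beware%') t; -- the UDF is called only for the filtered rows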
Workload Manager
The Workload Manager allows SQream workers to identify their availability to clients with specific service names. The load balancer uses that information to route statements to specific workers.
Overview
The Workload Manager allows a system engineer or database administrator to allocate specific workers and compute resources for various tasks.
For example:
Creating a service queue named ETL and allocating two workers exclusively to this service prevents non-ETL statements from utilizing these compute resources.

Creating a service for the company's leadership during working hours for dedicated access, and disabling this service at night to allow maintenance operations to use the available compute.
Setting Up Service Queues
By default, every worker subscribes to the sqream
service queue.
Additional service names are configured in the configuration file for every worker, but can also be set on a per-session basis.
Example - Allocating ETL Resources
Allocating ETL resources ensures high quality service without requiring management users to wait.
The configuration in this example allocates resources as shown below:
1 worker for ETL work
3 workers for general queries
All workers assigned to queries from management
Service / Worker | Worker #1 | Worker #2 | Worker #3 | Worker #4
---|---|---|---|---
ETL | ✓ | ✗ | ✗ | ✗
Query service | ✗ | ✓ | ✓ | ✓
Management | ✓ | ✓ | ✓ | ✓
This configuration gives the ETL queue dedicated access to one worker, which cannot be used by the other services.
Queries from management use any available worker.
Creating the Configuration
{
"cluster": "/home/rhendricks/raviga_database",
"cudaMemQuota": 25,
"gpu": 0,
"maxConnectionInactivitySeconds": 120,
"legacyConfigFilePath": "tzah_legacy.json",
"licensePath": "/home/sqream/.sqream/license.enc",
"metadataServerIp": "192.168.0.103",
"limitQueryMemoryGB": 250,
"machineIP": "192.168.0.103",
"metadataServerPort": 3105,
"port": 5000,
"useConfigIP": true
}
{
"debugNetworkSession": false,
"diskSpaceMinFreePercent": 1,
"maxNumAutoCompressedChunksThreshold" : 1,
"insertMergeRowsThreshold":40000000,
"insertCompressors": 8,
"insertParsers": 8,
"nodeInfoLoggingSec": 60,
"reextentUse": true,
"separatedGatherThreads": 16,
"showFullExceptionInfo": true,
"spoolMemoryGB":200,
"useClientLog": true,
"useMetadataServer":true
}
Tip
You can create this configuration temporarily (for the current session only) by using the SUBSCRIBE_SERVICE and UNSUBSCRIBE_SERVICE statements.
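For example, a sketch of subscribing the current worker to an additional service for the session (assuming SUBSCRIBE_SERVICE and UNSUBSCRIBE_SERVICE take the service name as their argument):
SELECT SUBSCRIBE_SERVICE('etl');
-- run statements under the etl service, then remove the subscription:
SELECT UNSUBSCRIBE_SERVICE('etl');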
Verifying the Configuration
Use SHOW_SUBSCRIBED_INSTANCES to view service subscriptions for each worker. Use SHOW_SERVER_STATUS to see the statement queues.
t=> SELECT SHOW_SUBSCRIBED_INSTANCES();
service | servernode | serverip | serverport
-----------+------------+---------------+-----------
management | node_9383 | 192.168.0.111 | 5000
etl | node_9383 | 192.168.0.111 | 5000
query | node_9384 | 192.168.0.111 | 5001
management | node_9384 | 192.168.0.111 | 5001
query | node_9385 | 192.168.0.111 | 5002
management | node_9385 | 192.168.0.111 | 5002
query | node_9551 | 192.168.1.91 | 5000
management | node_9551 | 192.168.1.91 | 5000
Configuring a Client Connection to a Specific Service
You can configure a client connection to a specific service in one of the following ways:
Using SQream Studio
When using SQream Studio, you can configure a client connection to a specific service from the SQream Studio, as shown below:

For more information, in Studio, see Executing Statements from the Toolbar.
Using the SQream SQL CLI Reference
When using the SQream SQL CLI Reference, you can configure a client connection to a specific service by adding --service=<service name>
to the command line, as shown below:
$ sqream sql --port=3108 --clustered --username=mjordan --databasename=master --service=etl
Password:
Interactive client mode
To quit, use ^D or \q.
master=>_
For more information, see the Sqream SQL CLI Reference.
Using a JDBC Client Driver
When using a JDBC client driver, you can configure a client connection to a specific service by adding service=<service name> to the connection string, as shown below:
jdbc:Sqream://127.0.0.1:3108/raviga;user=rhendricks;password=Tr0ub4dor&3;service=etl;cluster=true;ssl=false;
For more information, see the JDBC Client Driver.
Using an ODBC Client Driver
When using an ODBC client driver, you can configure a client connection to a specific service on Linux by modifying the DSN parameters in odbc.ini
.
For example, Service="etl"
:
[sqreamdb]
Description=64-bit Sqream ODBC
Driver=/home/rhendricks/sqream_odbc64/sqream_odbc64.so
Server="127.0.0.1"
Port="3108"
Database="raviga"
Service="etl"
User="rhendricks"
Password="Tr0ub4dor&3"
Cluster=true
Ssl=false
On Windows, change the parameter in the DSN editing window.
For more information, see the ODBC Client Driver.
Using a Python Client Driver
When using a Python client driver, you can configure a client connection to a specific service by setting the service
parameter in the connection command, as shown below:
con = pysqream.connect(host='127.0.0.1', port=3108, database='raviga'
, username='rhendricks', password='Tr0ub4dor&3'
, clustered=True, use_ssl = False, service='etl')
For more information, see the Python (pysqream) connector.
Using a Node.js Client Driver
When using a Node.js client driver, you can configure a client connection to a specific service by adding the service to the connection settings, as shown below:
const Connection = require('sqreamdb');
const config = {
host: '127.0.0.1',
port: 3108,
username: 'rhendricks',
password: 'Tr0ub4dor&3',
connectDatabase: 'raviga',
cluster: 'true',
service: 'etl'
};
For more information, see the Node.js Client Driver.
Concurrency and Locks
Locks are used in SQream DB to provide consistency when there are multiple concurrent transactions updating the database.
Read only transactions are never blocked, and never block anything. Even if you drop a database while concurrently running a query on it, both will succeed correctly (as long as the query starts running before the drop database commits).
Locking Modes
SQream DB has two kinds of locks:
exclusive - this lock mode prevents the resource from being modified by other statements. This lock tells other statements that they'll have to wait in order to change an object.

DDL operations are always exclusive. They block other DDL operations, and update DML operations (insert and delete).

inclusive - for insert operations, an inclusive lock is obtained on a specific object. This prevents other statements from obtaining an exclusive lock on the object. This lock allows other statements to insert or delete data from a table, but they'll have to wait in order to run DDL.
When are Locks Obtained?
Operation | SELECT | INSERT | DELETE | DDL
---|---|---|---|---
SELECT | Concurrent | Concurrent | Concurrent | Concurrent
INSERT | Concurrent | Concurrent | Concurrent | Wait
DELETE | Concurrent | Concurrent | Wait | Wait
DDL | Concurrent | Wait | Wait | Wait
Statements that wait will exit with an error if they hit the lock timeout. The default timeout is 3 seconds, see statementLockTimeout
.
Monitoring Locks
Monitoring locks across the cluster can be useful when transaction contention takes place, and statements appear “stuck” while waiting for a previous statement to release locks.
The utility SHOW_LOCKS can be used to see the active locks.
In this example, we create a table based on results (CREATE TABLE AS), but we are also effectively dropping the previous table (by using OR REPLACE, which also drops the table). Thus, SQream DB applies locks during the table creation process to prevent the table from being altered during its creation.
t=> SELECT SHOW_LOCKS();
statement_id | statement_string | username | server | port | locked_object | lockmode | statement_start_time | lock_start_time
-------------+-------------------------------------------------------------------------------------------------+----------+--------------+------+---------------------------------+-----------+----------------------+--------------------
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | database$t | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | globalpermission$ | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | schema$t$public | Inclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Insert | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
287 | CREATE OR REPLACE TABLE nba2 AS SELECT "Name" FROM nba WHERE REGEXP_COUNT("Name", '( )+', 8)>1; | sqream | 192.168.1.91 | 5000 | table$t$public$nba2$Update | Exclusive | 2019-12-26 00:03:30 | 2019-12-26 00:03:30
For more information on troubleshooting lock related issues, see Lock Related Issues.
Concurrency and Scaling in SQream DB
A SQream DB cluster can concurrently run one regular statement per worker process. A number of small statements will execute alongside these statements without waiting or blocking anything.
SQream DB supports n
concurrent statements by having n
workers in a cluster. Each worker uses a fixed slice of a GPU's memory, with typical values around 8-16GB of GPU memory per worker. This size is ideal for queries running on large data with potentially large row sizes.
Scaling when data sizes grow
For many statements, SQream DB scales linearly when adding more storage and querying on large data sets. It uses very optimised ‘brute force’ algorithms and implementations, which don’t suffer from sudden performance cliffs at larger data sizes.
Scaling when queries are queueing
SQream DB scales well by adding more workers, GPUs, and nodes to support more concurrent statements.
What to do when queries are slow
Adding more workers or GPUs does not boost the performance of a single statement or query.
To boost the performance of a single statement, start by examining the best practices and ensure the guidelines are followed.
Adding additional RAM to nodes, using more GPU memory, and faster CPUs or storage can also sometimes help.
Need help?
Analyzing complex workloads can be challenging. SQream's customer support team has the experience to advise on these matters and ensure the best experience.
Visit SQream’s support portal for additional support.
Operational Guides
The Operational Guides section describes processes that SQream users can manage to affect the way their system operates, such as creating storage clusters and monitoring query performance.
This section summarizes the following operational guides:
Access Control
Overview
Access control refers to SQream's authentication and authorization operations, managed using a Role-Based Access Control (RBAC) system similar to those of ANSI SQL and other SQL products. SQream's default permissions system resembles Postgres, but is more powerful. SQream's method lets administrators prepare the system to automatically provide objects with their required permissions.
SQream users can log in from any worker, which verifies their roles and permissions against the metadata server. Each statement is executed as the role you are currently logged in as. Roles are defined at the cluster level, and are valid for all databases in the cluster. To bootstrap SQream, new installations require one SUPERUSER role, typically named sqream. You can only create new roles by connecting as this role.
Access control refers to the following basic concepts:
Role - A role can be a user, a group, or both. Roles can own database objects (such as tables) and can assign permissions on those objects to other roles. Roles can be members of other roles, meaning a user role can inherit permissions from its parent role.
Authentication - Verifies the identity of the role. User roles have usernames (or role names) and passwords.
Authorization - Checks that a role has permissions to perform a particular operation, such as the GRANT command.
Password Policy
The Password Policy describes the following:
Password Strength Requirements
As part of our compliance with GDPR standards, SQream relies on a strong password policy when accessing the CLI or Studio, with the following requirements:
At least eight characters long.
Mandatory upper and lowercase letters.
At least one numeric character.
May not include a username.
Must include at least one special character, such as ?, !, $, etc.
You can create a password by using the Studio graphic interface or using the CLI, as in the following example command:
CREATE ROLE user_a ;
GRANT LOGIN to user_a ;
GRANT PASSWORD 'BBAu47?fqPL' to user_a ;
Creating a password which does not comply with the password policy generates an error message with a request to include any of the missing above requirements:
The password you attempted to create does not comply with SQream's security requirements.
Your password must:
* Be at least eight characters long.
* Contain upper and lowercase letters.
* Contain at least one numeric character.
* Not include a username.
* Include at least one special character, such as ?, !, $, etc.
Brute Force Prevention
Unsuccessfully attempting to log in five times displays the following message:
The user is locked. Please contact your system administrator to reset the password and regain access functionality.
You must have superuser permissions to release a locked user to grant a new password:
GRANT PASSWORD '<password>' to <blocked_user>;
For more information, see Adjusting Permitted Log-in Attempts.
Warning
Because superusers can also be blocked, you must have at least two superusers per cluster.
Managing Roles
Roles are used for both users and groups, and are global across all databases in the SQream cluster. For a ROLE to be used as a user, it requires a password, log-in permission, and connect permissions to the relevant databases.
The Managing Roles section describes the following role-related operations:
Creating New Roles (Users)
A user role logging in to the database requires LOGIN
permissions and a password.
The following is the syntax for creating a new role:
CREATE ROLE <role_name> ;
GRANT LOGIN to <role_name> ;
GRANT PASSWORD <'new_password'> to <role_name> ;
GRANT CONNECT ON DATABASE <database_name> to <role_name> ;
The following is an example of creating a new role:
CREATE ROLE new_role_name ;
GRANT LOGIN TO new_role_name;
GRANT PASSWORD 'Passw0rd!' to new_role_name;
GRANT CONNECT ON DATABASE master to new_role_name;
A database role may have a number of permissions that define what tasks it can perform, which are assigned using the GRANT command.
Dropping a User
The following is the syntax for dropping a user:
DROP ROLE <role_name> ;
The following is an example of dropping a user:
DROP ROLE admin_role ;
Altering a User Name
The following is the syntax for altering a user name:
ALTER ROLE <role_name> RENAME TO <new_role_name> ;
The following is an example of altering a user name:
ALTER ROLE admin_role RENAME TO copy_role ;
Changing a User Password
You can change a user role’s password by granting the user a new password.
The following is an example of changing a user password:
GRANT PASSWORD <'new_password'> TO rhendricks;
Note
Granting a new password overrides any previous password. Changing the password while the role has an active running statement does not affect that statement, but will affect subsequent statements.
Altering Public Role Permissions
The database has a predefined PUBLIC role that cannot be deleted. Each user role is automatically granted membership in the PUBLIC role, and this membership cannot be revoked. However, you can adjust the permissions associated with the PUBLIC role.
The PUBLIC
role has USAGE
and CREATE
permissions on PUBLIC
schema by default, therefore, newly created user roles are granted CREATE
(databases, schemas, roles, functions, views, and tables) on the public schema. Other permissions, such as INSERT, DELETE, SELECT, and UPDATE on objects in the public schema are not automatically granted.
Altering Role Membership (Groups)
Many database administrators find it useful to group user roles together. By grouping users, permissions can be granted to, or revoked from a group with one command. In SQream DB, this is done by creating a group role, granting permissions to it, and then assigning users to that group role.
To use a role purely as a group, omit granting it LOGIN
and PASSWORD
permissions.
The CONNECT
permission can be given directly to user roles, and/or to the groups they are part of.
CREATE ROLE my_group;
Once the group role exists, you can add user roles (members) using the GRANT
command. For example:
-- Add my_user to this group
GRANT my_group TO my_user;
To manage object permissions like databases and tables, you would then grant permissions to the group-level role (see the permissions table below).
All member roles then inherit the permissions from the group. For example:
-- Grant all group users connect permissions
GRANT CONNECT ON DATABASE a_database TO my_group;
-- Grant all permissions on tables in public schema
GRANT ALL ON all tables IN schema public TO my_group;
Removing users and permissions can be done with the REVOKE
command:
-- remove my_other_user from this group
REVOKE my_group FROM my_other_user;
Permissions
The following table displays the access control permissions:
Permission | Description
---|---
Object/Layer: All Databases |
LOGIN | Use role to log into the system (the role also needs connect permission on the database it is connecting to)
PASSWORD | The password used for logging into the system
SUPERUSER | No permission restrictions on any activity
Object/Layer: Database |
SUPERUSER | No permission restrictions on any activity within that database (this does not include modifying roles or permissions)
CONNECT | Connect to the database
CREATE | Create schemas in the database
CREATE FUNCTION | Create and drop functions
Object/Layer: Schema |
USAGE | Grants access to schema objects
CREATE | Create tables in the schema
Object/Layer: Table |
SELECT | SELECT from the table
INSERT | INSERT into the table
UPDATE | UPDATE the value of certain columns in existing rows
DELETE | DELETE and TRUNCATE on the table
DDL | Drop and alter on the table
ALL | All the table permissions
Object/Layer: Function |
EXECUTE | Use the function
DDL | Drop and alter on the function
ALL | All function permissions
GRANT
GRANT gives permissions to a role.
-- Grant permissions at the instance/ storage cluster level:
GRANT
{ SUPERUSER
| LOGIN
| PASSWORD '<password>'
}
TO <role> [, ...]
-- Grant permissions at the database level:
GRANT {{CREATE | CONNECT| DDL | SUPERUSER | CREATE FUNCTION} [, ...] | ALL [PERMISSIONS]}
ON DATABASE <database> [, ...]
TO <role> [, ...]
-- Grant permissions at the schema level:
GRANT {{ CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [
PERMISSIONS ]}
ON SCHEMA <schema> [, ...]
TO <role> [, ...]
-- Grant permissions at the object level:
GRANT {{SELECT | INSERT | DELETE | DDL | UPDATE } [, ...] | ALL [PERMISSIONS]}
ON { TABLE <table_name> [, ...] | ALL TABLES IN SCHEMA <schema_name> [, ...]}
TO <role> [, ...]
-- Grant execute function permission:
GRANT {ALL | EXECUTE | DDL} ON FUNCTION function_name
TO role;
-- Allow role2 to use permissions granted to role1:
GRANT <role1> [, ...]
TO <role2>
-- Also allow role2 to grant role1 to other roles:
GRANT <role1> [, ...]
TO <role2> WITH ADMIN OPTION
GRANT examples:
GRANT LOGIN, SUPERUSER TO admin;
GRANT CREATE FUNCTION ON DATABASE master TO admin;
GRANT SELECT ON TABLE admin.table1 TO userA;
GRANT EXECUTE ON FUNCTION my_function TO userA;
GRANT ALL ON FUNCTION my_function TO userA;
GRANT DDL ON admin.main_table TO userB;
GRANT ALL ON ALL TABLES IN SCHEMA public TO userB;
GRANT admin TO userC;
GRANT SUPERUSER ON SCHEMA demo TO userA;
GRANT admin_role TO userB;
REVOKE
REVOKE removes permissions from a role.
-- Revoke permissions at the instance/ storage cluster level:
REVOKE
{ SUPERUSER
| LOGIN
| PASSWORD
}
FROM <role> [, ...]
-- Revoke permissions at the database level:
REVOKE {{CREATE | CONNECT | DDL | SUPERUSER | CREATE FUNCTION}[, ...] |ALL [PERMISSIONS]}
ON DATABASE <database> [, ...]
FROM <role> [, ...]
-- Revoke permissions at the schema level:
REVOKE { { CREATE | DDL | USAGE | SUPERUSER } [, ...] | ALL [PERMISSIONS]}
ON SCHEMA <schema> [, ...]
FROM <role> [, ...]
-- Revoke permissions at the object level:
REVOKE { { SELECT | INSERT | DELETE | DDL | UPDATE } [, ...] | ALL }
ON { [ TABLE ] <table_name> [, ...] | ALL TABLES IN SCHEMA
<schema_name> [, ...] }
FROM <role> [, ...]
-- Removes role2's access to permissions granted through role1:
REVOKE <role1> [, ...] FROM <role2> [, ...]
-- Removes role2's permission to grant role1 to additional roles:
REVOKE ADMIN OPTION FOR <role1> [, ...] FROM <role2> [, ...]
Examples:
REVOKE SUPERUSER ON SCHEMA demo FROM userA;
REVOKE DELETE ON admin.table1 FROM userB;
REVOKE LOGIN FROM role_test;
REVOKE CREATE FUNCTION ON DATABASE master FROM admin;
Default permissions
The default permissions system (see ALTER DEFAULT PERMISSIONS) can be used to automatically grant permissions to newly created objects (see the departmental example below for one way it can be used).
A default permissions rule matches a schema or table being created (optionally filtered by schema), and is able to grant any permission on that object to any role. The rule is applied when the CREATE TABLE or CREATE SCHEMA statement runs.
ALTER DEFAULT PERMISSIONS FOR modifying_role
[IN schema_name, ...]
FOR { TABLES | SCHEMAS }
{ grant_clause | DROP grant_clause}
TO ROLE { role_name | public };
grant_clause ::=
GRANT
{ CREATE FUNCTION
| SUPERUSER
| CONNECT
| CREATE
| USAGE
| SELECT
| INSERT
| DELETE
| DDL
| UPDATE
| EXECUTE
| ALL
}
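As a short sketch of this syntax (the role and schema names here are hypothetical), the following rule grants SELECT on any table that etl_role subsequently creates in the staging schema to reporting_role:

ALTER DEFAULT PERMISSIONS FOR etl_role
IN staging
FOR TABLES GRANT SELECT
TO reporting_role;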
Departmental Example
You work in a company with several departments.
The example below shows how to manage permissions in a database shared by multiple departments, where each department has different roles for the tables in each schema. It walks you through setting up permissions for existing objects, and setting up default permissions rules to cover newly created objects.
The idea is to set up group roles for each new schema with the correct permissions, and then assign the existing users to those groups.
A superuser must repeat this setup for each new schema, which is a limitation; however, superuser permissions are not needed at any other time, and neither are explicit GRANT statements or object ownership changes.
In the example, the database is called my_database, and the new or existing schema being set up to be managed in this way is called my_schema.
Our departmental example has four group roles and seven user roles. There will be a group for this schema for each of the following:
Group | Activities
---|---
database designers | create, alter and drop tables
updaters | insert and delete data
readers | read data
security officers | add and remove users from these groups
Setting up the department permissions
As a superuser, you connect to the system and run the following:
-- create the groups
CREATE ROLE my_schema_security_officers;
CREATE ROLE my_schema_database_designers;
CREATE ROLE my_schema_updaters;
CREATE ROLE my_schema_readers;
-- grant permissions for each role
-- we grant permissions for existing objects here too,
-- so you don't have to start with an empty schema
-- security officers
GRANT connect ON DATABASE my_database TO my_schema_security_officers;
GRANT usage ON SCHEMA my_schema TO my_schema_security_officers;
GRANT my_schema_database_designers TO my_schema_security_officers WITH ADMIN OPTION;
GRANT my_schema_updaters TO my_schema_security_officers WITH ADMIN OPTION;
GRANT my_schema_readers TO my_schema_security_officers WITH ADMIN OPTION;
-- database designers
GRANT connect ON DATABASE my_database TO my_schema_database_designers;
GRANT usage ON SCHEMA my_schema TO my_schema_database_designers;
GRANT create,ddl ON SCHEMA my_schema TO my_schema_database_designers;
-- updaters
GRANT connect ON DATABASE my_database TO my_schema_updaters;
GRANT usage ON SCHEMA my_schema TO my_schema_updaters;
GRANT SELECT,INSERT,DELETE ON ALL TABLES IN SCHEMA my_schema TO my_schema_updaters;
-- readers
GRANT connect ON DATABASE my_database TO my_schema_readers;
GRANT usage ON SCHEMA my_schema TO my_schema_readers;
GRANT SELECT ON ALL TABLES IN SCHEMA my_schema TO my_schema_readers;
GRANT EXECUTE ON ALL FUNCTIONS TO my_schema_readers;
-- create the default permissions for new objects
ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema
FOR TABLES GRANT SELECT,INSERT,DELETE TO my_schema_updaters;
-- For every table created by my_schema_database_designers, give access to my_schema_readers:
ALTER DEFAULT PERMISSIONS FOR my_schema_database_designers IN my_schema
FOR TABLES GRANT SELECT TO my_schema_readers;
Note
This process needs to be repeated by a user with SUPERUSER permissions each time a new schema is brought into this permissions management approach.
By default, any new object created will not be accessible by our new my_schema_readers group. Running a GRANT SELECT ... only affects objects that already exist in the schema or database.
If you're getting a Missing the following permissions: SELECT on table 'database.public.tablename' error, make sure that you've altered the default permissions with the ALTER DEFAULT PERMISSIONS statement.
Creating new users in the departments
After the group roles have been created, you can now create user roles for each of your users.
-- create the new database designer users
CREATE ROLE ecodd;
GRANT LOGIN TO ecodd;
GRANT PASSWORD 'Passw0rd!' TO ecodd;
GRANT CONNECT ON DATABASE my_database TO ecodd;
GRANT my_schema_database_designers TO ecodd;
CREATE ROLE ebachmann;
GRANT LOGIN TO ebachmann;
GRANT PASSWORD 'Passw0rd!!!' TO ebachmann;
GRANT CONNECT ON DATABASE my_database TO ebachmann;
GRANT my_schema_database_designers TO ebachmann;
-- If a user already exists, we can assign that user directly to the group
GRANT my_schema_updaters TO rhendricks;
-- Create users in the readers group
CREATE ROLE jbarker;
GRANT LOGIN TO jbarker;
GRANT PASSWORD 'action_jacC%k' TO jbarker;
GRANT CONNECT ON DATABASE my_database TO jbarker;
GRANT my_schema_readers TO jbarker;
CREATE ROLE lbream;
GRANT LOGIN TO lbream;
GRANT PASSWORD 'artichoke123O$' TO lbream;
GRANT CONNECT ON DATABASE my_database TO lbream;
GRANT my_schema_readers TO lbream;
CREATE ROLE pgregory;
GRANT LOGIN TO pgregory;
GRANT PASSWORD 'c1ca6aG$' TO pgregory;
GRANT CONNECT ON DATABASE my_database TO pgregory;
GRANT my_schema_readers TO pgregory;
-- Create users in the security officers group
CREATE ROLE hoover;
GRANT LOGIN TO hoover;
GRANT PASSWORD 'mint*Rchip' TO hoover;
GRANT CONNECT ON DATABASE my_database TO hoover;
GRANT my_schema_security_officers TO hoover;
After this setup:
Database designers will be able to run any DDL on objects in the schema and create new objects, including objects created by other database designers.
Updaters will be able to insert into and delete from existing and new tables.
Readers will be able to read from existing and new tables.
All of this happens without having to run any more GRANT statements.
Any security officer will be able to add and remove users from these groups. Creating and dropping the login users themselves must still be done by a superuser.
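As a sketch of the day-to-day flow this enables, a security officer (who was granted the group roles WITH ADMIN OPTION above) can manage membership without superuser involvement:

-- Connected as hoover, a security officer:
GRANT my_schema_readers TO rhendricks;
REVOKE my_schema_updaters FROM rhendricks;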
Creating or Cloning Storage Clusters
When SQream DB is installed, it comes with a default storage cluster. This guide will help if you need a fresh storage cluster or a separate copy of an existing storage cluster.
Creating a new storage cluster
SQream DB comes with a CLI tool, SqreamStorage. This tool can be used to create a new empty storage cluster.
In this example, we will create a new cluster at /home/rhendricks/raviga_database:
$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26
This can also be written shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database.
The Setting cluster version... message confirms that the cluster was created successfully.
Tell SQream DB to use this storage cluster
Permanently setting the storage cluster location
To permanently set the new cluster location, change the "cluster" path listed in the configuration file.
For example:
{
"compileFlags": {
},
"runtimeFlags": {
},
"runtimeGlobalFlags": {
},
"server": {
"gpu": 0,
"port": 5000,
"cluster": "/home/sqream/my_old_cluster",
"licensePath": "/home/sqream/.sqream/license.enc"
}
}
should be changed to
{
"compileFlags": {
},
"runtimeFlags": {
},
"runtimeGlobalFlags": {
},
"server": {
"gpu": 0,
"port": 5000,
"cluster": "/home/rhendricks/raviga_database",
"licensePath": "/home/sqream/.sqream/license.enc"
}
}
Now, the cluster should be restarted for the changes to take effect.
Start a temporary SQream DB worker with a storage cluster
Starting a SQream DB worker with a custom cluster path can be done in two ways:
Using a configuration file (recommended)
Similar to the technique above, create a configuration file with the correct cluster path. Then, start sqreamd using the -config flag:
$ sqreamd -config config_file.json
Using the command line parameters
Use sqreamd’s command line parameters to override the default storage cluster path:
$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc
Note
sqreamd’s command line parameters’ order is sqreamd <cluster path> <GPU ordinal> <TCP listen port (unsecured)> <License path>
Copying an existing storage cluster
Copying an existing storage cluster to another path may be useful for testing or troubleshooting purposes.
Identify the location of the active storage cluster. This path can be found in the configuration file, under the "cluster" parameter.
Shut down the SQream DB cluster. This prevents very large storage directories from being modified during the copy process.
(Optional) Create a tarball of the storage cluster with tar -zcvf sqream_cluster_`date +"%Y-%m-%d-%H-%M"`.tgz <cluster path>. This creates a tarball with the current date and time as part of the filename.
Copy the storage cluster directory (or tarball) with cp to another location on the local filesystem, or use rsync to copy to a remote server.
After the copy is completed, start the SQream DB cluster to continue using SQream DB.
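Putting the steps together, a minimal shell sketch might look as follows (the cluster path and the backup host are assumptions):

# Stop the SQream DB cluster first, then:
$ tar -zcvf sqream_cluster_$(date +"%Y-%m-%d-%H-%M").tgz /home/sqream/my_cluster
$ rsync -av sqream_cluster_*.tgz sqream@backup-host:/backups/
# Restart the SQream DB cluster when the copy completes.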
Working with External Data
SQream supports the following external data sources:
For more information, see the following:
Foreign Tables
Foreign tables can be used to run queries directly on data without inserting it into SQream DB first. SQream DB supports read-only foreign tables: you can query foreign tables, but you cannot insert into them, or run updates or deletes on them.
Running queries directly on foreign data is most effectively used for one-off querying. If you are repeatedly querying data, the performance will usually be better if you insert the data into SQream DB first.
Although foreign tables can be used without inserting data into SQream DB, one of their main use cases is to help with the insertion process. An insert select statement on a foreign table can be used to insert data into SQream using the full power of the query engine to perform ETL.
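A minimal sketch of that insert-select pattern (the table names mirror the example later in this section, and the target table is assumed to already exist with a matching schema):

INSERT INTO real_nba
  SELECT name, team, number, position, age, height, (weight / 2.205) AS weight, college, salary
  FROM nba;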
Supported Data Formats
SQream DB supports foreign tables over:
Text - CSV, TSV, and PSV
Parquet
ORC
Avro
JSON
Supported Data Staging
SQream can stage data from:
a local filesystem (e.g. /mnt/storage/....)
Amazon Web Services buckets (e.g. s3://pp-secret-bucket/users/*.parquet)
an HDFS environment (e.g. hdfs://hadoop-nn.piedpiper.com/rhendricks/*.csv)
Using Foreign Tables
Use a foreign table to stage data before loading from CSV, Parquet or ORC files.
Planning for Data Staging
For the following examples, we will interact with a CSV file. The file is stored on Amazon Web Services, at s3://sqream-demo-data/nba_players.csv.
We will make note of the file structure in order to create a matching CREATE FOREIGN TABLE statement.
Creating a Foreign Table
Based on the source file structure, we create a foreign table with the appropriate structure, and point it to the file.
CREATE FOREIGN TABLE nba
(
Name varchar,
Team varchar,
Number tinyint,
Position varchar,
Age tinyint,
Height varchar,
Weight real,
College varchar,
Salary float
)
WRAPPER csv_fdw
OPTIONS
( LOCATION = 's3://sqream-demo-data/nba_players.csv',
RECORD_DELIMITER = '\r\n' -- DOS delimited file
);
The file format in this case is CSV, and it is stored as an Amazon Web Services object (if the path is on an HDFS environment, change the URI accordingly). We also took note that the record delimiter was a DOS newline (\r\n).
Querying Foreign Tables
Let’s peek at the data from the foreign table:
t=> SELECT * FROM nba LIMIT 10;
name | team | number | position | age | height | weight | college | salary
--------------+----------------+--------+----------+-----+--------+--------+-------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
Jonas Jerebko | Boston Celtics | 8 | PF | 29 | 6-10 | 231 | | 5000000
Amir Johnson | Boston Celtics | 90 | PF | 29 | 6-9 | 240 | | 12000000
Jordan Mickey | Boston Celtics | 55 | PF | 21 | 6-8 | 235 | LSU | 1170960
Kelly Olynyk | Boston Celtics | 41 | C | 25 | 7-0 | 238 | Gonzaga | 2165160
Terry Rozier | Boston Celtics | 12 | PG | 22 | 6-2 | 190 | Louisville | 1824360
Marcus Smart | Boston Celtics | 36 | PG | 22 | 6-4 | 220 | Oklahoma State | 3431040
Modifying Data from Staging
One of the main reasons for staging data is to examine the content and modify it before loading. Assume we are unhappy with weight being in pounds because we want to use kilograms instead. We can apply the transformation as part of a query:
t=> SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
. FROM nba
. ORDER BY weight;
name | team | number | position | age | height | weight | college | salary
-------------------------+------------------------+--------+----------+-----+--------+----------+-----------------------+---------
Nikola Pekovic | Minnesota Timberwolves | 14 | C | 30 | 6-11 | 139.229 | | 12100000
Boban Marjanovic | San Antonio Spurs | 40 | C | 27 | 7-3 | 131.5193 | | 1200000
Al Jefferson | Charlotte Hornets | 25 | C | 31 | 6-10 | 131.0658 | | 13500000
Jusuf Nurkic | Denver Nuggets | 23 | C | 21 | 7-0 | 126.9841 | | 1842000
Andre Drummond | Detroit Pistons | 0 | C | 22 | 6-11 | 126.5306 | Connecticut | 3272091
Kevin Seraphin | New York Knicks | 1 | C | 26 | 6-10 | 126.0771 | | 2814000
Brook Lopez | Brooklyn Nets | 11 | C | 28 | 7-0 | 124.7166 | Stanford | 19689000
Jahlil Okafor | Philadelphia 76ers | 8 | C | 20 | 6-11 | 124.7166 | Duke | 4582680
Cristiano Felicio | Chicago Bulls | 6 | PF | 23 | 6-10 | 124.7166 | | 525093
[...]
Now, if we're happy with the results, we can convert the staged foreign table to a standard table.
Converting a Foreign Table to a Standard Database Table
CREATE TABLE AS can be used to materialize a foreign table into a regular table.
Tip
If you intend to use the table multiple times, convert the foreign table to a standard table.
t=> CREATE TABLE real_nba AS
. SELECT name, team, number, position, age, height, (weight / 2.205) as weight, college, salary
. FROM nba
. ORDER BY weight;
executed
t=> SELECT * FROM real_nba LIMIT 5;
name | team | number | position | age | height | weight | college | salary
-----------------+------------------------+--------+----------+-----+--------+----------+-------------+---------
Nikola Pekovic | Minnesota Timberwolves | 14 | C | 30 | 6-11 | 139.229 | | 12100000
Boban Marjanovic | San Antonio Spurs | 40 | C | 27 | 7-3 | 131.5193 | | 1200000
Al Jefferson | Charlotte Hornets | 25 | C | 31 | 6-10 | 131.0658 | | 13500000
Jusuf Nurkic | Denver Nuggets | 23 | C | 21 | 7-0 | 126.9841 | | 1842000
Andre Drummond | Detroit Pistons | 0 | C | 22 | 6-11 | 126.5306 | Connecticut | 3272091
Error Handling and Limitations
Error handling in foreign tables is limited. Any error that occurs during source data parsing will result in the statement aborting.
Foreign tables are logical and do not contain any data, so their structure is not verified or enforced until a query uses the table. For example, a CSV with the wrong delimiter may cause a query to fail, even though the table was created successfully:
t=> SELECT * FROM nba;
Record delimiter mismatch during CSV parsing. User defined line delimiter \n does not match the first delimiter \r\n found in s3://sqream-demo-data/nba.csv
Since the data for a foreign table is not stored in SQream DB, it can be changed or removed at any time by an external process. As a result, the same query can return different results each time it runs against a foreign table. Similarly, a query might fail if the external data is moved, removed, or has changed structure.
Deleting Data
The Deleting Data page describes how the Delete statement works and how to maintain data that you delete:
Overview
Deleting data typically refers to deleting rows, but can refer to deleting other table content as well. The general workflow is to delete the data and then trigger a clean-up operation. The clean-up operation reclaims the space occupied by the deleted rows, as discussed further below.
The DELETE statement deletes rows defined by a predicate that you specify, preventing them from appearing in subsequent queries.
For example, the predicate below matches and deletes rows containing animals heavier than 1000 weight units:
farm=> DELETE FROM cool_animals WHERE weight > 1000;
The major benefit of the DELETE statement is that it removes rows simply and quickly.
The Deletion Process
Deleting rows occurs in the following two phases:
Phase 1 - Deletion - All rows you mark for deletion are ignored when you run any query. These rows are not deleted until the clean-up phase.
Phase 2 - Clean-up - The rows you marked for deletion in Phase 1 are physically deleted. The clean-up phase is not automated, letting users or DBAs control when to activate it. The files you marked for deletion during Phase 1 are removed from disk, which you do by sequentially running the utility function commands CLEANUP_CHUNKS and CLEANUP_EXTENTS.
Usage Notes
The Usage Notes section includes important information about the DELETE statement:
General Notes
This section describes the general notes applicable when deleting rows:
The ALTER TABLE command and other DDL operations are locked on tables that require clean-up. If the estimated clean-up time exceeds the permitted threshold, an error message is displayed describing how to override the threshold limitation. For more information, see Concurrency and Locks.
If the number of deleted records exceeds the threshold defined by the mixedColumnChunksThreshold parameter, the delete operation is aborted. This alerts users that the large number of deleted records may result in a large number of mixed chunks. To circumvent this alert, use the following syntax (replacing XXX with the desired number of records) before running the delete operation:
set mixedColumnChunksThreshold=XXX;
Deleting Data does not Free Space
With the exception of running a full table delete, deleting data does not free unused disk space. To free unused disk space you must trigger the clean-up process.
For more information on running a full table delete, see TRUNCATE.
For more information on freeing disk space, see Triggering a Clean-Up.
Clean-Up Operations Are I/O Intensive
The clean-up process reduces table size by removing all unused space from column chunks. While this reduces query time, it is a time-costly operation that occupies disk space for the new copy of the table until the operation is complete.
Tip
Because clean-up operations can create significant I/O load on your database, consider running them sparingly, during off-peak hours.
If this is an issue in your environment, consider using CREATE TABLE AS to create a new table, then renaming and dropping the old table.
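A minimal sketch of that rewrite pattern, assuming we want to keep only the rows that would survive the delete:

CREATE TABLE cool_animals_clean AS
  SELECT * FROM cool_animals WHERE weight <= 1000;
DROP TABLE cool_animals;
ALTER TABLE cool_animals_clean RENAME TO cool_animals;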
Examples
The Examples section includes the following examples:
Deleting Rows from a Table
The following example shows how to delete rows from a table.
Display the table:
farm=> SELECT * FROM cool_animals;
The following table is displayed:
1,Dog        ,7
2,Possum     ,3
3,Cat        ,5
4,Elephant   ,6500
5,Rhinoceros ,2100
6,\N,\N
Delete rows from the table:
farm=> DELETE FROM cool_animals WHERE weight > 1000;
Display the table:
farm=> SELECT * FROM cool_animals;
The following table is displayed:
1,Dog    ,7
2,Possum ,3
3,Cat    ,5
6,\N,\N
Deleting Values Based on Complex Predicates
The following example shows how to delete values based on complex predicates.
Display the table:
farm=> SELECT * FROM cool_animals;
The following table is displayed:
1,Dog        ,7
2,Possum     ,3
3,Cat        ,5
4,Elephant   ,6500
5,Rhinoceros ,2100
6,\N,\N
Delete rows from the table:
farm=> DELETE FROM cool_animals WHERE weight > 1000;
Display the table:
farm=> SELECT * FROM cool_animals;
The following table is displayed:
1,Dog    ,7
2,Possum ,3
3,Cat    ,5
6,\N,\N
Identifying and Cleaning Up Tables
The Identifying and Cleaning Up Tables section includes the following examples:
Listing Tables that Have Not Been Cleaned Up
The following example shows how to list tables that have not been cleaned up:
farm=> SELECT t.table_name FROM sqream_catalog.delete_predicates dp
JOIN sqream_catalog.tables t
ON dp.table_id = t.table_id
GROUP BY 1;
cool_animals
1 row
Identifying Predicates for Clean-Up
The following example shows how to identify predicates for clean-up:
farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp
JOIN sqream_catalog.tables t
ON dp.table_id = t.table_id
WHERE t.table_name = 'cool_animals';
weight > 1000
1 row
Triggering a Clean-Up
The following example shows how to trigger a clean-up:
Run the CLEANUP_CHUNKS command (also known as SWEEP) to reorganize the chunks:
farm=> SELECT CLEANUP_CHUNKS('public','cool_animals');
Run the CLEANUP_EXTENTS command (also known as VACUUM) to delete the leftover files:
farm=> SELECT CLEANUP_EXTENTS('public','cool_animals');
Verify that the delete predicates have been removed:
farm=> SELECT delete_predicate FROM sqream_catalog.delete_predicates dp JOIN sqream_catalog.tables t ON dp.table_id = t.table_id WHERE t.table_name = 'cool_animals';
Best Practices
This section includes the best practices when deleting rows:
Run CLEANUP_CHUNKS and CLEANUP_EXTENTS after running large DELETE operations.
When you delete large segments of data from very large tables, consider running a CREATE TABLE AS operation instead, then renaming and dropping the original table.
Avoid killing CLEANUP_EXTENTS operations in progress.
SQream is optimized for time-based data, which is data naturally ordered according to date or timestamp. Deleting rows based on such columns leads to increased performance.
Exporting Data
You can export data from SQream, which you may want to do for the following reasons:
To use data in external tables. See Working with External Data.
To share data with other clients or consumers with different systems.
To copy data into another SQream cluster.
SQream provides the following methods for exporting data:
Copying data from a SQream database table or query to another file - See COPY TO.
Logging
Locating the Log Files
The storage cluster contains a logs directory. Each worker produces a log file in its own directory, which can be identified by the worker's hostname and port.
Note
Additional internal debug logs may reside in the main logs directory.
The worker logs contain information messages, warnings, and errors pertaining to SQream DB’s operation, including:
Server start-up and shutdown
Configuration changes
Exceptions and errors
User login events
Session events
Statement execution success / failure
Statement execution statistics
Log Structure and Contents
The log is a CSV, with several fields.
Field | Description
---|---
Start marker | Start delimiter. When used with the end-of-line delimiter, can be used to parse multi-line statements correctly
Row Id | Unique identifier for the row
Timestamp | Timestamp for the message (ISO 8601 date format)
Information Level | Information level of the message. See the information levels table below
Thread Id | System thread identifier (internal use)
Worker hostname | Hostname of the worker that generated the message
Worker port | Port of the worker that generated the message
Connection Id | Connection Id for the message. Defaults to -1 if no connection was active
Database name | Database name that generated the message. Can be empty for no database
User Id | User role that was connected during the message. Can be empty if no user caused the message
Statement Id | Statement Id for the message. Defaults to -1 if no statement was active
Service name | Service name for the connection. Can be empty.
Message type Id | Message type Id. See the message types table below
Message | Content for the message
End marker | End-of-line delimiter
Level | Description
---|---
SYSTEM | System information, such as start-up, shutdown, or a configuration change
FATAL | Fatal errors that may cause outage
ERROR | Errors encountered during statement execution
WARNING | Warnings
INFO | Information and statistics
Type | Level | Description | Example message content
---|---|---|---
0 | INFO | Statement start information |
1 | INFO | Statement passed to another worker for execution |
4 | INFO | Statement has entered execution |
10 | INFO | Statement execution completed |
20 | ERROR | Compilation error, with accompanying error message |
30 | ERROR | Execution error, with accompanying error message |
31 | INFO | Size of data read from disk in megabytes |
32 | INFO | Row count of result set |
33 | INFO | Processed rows |
100 | INFO | Session start - Client IP address |
101 | INFO | Login |
110 | INFO | Session end |
200 | INFO | SHOW_NODE_INFO periodic output |
500 | WARNING | Exception occurred in a statement |
1000 | SYSTEM | Worker startup message |
1001 | SYSTEM | Show all configuration values | "Flags configuration: compileFlags, extendedAssertions, false, true; compileFlags, useSortMergeJoin, false, false; compileFlags, distinctAggregatesOnHost, true, false; [...]"
1003 | SYSTEM | SQream DB metadata version |
1010 | FATAL | Fatal server error |
1004 | SYSTEM | Configuration change |
1005 | SYSTEM | Worker shutdown |
Log-Naming
Log file name syntax: sqream_<date>_<sequence>.log
date is formatted %Y%m%d, for example 20191231 for December 31st, 2019. By default, each worker creates a new log file every time it is restarted.
sequence is the log's sequence number. When a log is rotated, the sequence number increases, starting at 000. A worker restarted on December 31st, 2019 would therefore begin writing to sqream_20191231_000.log.
Each worker writes its logs in its own directory, for example /home/rhendricks/sqream_storage/192.168.1.91_5000.
See Changing Log Rotation below for information about controlling this setting.
Log Control and Maintenance
Changing Log Verbosity
A few configuration settings alter the verbosity of the logs:
Flag | Description | Default | Values
---|---|---|---
 | Used to control which log level should appear in the logs | |
nodeInfoLoggingSec | Sets an interval (in seconds) for automatically logging long-running statements' SHOW_NODE_INFO output. Output is written as a message of type 200 | 60 | Positive whole number >=1
Changing Log Rotation
A few configuration settings alter the log rotation policy:
Flag | Description | Default | Values
---|---|---|---
 | Rotate log files once they reach a certain file size. When enabled, the size threshold below applies | |
 | Sets the size threshold in megabytes after which a new log file will be opened | |
 | Frequency of log rotation | |
Collecting Logs from Your Cluster
Collecting logs from your cluster can be as simple as creating an archive from the logs subdirectory: tar -czvf logs.tgz *.log.
However, SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues.
SQL Syntax
SELECT REPORT_COLLECTION(output_path, mode);

output_path ::= filepath

mode ::= log | db | db_and_log
Command Line Utility
If you cannot access SQream DB for any reason, you can also use a command line tool to collect the same information:
$ ./bin/report_collection <path to storage> <path for output> <mode>
Parameters
Parameter | Description
---|---
output_path | Path for the output archive. The output file will be named …
mode | One of three modes: log (log files only), db (the metadata database only), db_and_log (both)
Example
Write an archive to /home/rhendricks, containing log files:
SELECT REPORT_COLLECTION('/home/rhendricks', 'log');
Write an archive to /home/rhendricks, containing log files and the metadata database:
SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log');
Using the command line utility:
$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log
Troubleshooting with Logs
Loading Logs with Foreign Tables
Assuming logs are stored at /home/rhendricks/sqream_storage/logs/, a database administrator can access the logs from within SQream DB by using a foreign table.
CREATE FOREIGN TABLE logs
(
start_marker TEXT(4),
row_id BIGINT,
timestamp DATETIME,
message_level TEXT,
thread_id TEXT,
worker_hostname TEXT,
worker_port INT,
connection_id INT,
database_name TEXT,
user_name TEXT,
statement_id INT,
service_name TEXT,
message_type_id INT,
message TEXT,
end_message TEXT(5)
)
WRAPPER csv_fdw
OPTIONS
(
LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
DELIMITER = '|',
CONTINUE_ON_ERROR = true
)
;
Counting Message Types
t=> SELECT message_type_id, COUNT(*) FROM logs GROUP BY 1;
message_type_id | count
----------------+----------
0 | 9
1 | 5578
4 | 2319
10 | 2788
20 | 549
30 | 411
31 | 1720
32 | 1720
100 | 2592
101 | 2598
110 | 2571
200 | 11
500 | 136
1000 | 19
1003 | 19
1004 | 19
1010 | 5
Finding Fatal Errors
t=> SELECT message FROM logs WHERE message_type_id=1010;
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/rocksdb/LOCK: Resource temporarily unavailable
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/rocksdb/LOCK: Resource temporarily unavailable
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Mismatch in storage version, upgrade is needed,Storage version: 25, Server version is: 26
Internal Runtime Error,open cluster metadata database:IO error: lock /home/rhendricks/sqream_storage/LOCK: Resource temporarily unavailable
Counting Error Events Within a Certain Timeframe
t=> SELECT message_type_id,
. COUNT(*)
. FROM logs
. WHERE message_type_id IN (1010,500)
. AND timestamp BETWEEN '2019-12-20' AND '2020-01-01'
. GROUP BY 1;
message_type_id | count
----------------+------
500 | 18
1010 | 3
Tracing Errors to Find Offending Statements
If we know an error occurred, but don't know which statement caused it, we can find it using the connection ID and statement ID.
t=> SELECT connection_id, statement_id, message
. FROM logs
. WHERE message_level = 'ERROR'
. AND timestamp BETWEEN '2020-01-01' AND '2020-01-06';
connection_id | statement_id | message
--------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------
79 | 67 | Column type mismatch, expected UByte, got INT64 on column Number, file name: /home/sqream/nba.parquet
Use the connection_id and statement_id to narrow down the results.
t=> SELECT database_name, message FROM logs
. WHERE connection_id=79 AND statement_id=67 AND message_type_id=1;
database_name | message
--------------+--------------------------
master | Query before parsing
master | SELECT * FROM nba_parquet
Monitoring Query Performance
When analyzing options for query tuning, the first step is to analyze the query plan and execution. The query plan and execution details explain how SQreamDB processes a query and where time is spent.
This document details how to analyze query performance with execution plans, focusing on identifying bottlenecks and on optimization techniques that improve query performance. Performance tuning options differ from query to query, so you should adapt the recommendations and tips to your own workloads. See also our Optimization and Best Practices guide for more information about data loading considerations and other best practices.
Setting Up the System for Monitoring
By default, SQreamDB logs execution details for every statement that runs for more than 60 seconds. If you want to see the execution details for a currently running statement, see Using the SHOW_NODE_INFO Command below.
Adjusting the Logging Frequency
To adjust the frequency of logging for statements, you may want to reduce the interval from 60 seconds down to, say, 5 or 10 seconds. Modify the configuration files and set the nodeInfoLoggingSec parameter as you see fit:
{
"compileFlags":{
},
"runtimeFlags":{
},
"runtimeGlobalFlags":{
"nodeInfoLoggingSec" : 5,
},
"server":{
}
}
After restarting the SQreamDB cluster, the execution plan details will be logged to the standard SQreamDB logs directory as messages of type 200.
You can view these messages with a text viewer, or by querying the logs foreign table.
Reading Execution Plans with a Foreign Table
First, create a foreign table for the logs:
CREATE FOREIGN TABLE logs (
start_marker TEXT(4),
row_id BIGINT,
timestamp DATETIME,
message_level TEXT,
thread_id TEXT,
worker_hostname TEXT,
worker_port INT,
connection_id INT,
database_name TEXT,
user_name TEXT,
statement_id INT,
service_name TEXT,
message_type_id INT,
message TEXT,
end_message TEXT(5)
)
WRAPPER
csv_fdw
OPTIONS
(
LOCATION = '/home/rhendricks/sqream_storage/logs/**/sqream*.log',
DELIMITER = '|'
);
Once you’ve defined the foreign table, you can run queries to observe the previously logged execution plans. This is recommended over looking at the raw logs.
SELECT
message
FROM
logs
WHERE
message_type_id = 200
AND timestamp BETWEEN '2020-06-11' AND '2020-06-13';
message
---------------------------------------------------------------------------------------------------------------------------------
SELECT *,coalesce((depdelay > 15),false) AS isdepdelayed FROM ontime WHERE year IN (2005, 2006, 2007, 2008, 2009, 2010)
1,PushToNetworkQueue ,10354468,10,1035446,2020-06-12 20:41:42,-1,,,,13.55
2,Rechunk ,10354468,10,1035446,2020-06-12 20:41:42,1,,,,0.10
3,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:42,2,,,,0.00
4,DeferredGather ,10354468,10,1035446,2020-06-12 20:41:42,3,,,,1.23
5,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,4,,,,0.01
6,GpuToCpu ,10354468,10,1035446,2020-06-12 20:41:41,5,,,,0.07
7,GpuTransform ,10354468,10,1035446,2020-06-12 20:41:41,6,,,,0.02
8,ReorderInput ,10354468,10,1035446,2020-06-12 20:41:41,7,,,,0.00
9,Filter ,10354468,10,1035446,2020-06-12 20:41:41,8,,,,0.07
10,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,9,,,,0.07
11,GpuDecompress ,10485760,10,1048576,2020-06-12 20:41:41,10,,,,0.03
12,GpuTransform ,10485760,10,1048576,2020-06-12 20:41:41,11,,,,0.22
13,CpuToGpu ,10485760,10,1048576,2020-06-12 20:41:41,12,,,,0.76
14,ReorderInput ,10485760,10,1048576,2020-06-12 20:41:40,13,,,,0.11
15,Rechunk ,10485760,10,1048576,2020-06-12 20:41:40,14,,,,5.58
16,CpuDecompress ,10485760,10,1048576,2020-06-12 20:41:34,15,,,,0.04
17,ReadTable ,10485760,10,1048576,2020-06-12 20:41:34,16,832MB,,public.ontime,0.55
Using the SHOW_NODE_INFO Command
The SHOW_NODE_INFO command returns a snapshot of the current query plan, similar to EXPLAIN ANALYZE from other databases.
The SHOW_NODE_INFO result, just like the periodically logged execution plans described above, is an at-the-moment view of the compiler's execution plan and runtime statistics for the specified statement.
To inspect a currently running statement, execute the show_node_info utility function in a SQL client like sqream sql, the SQream Studio Editor, or any other third-party SQL terminal.
In this example, we inspect a statement with a statement ID of 176. The command looks like this:
SELECT
SHOW_NODE_INFO(176);
stmt_id | node_id | node_type | rows | chunks | avg_rows_in_chunk | time | parent_node_id | read | write | comment | timeSum
--------+---------+--------------------+------+--------+-------------------+---------------------+----------------+------+-------+------------+--------
176 | 1 | PushToNetworkQueue | 1 | 1 | 1 | 2019-12-25 23:53:13 | -1 | | | | 0.0025
176 | 2 | Rechunk | 1 | 1 | 1 | 2019-12-25 23:53:13 | 1 | | | | 0
176 | 3 | GpuToCpu | 1 | 1 | 1 | 2019-12-25 23:53:13 | 2 | | | | 0
176 | 4 | ReorderInput | 1 | 1 | 1 | 2019-12-25 23:53:13 | 3 | | | | 0
176 | 5 | Filter | 1 | 1 | 1 | 2019-12-25 23:53:13 | 4 | | | | 0.0002
176 | 6 | GpuTransform | 457 | 1 | 457 | 2019-12-25 23:53:13 | 5 | | | | 0.0002
176 | 7 | GpuDecompress | 457 | 1 | 457 | 2019-12-25 23:53:13 | 6 | | | | 0
176 | 8 | CpuToGpu | 457 | 1 | 457 | 2019-12-25 23:53:13 | 7 | | | | 0.0003
176 | 9 | Rechunk | 457 | 1 | 457 | 2019-12-25 23:53:13 | 8 | | | | 0
176 | 10 | CpuDecompress | 457 | 1 | 457 | 2019-12-25 23:53:13 | 9 | | | | 0
176 | 11 | ReadTable | 457 | 1 | 457 | 2019-12-25 23:53:13 | 10 | 4MB | | public.nba | 0.0004
Alternatively, you may also retrieve the query execution plan output using SQreamDB Studio and share it with SQream Support.
Understanding the Query Execution Plan Output
Both SHOW_NODE_INFO and the logged execution plans represent the query plan as a graph hierarchy, with data separated into different columns.
Each row represents a single logical database operation, which is also called a node or chunk producer. A node reports several metrics during query execution, such as how much data it has read and written, how many chunks and rows it has handled, and how much time has elapsed.
Consider the example show_node_info presented above. The source node with ID #11 (ReadTable) has a parent node with ID #10 (CpuDecompress). If we were to draw this out as a graph, it would look like this:
[Figure: the execution plan drawn as a node graph, with ReadTable (#11) at the bottom feeding CpuDecompress (#10), and so on up to the sink]
This graph explains how the query execution details are arranged in a logical order, from the bottom up.
The last node, also called the sink, has a parent node ID of -1, meaning it has no parent. This is typically a node that sends data over the network or into a table.
When using SHOW_NODE_INFO, a tabular representation of the currently running statement execution is presented. See the examples below to understand how the query execution plan is instrumental in identifying bottlenecks and optimizing long-running statements.
Information Presented in the Execution Plan
Commonly Seen Nodes
Node type | Execution location | Description
---|---|---
CpuDecompress | CPU | Decompression operation, common for longer TEXT types
CpuLoopJoin | CPU | A non-indexed nested loop join, performed on the CPU
CpuReduce | CPU | A reduce process performed on the CPU, primarily with COUNT(DISTINCT ...)
CpuToGpu, GpuToCpu | | An operation that moves data to or from the GPU for processing
CpuTransform | CPU | A transform operation performed on the CPU, usually a scalar function
DeferredGather | CPU | Merges the results of GPU operations with a result set
Distinct | GPU | Removes duplicate rows (usually as part of the DISTINCT operation)
DistinctMerge | CPU | The merge operation of the Distinct operation
Filter | GPU | A filtering operation, such as a WHERE or JOIN clause
GpuDecompress | GPU | Decompression operation
GpuReduceMerge | GPU | An operation to optimize part of the merger phases in the GPU
GpuTransform | GPU | A transformation operation such as a type cast or scalar function
 | CPU | Validates external file paths for foreign data wrappers, expanding directories and GLOB patterns
LoopJoin | GPU | A non-indexed nested loop join, performed on the GPU
ParseCsv | CPU | A CSV parser, used after ReadFiles to convert files to columnar data
PushToNetworkQueue | CPU | Sends result sets to a client connected over the network
ReadFiles | CPU | Reads external flat-files
ReadTable | CPU | Reads data from a standard table stored on disk
Rechunk | | Reorganize multiple small chunks into a full chunk. Commonly found after joins and when HIGH_SELECTIVITY is used
Reduce | GPU | A reduction operation, such as a GROUP BY
ReduceMerge | GPU | A merge operation of a reduction operation, helps operate on larger-than-RAM data
ReorderInput | | Change the order of arguments in preparation for the next operation
SeparatedGather | GPU | Gathers additional columns for the result
Sort | GPU | Sort operation
TakeRowsFromChunk | | Take the first N rows from each chunk, to optimize LIMIT
Top | | Limits the input size, when used with LIMIT
UdfTransform | CPU | Executes a user defined function
UnionAll | | Combines two sources of data when UNION ALL is used
Window | GPU | Executes a non-ranking window function
WindowRanking | GPU | Executes a ranking window function
WriteTable | CPU | Writes the result set to a standard table stored on disk
Tip
The full list of nodes appears in the Node types table, as part of the SHOW_NODE_INFO reference.
Examples
In general, looking at the top three longest-running nodes (as detailed in the timeSum column) can indicate the biggest bottlenecks.
In the following examples you will learn how to identify and solve some common issues.
Spooling to Disk
When there is not enough RAM to process a statement, SQreamDB will spill data over to the temp folder on the storage disk. While this ensures that a statement can always finish processing, it can slow down processing significantly. It is worth identifying these statements, both to figure out whether the cluster is configured correctly and to potentially reduce the statement size.
You can identify a statement that spools to disk by looking at the write column in the execution details. A node that spools will have a value, shown in megabytes, in the write column. Common nodes that write spool include Join and LoopJoin.
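As a rough sketch, you can hunt for spooling statements in previously logged execution plans (this assumes the logs foreign table defined earlier; it simply pattern-matches on a node that commonly spools):

SELECT message
FROM logs
WHERE message_type_id = 200  -- logged execution plan details
  AND message LIKE '%LoopJoin%';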
Identifying the Offending Nodes
Run a query.
For example, a query from the TPC-H benchmark:
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR, o_orderdate) AS o_year,
             l_extendedprice * (1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
      JOIN part ON p_partkey = CAST (l_partkey AS INT)
      JOIN orders ON l_orderkey = o_orderkey
      JOIN customer ON o_custkey = c_custkey
      JOIN nation n1 ON c_nationkey = n1.n_nationkey
      JOIN region ON n1.n_regionkey = r_regionkey
      JOIN supplier ON s_suppkey = l_suppkey
      JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
     ) AS all_nations
GROUP BY o_year
ORDER BY o_year;
Observe the execution information by using the foreign table, or use show_node_info.
This statement is made up of 199 nodes, starting from a ReadTable, and finishes by returning only 2 results to the client.
The execution below has been shortened, but note the highlighted rows for LoopJoin:
SELECT message FROM logs WHERE message_type_id = 200 LIMIT 1;
message
-----------------------------------------------------------------------------------------
SELECT o_year, SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
: FROM (SELECT datepart(YEAR,o_orderdate) AS o_year,
:        l_extendedprice*(1 - l_discount / 100.0) AS volume,
:        n2.n_name AS nation
: FROM lineitem
: JOIN part ON p_partkey = CAST (l_partkey AS INT)
: JOIN orders ON l_orderkey = o_orderkey
: JOIN customer ON o_custkey = c_custkey
: JOIN nation n1 ON c_nationkey = n1.n_nationkey
: JOIN region ON n1.n_regionkey = r_regionkey
: JOIN supplier ON s_suppkey = l_suppkey
: JOIN nation n2 ON s_nationkey = n2.n_nationkey
: WHERE o_orderdate BETWEEN '1995-01-01' AND '1996-12-31') AS all_nations
: GROUP BY o_year
: ORDER BY o_year
: 1,PushToNetworkQueue ,2,1,2,2020-09-04 18:32:50,-1,,,,0.27
: 2,Rechunk ,2,1,2,2020-09-04 18:32:50,1,,,,0.00
: 3,SortMerge ,2,1,2,2020-09-04 18:32:49,2,,,,0.00
: 4,GpuToCpu ,2,1,2,2020-09-04 18:32:49,3,,,,0.00
: 5,Sort ,2,1,2,2020-09-04 18:32:49,4,,,,0.00
: 6,ReorderInput ,2,1,2,2020-09-04 18:32:49,5,,,,0.00
: 7,GpuTransform ,2,1,2,2020-09-04 18:32:49,6,,,,0.00
: 8,CpuToGpu ,2,1,2,2020-09-04 18:32:49,7,,,,0.00
: 9,Rechunk ,2,1,2,2020-09-04 18:32:49,8,,,,0.00
: 10,ReduceMerge ,2,1,2,2020-09-04 18:32:49,9,,,,0.03
: 11,GpuToCpu ,6,3,2,2020-09-04 18:32:49,10,,,,0.00
: 12,Reduce ,6,3,2,2020-09-04 18:32:49,11,,,,0.64
[...]
: 49,LoopJoin ,182369485,7,26052783,2020-09-04 18:32:36,48,1915MB,1915MB,inner,4.94
[...]
: 98,LoopJoin ,182369485,12,15197457,2020-09-04 18:32:16,97,2191MB,2191MB,inner,5.01
[...]
: 124,LoopJoin ,182369485,8,22796185,2020-09-04 18:32:03,123,3064MB,3064MB,inner,6.73
[...]
: 150,LoopJoin ,182369485,10,18236948,2020-09-04 18:31:47,149,12860MB,12860MB,inner,23.62
[...]
: 199,ReadTable ,20000000,1,20000000,2020-09-04 18:30:33,198,0MB,,public.part,0.83
Because of the relatively low amount of RAM in the machine and because the data set is rather large at around 10TB, SQreamDB needs to spool.
The total spool used by this query is around 20GB (1915MB + 2191MB + 3064MB + 12860MB).
Common Solutions for Reducing Spool
Increase the amount of spool memory available for the workers, as a proportion of the maximum statement memory. When the amount of spool memory is increased, SQreamDB may not need to write to disk. This setting is called spoolMemoryGB. Refer to the configuration guide.
Reduce the number of workers per host, and increase the amount of spool available to the (now reduced number of) active workers. This may reduce the number of concurrent statements, but will improve performance for heavy statements.
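A minimal configuration sketch for the first option (the value 64 is an arbitrary assumption; size it to your machine's RAM and worker count):

{
    "runtimeGlobalFlags": {
        "spoolMemoryGB": 64
    }
}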
Queries with Large Result Sets
When queries have large result sets, you may see a node called DeferredGather
.
This gathering occurs when the result set is assembled, in preparation for sending it to the client.
Identifying the Offending Nodes
Run a query.
For example, a modified query from the TPC-H benchmark:
SELECT s.*, l.*, r.*, n1.*, n2.*, p.*, o.*, c.*
FROM lineitem l
JOIN part p ON p_partkey = CAST (l_partkey AS INT)
JOIN orders o ON l_orderkey = o_orderkey
JOIN customer c ON o_custkey = c_custkey
JOIN nation n1 ON c_nationkey = n1.n_nationkey
JOIN region r ON n1.n_regionkey = r_regionkey
JOIN supplier s ON s_suppkey = l_suppkey
JOIN nation n2 ON s_nationkey = n2.n_nationkey
WHERE r_name = 'AMERICA'
  AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
  AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL');
Observe the execution information by using the foreign table, or use show_node_info.
This statement is made up of 221 nodes, containing 8 ReadTable nodes, and finishes by returning billions of results to the client.
The execution below has been shortened, but note the highlighted rows for DeferredGather:
SELECT show_node_info(494);
stmt_id | node_id | node_type          | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read | write | comment     | timeSum
--------+---------+--------------------+----------+--------+-------------------+---------------------+----------------+------+-------+-------------+--------
494     | 1       | PushToNetworkQueue | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | -1             |      |       |             | 0.36
494     | 2       | Rechunk            | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 1              |      |       |             | 0
494     | 3       | ReorderInput       | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 2              |      |       |             | 0
494     | 4       | DeferredGather     | 242615   | 1      | 242615            | 2020-09-04 19:07:55 | 3              |      |       |             | 0.16
[...]
494     | 166     | DeferredGather     | 3998730  | 39     | 102531            | 2020-09-04 19:07:47 | 165            |      |       |             | 21.75
[...]
494     | 194     | DeferredGather     | 133241   | 20     | 6662              | 2020-09-04 19:07:03 | 193            |      |       |             | 0.41
[...]
494     | 221     | ReadTable          | 20000000 | 20     | 1000000           | 2020-09-04 19:07:01 | 220            | 20MB |       | public.part | 0.1
When you see DeferredGather operations taking more than a few seconds, that's a sign that you're selecting too much data. In this case, the DeferredGather with node ID 166 took over 21 seconds.
Modify the statement to see the difference. Altering the select clause to be more restrictive will reduce the deferred gather time back to a few milliseconds:
SELECT DATEPART(year, o_orderdate) AS o_year, l_extendedprice * (1 - l_discount / 100.0) as volume, n2.n_name as nation FROM ...
Common Solutions for Reducing Gather Time
Reduce the effect of the preparation time. Avoid selecting unnecessary columns (SELECT * FROM ...), or reduce the result set size by using more filters.
Inefficient Filtering
When running statements, SQreamDB tries to avoid reading data that is not needed for the statement by skipping chunks. If statements do not include efficient filtering, SQreamDB will read a lot of data off disk. In some cases you need all of that data, and there is nothing to be done about it. However, if most of it gets pruned further down the line, it may be more efficient to skip reading the data altogether by using the metadata.
Identifying the Situation
We consider the filtering to be inefficient when the Filter node shows that the number of rows processed is less than a third of the rows passed into it by the ReadTable node.
For example:
Run a query.
In this example, we execute a modified query from the TPC-H benchmark. Our lineitem table contains 600,037,902 rows.
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR, o_orderdate) AS o_year,
             l_extendedprice * (1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
      JOIN part ON p_partkey = CAST (l_partkey AS INT)
      JOIN orders ON l_orderkey = o_orderkey
      JOIN customer ON o_custkey = c_custkey
      JOIN nation n1 ON c_nationkey = n1.n_nationkey
      JOIN region ON n1.n_regionkey = r_regionkey
      JOIN supplier ON s_suppkey = l_suppkey
      JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
        AND lineitem.l_quantity = 3
        AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
        AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')
     ) AS all_nations
GROUP BY o_year
ORDER BY o_year;
Observe the execution information by using the foreign table, or use
show_node_info
The execution below has been shortened, but note the highlighted rows for ReadTable and Filter:
 1 SELECT show_node_info(559);
 2 stmt_id | node_id | node_type     | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
 3 --------+---------+---------------+-----------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
 4  559    | 1       | PushToNetworkQueue | 2    | 1      | 2                 | 2020-09-07 11:12:01 | -1             |        |       |                 | 0.28
 5  559    | 2       | Rechunk       | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 1              |        |       |                 | 0
 6  559    | 3       | SortMerge     | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 2              |        |       |                 | 0
 7  559    | 4       | GpuToCpu      | 2         | 1      | 2                 | 2020-09-07 11:12:01 | 3              |        |       |                 | 0
 8 [...]
 9  559    | 189     | Filter        | 12007447  | 12     | 1000620           | 2020-09-07 11:12:00 | 188            |        |       |                 | 0.3
10  559    | 190     | GpuTransform  | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 189            |        |       |                 | 0.02
11  559    | 191     | GpuDecompress | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 190            |        |       |                 | 0.16
12  559    | 192     | GpuTransform  | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 191            |        |       |                 | 0.02
13  559    | 193     | CpuToGpu      | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 192            |        |       |                 | 1.47
14  559    | 194     | ReorderInput  | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 193            |        |       |                 | 0
15  559    | 195     | Rechunk       | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 194            |        |       |                 | 0
16  559    | 196     | CpuDecompress | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 195            |        |       |                 | 0
17  559    | 197     | ReadTable     | 600037902 | 12     | 50003158          | 2020-09-07 11:12:00 | 196            | 7587MB |       | public.lineitem | 0.1
18 [...]
19  559    | 208     | Filter        | 133241    | 20     | 6662              | 2020-09-07 11:11:57 | 207            |        |       |                 | 0.01
20  559    | 209     | GpuTransform  | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 208            |        |       |                 | 0.02
21  559    | 210     | GpuDecompress | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 209            |        |       |                 | 0.03
22  559    | 211     | GpuTransform  | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 210            |        |       |                 | 0
23  559    | 212     | CpuToGpu      | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 211            |        |       |                 | 0.01
24  559    | 213     | ReorderInput  | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 212            |        |       |                 | 0
25  559    | 214     | Rechunk       | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 213            |        |       |                 | 0
26  559    | 215     | CpuDecompress | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 214            |        |       |                 | 0
27  559    | 216     | ReadTable     | 20000000  | 20     | 1000000           | 2020-09-07 11:11:57 | 215            | 20MB   |       | public.part     | 0
The Filter on line 9 has processed 12,007,447 rows, but the output of ReadTable on public.lineitem on line 17 was 600,037,902 rows. This means that it has filtered out 98% (\(1 - \dfrac{12007447}{600037902} \approx 98\%\)) of the data, but the entire table was read.
The Filter on line 19 has processed 133,241 rows, but the output of ReadTable on public.part on line 27 was 20,000,000 rows. This means that it has filtered out more than 99% (\(1 - \dfrac{133241}{20000000} \approx 99.3\%\)) of the data, but the entire table was read. However, this table is small enough that we can ignore it.
Modify the statement to see the difference. Altering the statement to have a WHERE condition on the clustered l_orderkey column of the lineitem table will help SQreamDB skip reading the data:
SELECT o_year,
       SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (SELECT datepart(YEAR, o_orderdate) AS o_year,
             l_extendedprice * (1 - l_discount / 100.0) AS volume,
             n2.n_name AS nation
      FROM lineitem
      JOIN part ON p_partkey = CAST (l_partkey AS INT)
      JOIN orders ON l_orderkey = o_orderkey
      JOIN customer ON o_custkey = c_custkey
      JOIN nation n1 ON c_nationkey = n1.n_nationkey
      JOIN region ON n1.n_regionkey = r_regionkey
      JOIN supplier ON s_suppkey = l_suppkey
      JOIN nation n2 ON s_nationkey = n2.n_nationkey
      WHERE r_name = 'AMERICA'
        AND lineitem.l_orderkey > 4500000
        AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
        AND high_selectivity(p_type = 'ECONOMY BURNISHED NICKEL')
     ) AS all_nations
GROUP BY o_year
ORDER BY o_year;
 1 SELECT show_node_info(586);
 2 stmt_id | node_id | node_type     | rows      | chunks | avg_rows_in_chunk | time                | parent_node_id | read   | write | comment         | timeSum
 3 --------+---------+---------------+-----------+--------+-------------------+---------------------+----------------+--------+-------+-----------------+--------
 4 [...]
 5  586    | 190     | Filter        | 494621593 | 8      | 61827699          | 2020-09-07 13:20:45 | 189            |        |       |                 | 0.39
 6  586    | 191     | GpuTransform  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 190            |        |       |                 | 0.03
 7  586    | 192     | GpuDecompress | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 191            |        |       |                 | 0.26
 8  586    | 193     | GpuTransform  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 192            |        |       |                 | 0.01
 9  586    | 194     | CpuToGpu      | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 193            |        |       |                 | 1.86
10  586    | 195     | ReorderInput  | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 194            |        |       |                 | 0
11  586    | 196     | Rechunk       | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 195            |        |       |                 | 0
12  586    | 197     | CpuDecompress | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 196            |        |       |                 | 0
13  586    | 198     | ReadTable     | 494927872 | 8      | 61865984          | 2020-09-07 13:20:44 | 197            | 6595MB |       | public.lineitem | 0.09
14 [...]
In this example, the filter processed 494,621,593 rows, while the output of ReadTable on public.lineitem was 494,927,872 rows. This means that it only had to filter out around 0.06% (\(1 - \dfrac{494621593}{494927872} \approx 0.06\%\)) of the data that was read.
The metadata skipping has performed very well, and has pre-filtered the data for us by pruning unnecessary chunks.
Common Solutions for Improving Filtering
Use clustering keys and naturally ordered data in your filters.
Avoid full table scans when possible.
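As a sketch of the first tip (assuming SQreamDB's CLUSTER BY table option; the table and columns are illustrative), creating a table with an explicit clustering key lets filters on that key prune chunks:

CREATE TABLE lineitem_clustered (
  l_orderkey BIGINT NOT NULL,
  l_quantity INT NOT NULL,
  l_extendedprice FLOAT NOT NULL
)
CLUSTER BY l_orderkey;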
Joins with Text Keys
Joins on long text keys do not perform as well as joins on numeric data types or very short text keys.
Identifying the Situation
When a join is inefficient, you may note that a query spends a lot of time on the Join node.
For example, consider these two table structures:
CREATE TABLE t_a
(
amt FLOAT NOT NULL,
i INT NOT NULL,
ts DATETIME NOT NULL,
country_code TEXT(3) NOT NULL,
flag TEXT(10) NOT NULL,
fk TEXT(50) NOT NULL
);
CREATE TABLE t_b
(
id TEXT(50) NOT NULL,
prob FLOAT NOT NULL,
j INT NOT NULL
);
Run a query.

In this example, we will join t_a.fk with t_b.id, both of which are TEXT(50).

SELECT AVG(t_b.j :: BIGINT), t_a.country_code
FROM t_a
JOIN t_b ON (t_a.fk = t_b.id)
GROUP BY t_a.country_code;
Observe the execution information by using show_node_info.

The execution below has been shortened, but note the rows for the Join node. The Join node is by far the most time-consuming part of this statement, clocking in at 69.7 seconds to join 1.5 billion records.

SELECT show_node_info(5);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
      5 |      19 | GpuTransform         | 1497366528 |    204 |           7340032 | 2020-09-08 18:29:03 |             18 |       |       |            | 1.46
      5 |      20 | ReorderInput         | 1497366528 |    204 |           7340032 | 2020-09-08 18:29:03 |             19 |       |       |            | 0
      5 |      21 | ReorderInput         | 1497366528 |    204 |           7340032 | 2020-09-08 18:29:03 |             20 |       |       |            | 0
      5 |      22 | Join                 | 1497366528 |    204 |           7340032 | 2020-09-08 18:29:03 |             21 |       |       | inner      | 69.7
      5 |      24 | AddSortedMinMaxMet.. | 6291456    |      1 |           6291456 | 2020-09-08 18:26:05 |             22 |       |       |            | 0
      5 |      25 | Sort                 | 6291456    |      1 |           6291456 | 2020-09-08 18:26:05 |             24 |       |       |            | 2.06
[...]
      5 |      31 | ReadTable            | 6291456    |      1 |           6291456 | 2020-09-08 18:26:03 |             30 | 235MB |       | public.t_b | 0.02
[...]
      5 |      41 | CpuDecompress        | 10000000   |      2 |           5000000 | 2020-09-08 18:26:09 |             40 |       |       |            | 0
      5 |      42 | ReadTable            | 10000000   |      2 |           5000000 | 2020-09-08 18:26:09 |             41 | 14MB  |       | public.t_a | 0
Improving Query Performance

In general, try to avoid TEXT as a join key. As a rule of thumb, BIGINT works best as a join key.

You can convert text values on-the-fly before running the query. For example, the CRC64 function takes a text input and returns a BIGINT hash:

SELECT AVG(t_b.j::BIGINT), t_a.country_code
FROM "public"."t_a"
JOIN "public"."t_b" ON (CRC64(t_a.fk::TEXT) = CRC64(t_b.id::TEXT))
GROUP BY t_a.country_code;
The execution below has been shortened, but note the rows for the Join node. The Join node went from taking nearly 70 seconds to just 6.67 seconds for joining 1.5 billion records.

SELECT show_node_info(6);
stmt_id | node_id | node_type            | rows       | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment    | timeSum
--------+---------+----------------------+------------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
      6 |      19 | GpuTransform         | 1497366528 |     85 |          17825792 | 2020-09-08 18:57:04 |             18 |       |       |            | 1.48
      6 |      20 | ReorderInput         | 1497366528 |     85 |          17825792 | 2020-09-08 18:57:04 |             19 |       |       |            | 0
      6 |      21 | ReorderInput         | 1497366528 |     85 |          17825792 | 2020-09-08 18:57:04 |             20 |       |       |            | 0
      6 |      22 | Join                 | 1497366528 |     85 |          17825792 | 2020-09-08 18:57:04 |             21 |       |       | inner      | 6.67
      6 |      24 | AddSortedMinMaxMet.. | 6291456    |      1 |           6291456 | 2020-09-08 18:55:12 |             22 |       |       |            | 0
[...]
      6 |      32 | ReadTable            | 6291456    |      1 |           6291456 | 2020-09-08 18:55:12 |             31 | 235MB |       | public.t_b | 0.02
[...]
      6 |      43 | CpuDecompress        | 10000000   |      2 |           5000000 | 2020-09-08 18:55:13 |             42 |       |       |            | 0
      6 |      44 | ReadTable            | 10000000   |      2 |           5000000 | 2020-09-08 18:55:13 |             43 | 14MB  |       | public.t_a | 0
You can map some text values to numeric types by using a dimension table. Then, reconcile the values when you need them by joining the dimension table.
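For example, here is a minimal sketch of the dimension-table approach. The games, games_fact, and team_dim names are all hypothetical, and it assumes window-function support for generating the surrogate keys:

-- One-time: map each distinct text value to a BIGINT surrogate key
CREATE TABLE team_dim AS
  SELECT ROW_NUMBER() OVER (ORDER BY team_name)::BIGINT AS team_key,
         team_name
  FROM (SELECT DISTINCT team_name FROM games) AS d;

-- One-time: rebuild the large table to carry the numeric key instead of the text
CREATE TABLE games_fact AS
  SELECT g.score, g.played_at, dim.team_key
  FROM games AS g
  JOIN team_dim AS dim ON g.team_name = dim.team_name;

-- Queries now join on the compact BIGINT key and recover the text only at the end
SELECT dim.team_name, AVG(f.score)
FROM games_fact AS f
JOIN team_dim AS dim ON f.team_key = dim.team_key
GROUP BY dim.team_name;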
Sorting on big TEXT fields
In general, SQreamDB automatically inserts a Sort
node which arranges the data prior to reductions and aggregations.
When running a GROUP BY
on large TEXT
fields, you may see nodes for Sort
and Reduce
taking a long time.
Identifying the Situation
When running a statement, inspect it with SHOW_NODE_INFO. If you see Sort
and Reduce
among
your top five longest running nodes, there is a potential issue.
For example:
Run a query to test it out.
Our t_inefficient table contains 60,000,000 rows, and the structure is simple, but with an oversized country_code column:

CREATE TABLE t_inefficient (
  i INT NOT NULL,
  amt DOUBLE NOT NULL,
  ts DATETIME NOT NULL,
  country_code TEXT(100) NOT NULL,
  flag TEXT(10) NOT NULL,
  string_fk TEXT(50) NOT NULL
);

We will run a query, and inspect its execution details:

SELECT country_code, SUM(amt) FROM t_inefficient GROUP BY country_code;
country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]

SELECT SHOW_NODE_INFO(30);
stmt_id | node_id | node_type          | rows     | chunks | avg_rows_in_chunk | time                | parent_node_id | read  | write | comment              | timeSum
--------+---------+--------------------+----------+--------+-------------------+---------------------+----------------+-------+-------+----------------------+--------
     30 |       1 | PushToNetworkQueue | 249      |      1 |               249 | 2020-09-10 16:17:10 |             -1 |       |       |                      | 0.25
     30 |       2 | Rechunk            | 249      |      1 |               249 | 2020-09-10 16:17:10 |              1 |       |       |                      | 0
     30 |       3 | ReduceMerge        | 249      |      1 |               249 | 2020-09-10 16:17:10 |              2 |       |       |                      | 0.01
     30 |       4 | GpuToCpu           | 1508     |     15 |               100 | 2020-09-10 16:17:10 |              3 |       |       |                      | 0
     30 |       5 | Reduce             | 1508     |     15 |               100 | 2020-09-10 16:17:10 |              4 |       |       |                      | 7.23
     30 |       6 | Sort               | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |              5 |       |       |                      | 36.8
     30 |       7 | GpuTransform       | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |              6 |       |       |                      | 0.08
     30 |       8 | GpuDecompress      | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |              7 |       |       |                      | 2.01
     30 |       9 | CpuToGpu           | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |              8 |       |       |                      | 0.16
     30 |      10 | Rechunk            | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |              9 |       |       |                      | 0
     30 |      11 | CpuDecompress      | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |             10 |       |       |                      | 0
     30 |      12 | ReadTable          | 60000000 |     15 |           4000000 | 2020-09-10 16:17:10 |             11 | 520MB |       | public.t_inefficient | 0.05
We can look to see if there's any shrinking we can do on the GROUP BY key:

SELECT MAX(LEN(country_code)) FROM t_inefficient;
max
---
3
With a maximum string length of just 3 characters, our TEXT(100) is way oversized. We can recreate the table with a more restrictive TEXT(3), and examine the difference in performance:

CREATE TABLE t_efficient AS
  SELECT i,
         amt,
         ts,
         country_code::TEXT(3) AS country_code,
         flag
  FROM t_inefficient;

SELECT country_code, SUM(amt::BIGINT) FROM t_efficient GROUP BY country_code;
country_code | sum
-------------+-----------
VUT          | 1195416012
GIB          | 1195710372
TUR          | 1195946178
[...]
This time, the entire query took just 4.75 seconds, or just about 91% faster.
Improving Sort Performance on Text Keys
When using TEXT, define the maximum length in the table structure to be as small as the data allows.
For example, if you’re storing phone numbers, don’t define the field as TEXT(255)
, as that affects sort performance.
You can run a query to get the maximum column length (e.g. MAX(LEN(a_column))
), and potentially modify the table structure.
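A minimal sketch of that check and rebuild (the phone_numbers table and its columns are hypothetical):

-- Check how long the stored values actually are
SELECT MAX(LEN(phone_number)) FROM phone_numbers;

-- Rebuild with a type sized to the data, e.g. TEXT(16) instead of TEXT(255)
CREATE TABLE phone_numbers_compact AS
  SELECT customer_id,
         phone_number::TEXT(16) AS phone_number
  FROM phone_numbers;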
High Selectivity Data
Selectivity is the ratio of cardinality to the number of records of a chunk. We define selectivity as \(\frac{\text{Distinct values}}{\text{Total number of records in a chunk}}\)
SQreamDB has a hint called HIGH_SELECTIVITY
, which is a function you can wrap a condition in.
The hint signals to SQreamDB that the result of the condition will be very sparse, and that it should attempt to rechunk
the results into fewer, fuller chunks.
Note

SQreamDB doesn't do this automatically because it adds a significant overhead on naturally ordered and well-clustered data, which is the more common scenario.
Identifying the Situation
This is easily identifiable: the average number of rows in a chunk is small following a Filter operation.
Consider this execution plan:
SELECT SHOW_NODE_INFO(30);
stmt_id | node_id | node_type | rows | chunks | avg_rows_in_chunk | time | parent_node_id | read | write | comment | timeSum
--------+---------+-------------------+-----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30 | 38 | Filter | 18160 | 74 | 245 | 2020-09-10 12:17:09 | 37 | | | | 0.012
[...]
30 | 44 | ReadTable | 77000000 | 74 | 1040540 | 2020-09-10 12:17:09 | 43 | 277MB | | public.dim | 0.058
The table was read entirely - 77 million rows into 74 chunks. The filter node reduced the output to just 18,160 relevant rows, but they’re distributed across the original 74 chunks. All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.
Improving Performance with High Selectivity Hints

Use the hint when there's a WHERE condition on an unclustered column, and when you expect the filter to cut out more than 60% of the result set.

Use the hint when the data is uniformly distributed or random.
Performance of unsorted data in joins
When data is not well-clustered or naturally ordered, a join operation can take a long time.
Identifying the Situation
When running a statement, inspect it with SHOW_NODE_INFO. If you see Join
and DeferredGather
among your
top five longest running nodes, there is a potential issue.
In this case, we’re also interested in the number of chunks produced by these nodes.
Consider this execution plan:
SELECT SHOW_NODE_INFO(30);
stmt_id | node_id | node_type | rows | chunks | avg_rows_in_chunk | time | parent_node_id | read | write | comment | timeSum
--------+---------+-------------------+-----------+--------+-------------------+---------------------+----------------+-------+-------+------------+--------
[...]
30 | 13 | ReorderInput | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 12 | | | | 4.681
30 | 14 | DeferredGather | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 13 | | | | 29.901
30 | 15 | ReorderInput | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 14 | | | | 3.053
30 | 16 | GpuToCpu | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 15 | | | | 5.798
30 | 17 | ReorderInput | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 16 | | | | 2.899
30 | 18 | ReorderInput | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 17 | | | | 3.695
30 | 19 | Join | 181582598 | 70596 | 2572 | 2020-09-10 12:17:10 | 18 | | | inner | 22.745
[...]
30 | 38 | Filter | 18160 | 74 | 245 | 2020-09-10 12:17:09 | 37 | | | | 0.012
[...]
30 | 44 | ReadTable | 77000000 | 74 | 1040540 | 2020-09-10 12:17:09 | 43 | 277MB | | public.dim | 0.058
Join is the node that matches rows from both table relations. DeferredGather gathers the required column chunks to decompress.
Pay special attention to the volume of data removed by the Filter
node.
The table was read entirely - 77 million rows into 74 chunks.
The filter node reduced the output to just 18,160 relevant rows, but they’re distributed across the original 74 chunks.
All of these rows could fit in one single chunk, instead of spanning 74 rather sparse chunks.
Improving Join Performance when Data is Sparse
You can tell SQreamDB to reduce the number of chunks involved, if you know that the filter is going to be quite aggressive, by using the HIGH_SELECTIVITY hint described above.
This forces the compiler to rechunk the data into fewer chunks.
To tell SQreamDB to rechunk the data, wrap a condition (or several) in the HIGH_SELECTIVITY
hint:
-- Without the hint
SELECT *
FROM cdrs
WHERE
RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999'
AND EnterpriseID=1150
AND MSISDN='9724871140341';
-- With the hint
SELECT *
FROM cdrs
WHERE
HIGH_SELECTIVITY(RequestReceiveTime BETWEEN '2018-01-01 00:00:00.000' AND '2018-08-31 23:59:59.999')
AND EnterpriseID=1150
AND MSISDN='9724871140341';
Manual Join Reordering
When joining multiple tables, you may wish to change the join order to join the smallest tables first.
Identifying the Situation
When joining more than two tables, the Join
nodes will be the most time-consuming nodes.
Changing the Join Order
Always prefer to join the smallest tables first.

Note

We consider small tables to be tables that retain only a small number of rows after conditions are applied. This bears no direct relation to the total number of rows in the table.
Changing the join order can reduce the query runtime significantly. In the examples below, we reduce the time from 27.3 seconds to just 6.4 seconds.
-- This variant runs in 27.3 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
c_nationkey
FROM lineitem --6B Rows, ~183GB
JOIN orders --1.5B Rows, ~55GB
ON l_orderkey = o_orderkey
JOIN customer --150M Rows, ~12GB
ON c_custkey = o_custkey
WHERE c_nationkey = 1
AND o_orderdate >= DATE '1993-01-01'
AND o_orderdate < '1994-01-01'
AND l_shipdate >= '1993-01-01'
AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey
Modified query with improved join order:
-- This variant runs in 6.4 seconds
SELECT SUM(l_extendedprice / 100.0*(1 - l_discount / 100.0)) AS revenue,
c_nationkey
FROM orders --1.5B Rows, ~55GB
JOIN customer --150M Rows, ~12GB
ON c_custkey = o_custkey
JOIN lineitem --6B Rows, ~183GB
ON l_orderkey = o_orderkey
WHERE c_nationkey = 1
AND o_orderdate >= DATE '1993-01-01'
AND o_orderdate < '1994-01-01'
AND l_shipdate >= '1993-01-01'
AND l_shipdate <= dateadd(DAY,122,'1994-01-01')
GROUP BY c_nationkey
Further Reading
See our Optimization and Best Practices guide for more information about query optimization and data loading considerations.
Security
SQream DB has some security features that you should be aware of to increase the security of your data.
Overview
An initial, unsecured installation of SQream DB can carry some risks:
Your data is open to any client that can access an open node through an IP and port combination.
The initial administrator username and password, when unchanged, can let anyone log in.
Network connections to SQream DB aren’t encrypted.
To avoid these security risks, SQream DB provides authentication, authorization, logging, and network encryption.
Read through the best practices guide to understand more.
Security best practices for SQream DB
Secure OS access
SQream DB often runs as a dedicated user on the host OS. This user is the file system owner of SQream DB data files.
Any user who logs in to the OS with this user can read or delete data from outside of SQream DB.
This user can also read any logs which may contain user login attempts.
Therefore, it is very important to secure the host OS and prevent unauthorized access.
System administrators should only log in to the host OS to perform maintenance tasks like upgrades. A database user should not log in using the same username in production environments.
Change the default SUPERUSER
To bootstrap SQream DB, a new install will always have one SUPERUSER
role, typically named sqream
.
After creating a second SUPERUSER role, remove the default sqream role or change its default credentials.
No database user should ever use the default SUPERUSER
role in a production environment.
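A minimal sketch of this bootstrap step (the new_admin role name and the passwords are placeholders):

-- Create a replacement administrator role
CREATE ROLE new_admin;
GRANT LOGIN TO new_admin;
GRANT PASSWORD 'Str0ng_Unique_Passw0rd' TO new_admin;
GRANT SUPERUSER TO new_admin;

-- Then, logged in as new_admin, lock out the default role by changing
-- its password (or remove it entirely with DROP ROLE sqream;)
GRANT PASSWORD 'A_D1fferent_Str0ng_0ne' TO sqream;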
Create distinct user roles
Each user that signs in to a SQream DB cluster should have a distinct user role for several reasons:
For logging and auditing purposes. Each user that logs in to SQream DB can be identified.
For limiting permissions. Use groups and permissions to manage access. See our Access Control guide for more information.
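For example, a minimal sketch of group-based management (the analysts and alice role names, the database, and the table are hypothetical):

-- A group is simply a role without login; users inherit its permissions
CREATE ROLE analysts;
GRANT CONNECT ON DATABASE master TO analysts;
GRANT SELECT ON TABLE public.nba TO analysts;

-- A distinct, logged-in role per person
CREATE ROLE alice;
GRANT LOGIN TO alice;
GRANT PASSWORD 'Al1ce_Str0ng_Passw0rd' TO alice;
GRANT analysts TO alice; -- alice inherits the group's permissions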
Limit SUPERUSER
access
Limit users who have the SUPERUSER
role.
A superuser role bypasses all permissions checks. Only system administrators should have SUPERUSER
roles. See our Access Control guide for more information.
Password strength guidelines
System administrators should verify the passwords used are strong ones.
SQream DB stores passwords as salted SHA1 hashes in the system catalog so they are obscured and can’t be recovered. However, passwords may appear in server logs. Prevent access to server logs by securing OS access as described above.
Follow these recommendations to strengthen passwords:
Pick a password that’s easy to remember
At least 8 characters
Mix upper and lower case letters
Mix letters and numbers
Include non-alphanumeric characters (except
"
and'
)
Use TLS/SSL when possible
SQream DB’s protocol implements client/server TLS security (even though it is called SSL).
All SQream DB connectors and drivers support transport encryption. Ensure that each connection uses SSL and the correct access port for the SQream DB cluster:
The load balancer (
server_picker
) is often started with the secure port at an offset of 1 from the original port (e.g. port 3108 for the unsecured connection and port 3109 for the secured connection).A SQream DB worker is often started with the secure port enabled at an offset of 100 from the original port (e.g. port 5000 for the unsecured connection and port 5100 for the secured connection).
Refer to each client driver for instructions on enabling TLS/SSL.
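For example, a minimal sketch of a secured CLI connection, assuming a worker whose unsecured port is 5000 (making the secured port 5100) and assuming the client's SSL flag is spelled --ssl:

$ sqream sql --port=5100 --ssl --username=rhendricks -d master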
Saved Queries
Using the save_query
command will both generate and save an execution plan. This allows you to save time when running frequently used complex queries.
Note that the saved execution plan is tightly coupled with the structure of its underlying tables, which means that if one or more of the objects mentioned in the query is modified, the saved query must be re-created.
How Saved Queries Work
Saved queries are compiled when they are created. When a saved query is run, this query plan is used instead of compiling a query plan at query time.
Parameter Support
Query parameters can be used as substitutes for constants expressions in queries.
Creating a Saved Query
A saved query is created using the SAVE_QUERY utility command.
Saving a Simple Query
SELECT SAVE_QUERY('select_all','SELECT * FROM nba');
executed
Saving a Parameterized Query
Parameterized queries, also known as prepared statements, enable the usage of parameters which may be replaced by actual values when executing the query. They are created and managed in application code, primarily to optimize query execution, enhance security, and allow for the reuse of query templates with different parameter values.
SELECT SAVE_QUERY('select_by_weight_and_team','SELECT * FROM nba WHERE Weight > ? AND Team = ?');
Executing Saved Queries
Executing a saved query requires calling it by its name in an EXECUTE_SAVED_QUERY statement. A saved query with no parameters is called without parameters.
SELECT EXECUTE_SAVED_QUERY('select_all');
Name | Team | Number | Position | Age | Height | Weight | College | Salary
-------------------------+------------------------+--------+----------+-----+--------+--------+-----------------------+---------
Avery Bradley | Boston Celtics | 0 | PG | 25 | 6-2 | 180 | Texas | 7730337
Jae Crowder | Boston Celtics | 99 | SF | 25 | 6-6 | 235 | Marquette | 6796117
John Holland | Boston Celtics | 30 | SG | 27 | 6-5 | 205 | Boston University |
R.J. Hunter | Boston Celtics | 28 | SG | 22 | 6-5 | 185 | Georgia State | 1148640
[...]
Executing a saved query with parameters requires specifying the parameters in the order they appear in the query:
SELECT EXECUTE_SAVED_QUERY('select_by_weight_and_team', 240, 'Toronto Raptors');
Name | Team | Number | Position | Age | Height | Weight | College | Salary
------------------+-----------------+--------+----------+-----+--------+--------+-------------+--------
Bismack Biyombo | Toronto Raptors | 8 | C | 23 | 6-9 | 245 | | 2814000
James Johnson | Toronto Raptors | 3 | PF | 29 | 6-9 | 250 | Wake Forest | 2500000
Jason Thompson | Toronto Raptors | 1 | PF | 29 | 6-11 | 250 | Rider | 245177
Jonas Valanciunas | Toronto Raptors | 17 | C | 24 | 7-0 | 255 | | 4660482
Listing Saved Queries
Saved queries are stored as database objects. They can be listed in one of two ways:
Using the catalog:
SELECT * FROM sqream_catalog.savedqueries;
name | num_parameters
--------------------------+---------------
select_all | 0
select_by_weight | 1
select_by_weight_and_team | 2
Using the LIST_SAVED_QUERIES utility function:
SELECT LIST_SAVED_QUERIES();
saved_query
-------------------------
select_all
select_by_weight
select_by_weight_and_team
Dropping a Saved Query
When you’re done with a saved query, or would like to replace it with another, you can drop it with DROP_SAVED_QUERY:
SELECT DROP_SAVED_QUERY('select_all');
executed
SELECT DROP_SAVED_QUERY('select_by_weight_and_team');
executed
SELECT LIST_SAVED_QUERIES();
saved_query
-------------------------
select_by_weight
Optimization and Best Practices
This topic explains some best practices of working with SQream DB.
See also our Monitoring Query Performance guide for more information.
Table design
This section describes best practices and guidelines for designing tables.
Use date and datetime types for columns
When creating tables with dates or timestamps, using the purpose-built DATE
and DATETIME
types over integer types or TEXT
will bring performance and storage footprint improvements, and in many cases huge performance improvements (as well as data integrity benefits). SQream DB stores dates and datetimes very efficiently and can strongly optimize queries using these specific types.
Don’t flatten or denormalize data
SQream DB executes JOIN operations very effectively. It is almost always better to JOIN tables at query-time rather than flatten/denormalize your tables.
This will also reduce storage size and reduce row-lengths.
We highly suggest using INT
or BIGINT
as join keys, rather than a text/string type.
Convert foreign tables to native tables
SQream DB’s native storage is heavily optimized for analytic workloads. It is always faster for querying than other formats, even columnar ones such as Parquet. It also enables the use of additional metadata to help speed up queries, in some cases by many orders of magnitude.
You can improve the performance of all operations by converting foreign tables into native tables by using the CREATE TABLE AS syntax.
For example,
CREATE TABLE native_table AS SELECT * FROM external_table
The one situation when this wouldn’t be as useful is when data will be only queried once.
Use information about the column data to your advantage
Knowing the data types and their ranges can help design a better table.
Set NULL
or NOT NULL
when relevant
For example, if a value can’t be missing (or NULL
), specify a NOT NULL
constraint on the columns.
Not only does specifying NOT NULL
save on data storage, it lets the query compiler know that a column cannot have a NULL
value, which can improve query performance.
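A minimal sketch of the constraint in a table definition (the users table is hypothetical):

CREATE TABLE users (
  id    BIGINT   NOT NULL, -- can never be missing
  ts    DATETIME NOT NULL, -- can never be missing
  email TEXT(50)           -- may be missing, so it stays nullable
);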
Sorting
Data sorting is an important factor in minimizing storage size and improving query performance.
Minimizing storage saves on physical resources and increases performance by reducing overall disk I/O. Prioritize the sorting of low-cardinality columns. This reduces the number of chunks and extents that SQream DB reads during query execution.
Where possible, sort columns with the lowest cardinality first. Avoid sorting TEXT columns with lengths exceeding 50 characters.

For longer-running queries that run on a regular basis, performance can be improved by sorting data based on the WHERE and GROUP BY parameters. Data can be sorted during insert by using external tables or by using CREATE TABLE AS, as sketched below.
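A minimal sketch of sorting during insert (the visits_sorted target and the ext_visits foreign table are hypothetical):

-- Sort the low-cardinality column first, then the common filter column
CREATE TABLE visits_sorted AS
  SELECT country_code, ts, user_id, amount
  FROM ext_visits
  ORDER BY country_code, ts;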
Query best practices
This section describes best practices for writing SQL queries.
Reduce data sets before joining tables
Reducing the input to a JOIN
clause can increase performance.
Some queries benefit from retrieving a reduced dataset as a subquery prior to a join.
For example,
SELECT store_name, SUM(amount)
FROM store_dim AS dim INNER JOIN store_fact AS fact ON dim.store_id=fact.store_id
WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
GROUP BY 1;
Can be rewritten as
SELECT store_name, sum_amount
FROM store_dim AS dim INNER JOIN
(SELECT SUM(amount) AS sum_amount, store_id
FROM store_fact
WHERE p_date BETWEEN '2018-07-01' AND '2018-07-31'
GROUP BY 2) AS fact
ON dim.store_id=fact.store_id;
Prefer the ANSI JOIN
SQream DB prefers the ANSI JOIN syntax. In some cases, the ANSI JOIN performs better than the non-ANSI variety.
For example, this ANSI JOIN example will perform better:
SELECT p.name, s.name, c.name
FROM "Products" AS p
JOIN "Sales" AS s
ON p.product_id = s.sale_id
JOIN "Customers" as c
ON s.c_id = c.id AND c.id = 20301125;
This non-ANSI JOIN is supported, but not recommended:
SELECT p.name, s.name, c.name
FROM "Products" AS p, "Sales" AS s, "Customers" as c
WHERE p.product_id = s.sale_id
AND s.c_id = c.id
AND c.id = 20301125;
Use the high selectivity hint
Selectivity is the ratio of cardinality to the number of records of a chunk. We define selectivity as \(\frac{\text{Distinct values}}{\text{Total number of records in a chunk}}\)
SQream DB has a hint function called HIGH_SELECTIVITY
, which is a function you can wrap a condition in.
The hint signals to SQream DB that the result of the condition will be very sparse, and that it should attempt to rechunk the results into fewer, fuller chunks.
Use the high selectivity hint when you expect a predicate to filter out most values. For example, when the data is dispersed over lots of chunks (meaning that the data is not well-clustered).
For example,
SELECT store_name, SUM(amount) FROM store_dim
WHERE HIGH_SELECTIVITY(p_date = '2018-07-01')
GROUP BY 1;
This hint tells the query compiler that the WHERE
condition is expected to filter out more than 60% of values. It never affects the query results, but when used correctly can improve query performance.
Tip
The HIGH_SELECTIVITY()
hint function can only be used as part of the WHERE
clause. It can’t be used in equijoin conditions, cases, or in the select list.
Read more about identifying the scenarios for the high selectivity hint in our Monitoring query performance guide.
Cast smaller types to avoid overflow in aggregates
When using an INT
or smaller type, the SUM
and COUNT
operations return a value of the same type.
To avoid overflow on large results, cast the column up to a larger type.
For example
SELECT store_name, SUM(amount :: BIGINT) FROM store_dim
GROUP BY 1;
Prefer COUNT(*)
and COUNT
on non-nullable columns
SQream DB optimizes COUNT(*)
queries very strongly. This also applies to COUNT(column_name)
on non-nullable columns. Using COUNT(column_name)
on a nullable column will operate quickly, but much slower than the previous variations.
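For example, assuming a hypothetical users table where id is declared NOT NULL and email is nullable:

-- Both of these are strongly optimized:
SELECT COUNT(*) FROM users;
SELECT COUNT(id) FROM users;

-- This works, but is much slower, because NULL values must be inspected:
SELECT COUNT(email) FROM users;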
Return only required columns
Returning only the columns you need to client programs can improve overall query performance. This also reduces the overall result set, which can improve performance in third-party tools.
SQream is able to optimize out unneeded columns very strongly due to its columnar storage.
Use saved queries to reduce recurring compilation time
Saved Queries are compiled when they are created. The query plan is saved in SQream DB’s metadata for later re-use.
Saved query plans enable reduced compilation overhead, especially with very complex queries, such as queries with lots of values in an IN predicate.
When executed, the saved query plan is recalled and executed on the up-to-date data stored on disk.
See how to use saved queries in the saved queries guide.
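As a minimal sketch, reusing the SAVE_QUERY and EXECUTE_SAVED_QUERY utilities shown in that guide (the events table and the IN list are hypothetical):

-- Compile once; the plan is stored in SQream DB's metadata
SELECT SAVE_QUERY('active_regions', 'SELECT region_id, COUNT(*) FROM events WHERE region_id IN (1,2,3,5,8,13,21,34,55,89) GROUP BY region_id');

-- Later executions skip compilation and run against up-to-date data
SELECT EXECUTE_SAVED_QUERY('active_regions');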
Pre-filter to reduce JOIN complexity
Filter and reduce table sizes prior to joining on them:

SELECT store_name,
       SUM(amount)
FROM dimension AS dim
JOIN fact ON dim.store_id = fact.store_id
WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY store_name;
Can be rewritten as:
SELECT store_name,
sum_amount
FROM dimension AS dim
INNER JOIN (SELECT SUM(amount) AS sum_amount,
store_id
FROM fact
WHERE p_date BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY store_id) AS fact ON dim.store_id = fact.store_id;
Data loading considerations
Allow and use natural sorting on data
Very often, tabular data is already naturally ordered along a dimension such as a timestamp or area.
This natural order is a major factor for query performance later on, as data that is naturally sorted can be more easily compressed and analyzed with SQream DB’s metadata collection.
For example, when data is sorted by timestamp, filtering on this timestamp is more effective than filtering on an unordered column.
Natural ordering can also be used for effective DELETE operations.
Further reading and monitoring query performance
Read our Monitoring Query Performance guide to learn how to use the built-in monitoring utilities. The guide also gives concrete examples for improving query performance.
SQream Acceleration Studio 5.4.7
The SQream Acceleration Studio 5.4.7 is a web-based client for use with SQream. Studio provides users with all functionality available from the command line in an intuitive and easy-to-use format. This includes running statements, managing roles and permissions, and managing SQream clusters.
This section describes how to use the SQream Acceleration Studio version 5.4.7:
Getting Started with SQream Acceleration Studio 5.4.7
Setting Up and Starting Studio
When starting Studio, it listens on the local machine on port 8080.
Logging In to Studio
To log in to SQream Studio:
Open a browser to the host on port 8080.
For example, if your machine IP address is 192.168.0.100, insert the IP address into the browser as shown below:

http://192.168.0.100:8080
Fill in your SQream DB login credentials. These are the same credentials used for sqream sql or JDBC.
When you sign in, the License Warning is displayed.
Monitoring Workers and Services from the Dashboard
The Dashboard is used for the following:
Monitoring system health.
Viewing, monitoring, and adding defined service queues.
Viewing and managing worker status and add workers.
The following is an image of the Dashboard:

You can only access the Dashboard if you signed in with a SUPERUSER
role.
The following is a brief description of the Dashboard panels:
No. |
Element |
Description |
---|---|---|
1 |
Used for viewing and monitoring the defined service queues. |
|
2 |
Monitors system health and shows each sqreamd worker running in the cluster. |
|
3 |
Shows the remaining amount of days left on your license. |
Back to Monitoring Workers and Services from the Dashboard
Subscribing to Workers from the Services Panel
Services are used to categorize and associate (also known as subscribing) workers to particular services. The Service panel is used for viewing, monitoring, and adding defined service queues.
The following is a brief description of each pane:
No. |
Description |
---|---|
1 |
Adds a worker to the selected service. |
2 |
Shows the service name. |
3 |
Shows a trend graph of queued statements loaded over time. |
4 |
Adds a service. |
5 |
Shows the currently processed queries belonging to the service/total queries for that service in the system (including queued queries). |
Adding A Service
You can add a service by clicking + Add and defining the service name.
Note
If you do not associate a worker with the new service, it will not be created.
You can manage workers from the Workers panel. For more information about managing workers, see the following:
Managing Workers from the Workers Panel
From the Workers panel you can do the following:
Viewing Workers
The Worker panel shows each worker (sqreamd
) running in the cluster. Each worker has a status bar that represents the status over time. The status bar is divided into 20 equal segments, showing the most dominant activity in that segment.
From the Scale dropdown menu you can set the time scale of the displayed information. You can hover over segments in the status bar to see the date and time corresponding to each activity type:
Idle – the worker is idle and available for statements.
Compiling – the worker is compiling a statement and is preparing for execution.
Executing – the worker is executing a statement after compilation.
Stopped – the worker was stopped (either deliberately or due to an error).
Waiting – the worker was waiting on an object locked by another worker.
Adding A Worker to A Service
You can add a worker to a service by clicking the add button.
Clicking the add button shows the selected service’s workers. You can add the selected worker to the service by clicking Add Worker. Adding a worker to a service does not break associations already made between that worker and other services.
Viewing A Worker’s Active Query Information
You can view a worker’s active query information by clicking Queries, which displays them in the selected service.
Each statement shows the query ID, status, service queue, elapsed time, execution time, and estimated completion status. In addition, each statement can be stopped or expanded to show its execution plan and progress. For more information on viewing a statement’s execution plan and progress, see Viewing a Worker’s Execution Plan below.
Viewing A Worker’s Host Utilization
While viewing a worker’s query information, clicking the down arrow expands to show the host resource utilization.
The graphs show the resource utilization trends over time, and the CPU memory and utilization and the GPU utilization values on the right. You can hover over the graph to see more information about the activity at any point on the graph.
Error notifications related to statements are displayed, and you can hover over them for more information about the error.
Viewing a Worker’s Execution Plan
Clicking the ellipsis in a service shows the following additional options:
Stop Query - stops the query.
Show Execution Plan - shows the execution plan as a table. The columns in the Show Execution Plan table can be sorted.
For more information on the current query plan, see SHOW_NODE_INFO. For more information on checking active sessions across the cluster, see SHOW_SERVER_STATUS.
Managing Worker Status
In some cases you may want to stop or restart workers for maintenance purposes. Each Worker line has a ⋮ menu used for stopping, starting, or restarting workers.
Starting or restarting workers terminates all queries related to that worker. When you stop a worker, its background turns gray.
License Information
The license information section shows the following:
The amount of time in days remaining on the license.
The license storage capacity.

Executing Statements and Running Queries from the Editor
The Editor is used for the following:
Selecting an active database and executing queries.
Performing statement-related operations and showing metadata.
Executing pre-defined queries.
Writing queries and statements and viewing query results.
The following is a brief description of the Editor panels:
No. |
Element |
Description |
---|---|---|
1 |
Used to select the active database you want to work on, limit the number of rows, save query, etc. |
|
2 |
Shows a hierarchy tree of databases, views, tables, and columns |
|
3 |
Used for writing queries and statements |
|
4 |
Shows query results and execution information. |
Executing Statements from the Toolbar
You can access the following from the Toolbar pane:
Database dropdown list - select a database that you want to run statements on.
Service dropdown list - select a service that you want to run statements on. The options in the service dropdown menu depend on the database you select from the Database dropdown list.
Execute - lets you set which statements to execute. The Execute button toggles between Execute and Stop, and can be used to stop an active statement before it completes:
Statements - executes the statement at the location of the cursor.
Selected - executes only the highlighted text. This mode should be used when executing subqueries or sections of large queries (as long as they are valid SQLs).
All - executes all statements in a selected tab.
Format SQL - Lets you reformat and reindent statements.
Download query - Lets you download query text to your computer.
Open query - Lets you upload query text from your computer.
Max Rows - By default, the Editor fetches only the first 10,000 rows. You can modify this number by selecting an option from the Max Rows dropdown list. Note that setting a higher number may slow down your browser if the result is very large. This number is limited to 100,000 results. To see a higher number, you can save the results in a file or a table using the CREATE TABLE AS command.
For more information on stopping active statements, see the STOP_STATEMENT command.
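For reference, a minimal sketch of stopping a statement from SQL rather than from the toolbar (the statement ID 286 is a placeholder; real IDs can be found with SHOW_SERVER_STATUS):

SELECT SHOW_SERVER_STATUS(); -- find the ID of the running statement
SELECT STOP_STATEMENT(286);  -- stop it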
Back to Executing Statements and Running Queries from the Editor
Writing Statements and Queries from the Statement Panel
The multi-tabbed statement area is used for writing queries and statements, and is used in tandem with the toolbar. When writing and executing statements, you must first select a database from the Database dropdown menu in the toolbar. When you execute a statement, it passes through a series of statuses until completed. Knowing the status helps you with statement maintenance, and the statuses are shown in the Results panel.
The auto-complete feature assists you when writing statements by suggesting statement options.
The following table shows the statement statuses:
Status |
Description |
---|---|
Pending |
The statement is pending. |
In queue |
The statement is waiting for execution. |
Initializing |
The statement has entered execution checks. |
Executing |
The statement is executing. |
Statement stopped |
The statement has been stopped. |
You can add and name new tabs for each statement that you need to execute, and Studio preserves your created tabs when you switch between databases. You can add new tabs by clicking the add-tab button, which creates a new tab to the right with a default name of SQL followed by an increasing number. This helps you keep track of your statements.

You can also rename the default tab name by double-clicking it and typing a new name, and you can write multiple statements in tandem in the same tab by separating them with semicolons (;). If more tabs are open than fit into the Statement Pane, the tab arrows are displayed. You can scroll through the tabs by clicking the arrow buttons, and close tabs by clicking the close button. You can also close all tabs at once by clicking Close all, located to the right of the tabs.
Back to Executing Statements and Running Queries from the Editor
Viewing Statement and Query Results from the Results Panel
The results panel shows statement and query results. By default, only the first 10,000 results are returned, although you can modify this from the studio_editor_toolbar, as described above. By default, executing several statements together opens a separate results tab for each statement. Executing statements together executes them serially, and any failed statement cancels all subsequent executions.

The following is a brief description of the Results panel views highlighted in the figure above:
Element |
Description |
---|---|
Lets you view search query results. |
|
Lets you analyze your query for troubleshooting and optimization purposes. |
|
Lets you see the SQL view. |
Back to Executing Statements and Running Queries from the Editor
Searching Query Results in the Results View
The Results view lets you view search query results.
From this view you can also do the following:
View the amount of time (in seconds) taken for a query to finish executing.
Switch and scroll between tabs.
Close all tabs at once.
Enable keeping tabs by selecting Keep tabs.
Sort column results.
Saving Results to the Clipboard
The Save results to clipboard function lets you save your results to the clipboard to paste into another text editor or into Excel for further analysis.
Saving Results to a Local File
The Save results to local file function lets you save your search query results to a local file. Clicking Save results to local file downloads the contents of the Results panel to an Excel sheet. You can then copy and paste this content into other editors as needed.
In the Results view you can also run parallel statements, as described in Running Parallel Statements below.
Running Parallel Statements
While Studio’s default functionality is to open a new tab for each executed statement, Studio supports running parallel statements in one statement tab. Running parallel statements requires using macros and is useful for advanced users.
The following shows the syntax for running parallel statements:
$ @@ parallel
$ $$
$ select 1;
$ select 2;
$ select 3;
$ $$
Back to Viewing Statement and Query Results from the Results Panel
Execution Details View
The Execution Details View section describes the following:
Overview
Clicking Execution Details View displays the Execution Tree, which is a chronological tree of processes that occurred to execute your queries. The purpose of the Execution Tree is to analyze all aspects of your query for troubleshooting and optimization purposes, such as resolving queries with an exceptionally long runtime.
Note
The Execution Details View button is enabled only when a query takes longer than five seconds.
From this screen you can scroll in, out, and around the execution tree with the mouse to analyze all aspects of your query. You can navigate around the execution tree by dragging or by using the mini-map in the bottom right corner.

You can also search for query data by pressing Ctrl+F or clicking the search icon in the search field in the top right corner and typing text.

Pressing Enter takes you directly to the next result matching your search criteria, and pressing Shift + Enter takes you directly to the previous result. You can also search next and previous results using the up and down arrows.
The nodes are color-coded based on the following:
Slow nodes - red
In progress nodes - yellow
Completed nodes - green
Pending nodes - white
Currently selected node - blue
Search result node - purple (in the mini-map)
The execution tree displays the same information as shown in the plain view in tree format.
The Execution Tree tracks each phase of your query in real time as a vertical tree of nodes. Each node refers to an operation that occurred on the GPU or CPU. When a phase is completed, the next branch begins to its right until the entire query is complete. Joins are displayed as two parallel branches merged together in a node called Join, as shown in the figure above. The nodes are connected by a line indicating the number of rows passed from one node to the next. The width of the line indicates the amount of rows on a logarithmic scale.
Each node displays a number displaying its node ID, its type, table name (if relevant), status, and runtime. The nodes are color-coded for easy identification. Green nodes indicate completed nodes, yellow indicates nodes in progress, and red indicates slowest nodes, typically joins, as shown below:

Viewing Query Statistics
The following statistical information is displayed in the top left corner, as shown in the figure above:
Query Statistics:
Elapsed - the total time taken for the query to complete.
Result rows - the amount of rows fetched.
Running nodes completion
Total query completion - the amount of the total execution tree that was executed (nodes marked green).
Slowest Nodes information is displayed in the top right corner in red text. Clicking the slowest node centers automatically on that node in the execution tree.
You can also view the following Node Statistics in the top right corner for each individual node by clicking a node:
Element |
Description |
---|---|
Node type |
Shows the node type. |
Status |
Shows the execution status. |
Time |
The total time taken to execute. |
Rows |
Shows the number of produced rows passed to the next node. |
Chunks |
Shows number of produced chunks. |
Average rows per chunk |
Shows the number of average rows per chunk. |
Table (for ReadTable and joins only) |
Shows the table name. |
Write (for joins only) |
Shows the total data size written to the disk. |
Read (for ReadTable and joins only) |
Shows the total data size read from the disk. |
Note that you can scroll the Node Statistics table. You can also download the execution plan table in .csv format by clicking the download arrow in the upper-right corner.
Using the Plain View
You can use the Plain View instead of viewing the execution tree by clicking Plain View in the top right corner. The plain view displays the same information as shown in the execution tree in table format.
The plain view lets you view a query’s execution plan for monitoring purposes and highlights rows based on how long they ran relative to the entire query.
This can be seen in the timeSum column as follows:
Rows highlighted red - longest runtime
Rows highlighted orange - medium runtime
Rows highlighted yellow - shortest runtime
Back to Viewing Statement and Query Results from the Results Panel
Viewing Wrapped Strings in the SQL View
The SQL View panel allows you to more easily view certain queries, such as a long string that appears on one line. The SQL View makes it easier to see by wrapping it so that you can see the entire string at once. It also reformats and organizes query syntax entered in the Statement panel for more easily locating particular segments of your queries. The SQL View is identical to the Format SQL feature in the Toolbar, allowing you to retain your originally constructed query while viewing a more intuitively structured snapshot of it.
Back to Viewing Statement and Query Results from the Results Panel
Back to Executing Statements and Running Queries from the Editor
Viewing Logs
The Logs screen is used for viewing logs and includes the following elements:
Element |
Description |
---|---|
Lets you filter the data shown in the table. |
|
Shows basic query information logs, such as query number and the time the query was run. |
|
Shows basic session information logs, such as session ID and user name. |
|
Shows all system logs. |
|
Shows the total amount of log lines. |
Filtering Table Data
From the Logs tab, from the FILTERS area you can also apply the TIMESPAN, ONLY ERRORS, and additional filters (Add). The Timespan filter lets you select a timespan. The Only Errors toggle button lets you show all queries, or only queries that generated errors. The Add button lets you add additional filters to the data shown in the table. The Filter button applies the selected filter(s).
Other filters require you to select an item from a dropdown menu:
INFO
WARNING
ERROR
FATAL
SYSTEM
You can also export a record of all of your currently filtered logs in Excel format by clicking Download located above the Filter area.
Viewing Query Logs
The QUERIES log area shows basic query information, such as query number and the time the query was run. The number next to the title indicates the amount of queries that have been run.
From the Queries area you can see and sort by the following:
Query ID
Start time
Query
Compilation duration
Execution duration
Total duration
Details (execution details, error details, successful query details)
In the Queries table, you can click on the Statement ID and Query items to set them as your filters. In the Details column you can also access additional details by clicking one of the Details options for a more detailed explanation of the query.
Viewing Session Logs
The SESSIONS tab shows the sessions log table and is used for viewing activity that has occurred during your sessions. The number at the top indicates the amount of sessions that have occurred.
From here you can see and sort by the following:
Timestamp
Connection ID
Username
Client IP
Login (Success or Failed)
Duration (of session)
Configuration Changes
In the Sessions table, you can click on the Timestamp, Connection ID, and Username items to set them as your filters.
Viewing System Logs
The SYSTEM tab shows the system log table and is used for viewing all system logs. The number at the top indicates the amount of system log entries. Because system logs occur less frequently than queries and sessions, you may need to increase the filter timespan for the table to display any system logs.
From here you can see and sort by the following:
Timestamp
Log type
Message
In the Systems table, you can click on the Timestamp and Log type items to set them as your filters. In the Message column, you can also click on an item to show more information about the message.
Viewing All Log Lines
The LOG LINES tab is used for viewing the total amount of log lines in a table. From here users can view a more granular breakdown of log information collected by Studio. The other tabs (QUERIES, SESSIONS, and SYSTEM) show a filtered form of the raw log lines. For example, the QUERIES tab shows an aggregation of several log lines.
From here you can see and sort by the following:
Timestamp
Message level
Worker hostname
Worker port
Connection ID
Database name
User name
Statement ID
In the LOG LINES table, you can click on any of the items to set them as your filters.
Creating, Assigning, and Managing Roles and Permissions
The Creating, Assigning, and Managing Roles and Permissions section describes the following:
Overview
In the Roles area you can create and assign roles and manage user permissions.
The Type column displays one of the following assigned role types:
Role Type |
Description |
---|---|
Groups |
Roles with no users. |
Enabled users |
Users with log-in permissions and a password. |
Disabled users |
Users with log-in permissions and with a disabled password. An admin may disable a user’s password permissions to temporarily disable access to the system. |
Note
If you disable a password, when you enable it you have to create a new one.
Back to Creating, Assigning, and Managing Roles and Permissions
Viewing Information About a Role
Clicking a role in the roles table displays the following information:
Parent Roles - displays the parent roles of the selected role. Roles inherit all roles assigned to the parent.
Members - displays all members that the role has been assigned to. The arrow indicates the roles that the role has inherited. Hovering over a member displays the roles that the role is inherited from.
Permissions - displays the role’s permissions. The arrow indicates the permissions that the role has inherited. Hovering over a permission displays the roles that the permission is inherited from.
Back to Creating, Assigning, and Managing Roles and Permissions
Creating a New Role
You can create a new role by clicking New Role.
An admin creates a user by granting login permissions and a password to a role. Each role is defined by a set of permissions. An admin can also group several roles together to form a group to manage them simultaneously. For example, permissions can be granted to or revoked on a group level.
Clicking New Role lets you do the following:
Add and assign a role name (required)
Enable or disable log-in permissions for the role
Set a password
Assign or delete parent roles
Add or delete permissions
Grant the selected user superuser permissions

From the New Role panel you can view directly and indirectly (inherited) granted permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the New Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria.
When adding a new role, you must select the Enable login for this role and Has password check boxes.
Back to Creating, Assigning, and Managing Roles and Permissions
Editing a Role
Once you’ve created a role, clicking the Edit Role button lets you do the following:
Edit role name
Enable or disable log-in permissions
Set a password
Assign or delete parent roles
Assign a role administrator permissions
Add or delete permissions
Grant the selected user superuser permissions

From the Edit Role panel you can view directly and indirectly (inherited) granted permissions. Disabled permissions have no connect permissions for the referenced database and are displayed in gray text. You can add or remove permissions from the Add permissions field. From the Edit Role panel you can also search and scroll through the permissions. In the Search field you can use the and operator to search for strings that fulfill multiple criteria.
Back to Creating, Assigning, and Managing Roles and Permissions
Deleting a Role
Clicking the delete icon displays a confirmation message with the amount of users and groups that will be impacted by deleting the role.
Back to Creating, Assigning, and Managing Roles and Permissions
Configuring Your Instance of SQream
The Configuration section lets you edit parameters from one centralized location. While you can edit these parameters from the worker configuration file (config.json) or from your CLI, you can also modify them in Studio in an easy-to-use format.
Configuring your instance of SQream in Studio is session-based, which enables you to edit parameters per session on your own device. Because session-based configurations are not persistent and are deleted when your session ends, you can edit your required parameters while avoiding conflicts between parameters edited on different devices at different points in time.
Editing Your Parameters
When configuring your instance of SQream in Studio you can edit parameters for the Generic and Admin parameters only.
Studio includes two types of parameters: toggle switches, such as flipJoinOrder, and text fields, such as logSysLevel. After editing a parameter, you can reset each one to its previous value or to its default value individually, or revert all parameters to their default setting simultaneously. Note that you must click Save to save your configurations.
You can hover over the information icon located on each parameter to read a short description of its behavior.
Exporting and Importing Configuration Files
You can also export and import your configuration settings into a .json file. This allows you to easily edit your parameters and to share this file with other users if required.
For more information about configuring your instance of SQream, see Configuration.
System Architecture
The Internals and Architecture and Filesystem and Usage guides are walk-throughs for end-users, database administrators, and system architects who wish to become familiar with the SQreamDB system and its unique capabilities.

Internals and Architecture
Get to know the SQreamDB key functions and system architecture components, best practices, customization possibilities, and optimizations.
SQreamDB leverages GPU acceleration as an essential component of its core database operations, significantly enhancing columnar data processing. This integral GPU utilization isn’t an optional feature but is fundamental to a wide range of data tasks such as GROUP BY
, scalar functions, JOIN
, ORDER BY
, and more. This approach harnesses the inherent parallelism of GPUs, effectively employing a single instruction to process multiple values, akin to the Single-Instruction, Multiple Data (SIMD) concept, tailored for high-throughput operations.

Concurrency and Admission Control
The SQreamDB execution engine employs thread workers and message passing for its foundation. This threading approach enables the concurrent execution of diverse operations, seamlessly integrating IO and GPU tasks with CPU operations while boosting the performance of CPU-intensive tasks.
Learn more about Concurrency and Scaling in SQream DB.
Statement Compiler
The Statement Compiler, developed using Haskell, accepts SQL text and generates optimized statement execution plans.
Building Blocks (GPU Workers)
In SQreamDB, the main workload is carried out by specialized C++/CUDA building blocks, also known as Workers, which intentionally lack inherent intelligence and require precise instructions for operation. Effectively assembling these components relies largely on the capabilities of the statement compiler.
Storage Layer
The storage is split into the metadata layer and an append-only data layer.
Metadata Layer
Utilizing RocksDB key/value data store, the metadata layer incorporates features such as snapshots and atomic writes within the transaction system, while working in conjunction with the append-only bulk data layer to maintain overall data consistency.
Bulk Data Layer Optimization
SQreamDB harnesses the power of its columnar storage architecture within the bulk data layer for performance optimization. This layer employs IO-optimized extents containing compression-enabled CPU and GPU-efficient chunks. Even during small insert operations, SQreamDB maintains efficiency by generating less optimized chunks and extents as needed. This is achieved through background transactional reorganization, such as DeferredGather
, that doesn’t disrupt Data Manipulation Language (DML) operations. Deferred Gather optimizes GPU processing by selectively gathering only the necessary columns after GPU execution, effectively conserving memory and enhancing query performance.
The system initially writes small chunks via small inserts and subsequently reorganizes them, facilitating swift medium-sized insert transactions and rapid queries. This optimization strategy, coupled with SQreamDB’s columnar storage, ensures peak performance across diverse data processing tasks.
Transactions
SQreamDB uses serializable (auto-commit) transactions, with the following characteristics:
Transactions are serializable, with any kind of statement
Multiple SELECT queries can run concurrently with any other statement
Multiple inserts into the same table can run at the same time
Multiple statements cannot be grouped into a single transaction
Other operations such as DELETE, TRUNCATE, and DDL use coarse-grained exclusive locking.
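For example, the following inserts can safely run at the same time from two different sessions, each as its own auto-committed transaction (a minimal sketch, assuming a table named sales already exists):

-- Session 1:
INSERT INTO sales VALUES (1, 'widget');
-- Session 2, concurrently:
INSERT INTO sales VALUES (2, 'gadget');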
Filesystem and Usage
SQreamDB writes and reads data from disk.
The SQreamDB storage directory, sometimes referred to as a storage cluster, is a collection of database objects, a metadata database, and logs.
Each SQreamDB worker and the metadata server must have access to the storage cluster in order to function properly.
Directory organization

The cluster root is the directory in which all data for SQreamDB is stored.
databases
The databases directory houses all of the actual data in tables and columns.
Each database is stored as its own directory. Each table is stored under its respective database, and columns are stored in their respective table.

In the example above, the database named retail contains a table directory with a directory named 23.
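Putting this together, the directory layout for this example might look as follows (an illustrative sketch only; exact names vary by version and installation):

<cluster root>/
  databases/
    retail/        -- database directory
      23/          -- table directory for table_id 23
  metadata/        -- metadata key-value store (see below)
  temp/            -- temporary data
  logs/            -- log files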
Tip
To find table IDs, use a catalog query:
master=> SELECT table_name, table_id FROM sqream_catalog.tables WHERE table_name = 'customers';
table_name | table_id
-----------+---------
customers | 23
Each table directory contains a directory for each physical column. An SQL column may be built up of several physical columns (e.g. if the data type is nullable).
Tip
To find column IDs, use a catalog query:
master=> SELECT column_id, column_name FROM sqream_catalog.columns WHERE table_id=23;
column_id | column_name
----------+------------
0 | name@null
1 | name@val
2 | age@null
3 | age@val
4 | email@null
5 | email@val
Each column directory will contain extents, which are collections of chunks.

metadata or rocksdb
SQreamDB’s metadata is an embedded key-value store, based on RocksDB. RocksDB helps SQreamDB ensure efficient key storage, atomic writes, snapshots, durability, and automatic recovery.
The metadata is where all database objects are stored, including roles, permissions, database and table structures, chunk mappings, and more.
temp
The temp directory is where SQreamDB writes temporary data.
The directory to which SQreamDB writes temporary data can be changed to any other directory on the filesystem. SQreamDB recommends remapping this directory to fast local storage for better performance when executing intensive larger-than-RAM operations such as sorting, preferably an SSD or NVMe drive in a mirrored RAID 1 configuration.
If desired, the temp folder can be redirected to a local disk for improved performance by setting the tempPath setting in the legacy configuration file.
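For example, remapping the temporary directory to a local NVMe mount might look like this in the legacy configuration file (a sketch; the path is hypothetical):

{
  "tempPath": "/mnt/nvme0/sqream_temp"
}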
logs
The logs directory contains logs produced by SQreamDB.
See more about the logs in the Logging guide.
Sizing
Concurrency and Scaling in SQreamDB
A SQreamDB cluster can execute one statement per worker process while also supporting the concurrent operation of multiple workers. Utility functions with minimal resource requirements, such as SHOW_SERVER_STATUS, SHOW_LOCKS, and SHOW_NODE_INFO, are executed regardless of the workload.
Minimum Resources Required Per Worker:

Component | CPU Cores | RAM (GB) | Local Storage (GB)
---|---|---|---
Worker | 8 | 128 | 10
Metadata Server | 16 cores per 100 Workers | 20 GB RAM for every 1 trillion rows | 10
SQreamDB Acceleration Studio | 16 | 16 | 50
Server Picker | 1 | 2 |
Lightweight queries, such as COPY TO and Clean-Up, require 64 GB of RAM.
Maximum Workers Per GPU:

GPU | Workers
---|---
NVIDIA Turing T4 (16GB) | 1
NVIDIA Volta V100 (32GB) | 2
NVIDIA Ampere A100 (40GB) | 3
NVIDIA Ampere A100 (80GB) | 6
NVIDIA Hopper H100 (80GB) | 6
L40S Ada Lovelace (48GB) | 4
Tip
If your GPU is not on the list, visit SQreamDB Support for additional information.
Scaling When Data Sizes Grow
For many statements, SQreamDB scales linearly when adding more storage and querying on large data sets. It uses optimized ‘brute force’ algorithms and implementations, which don’t suffer from sudden performance cliffs at larger data sizes.
Scaling When Queries Are Queuing
SQreamDB scales well by adding more workers, GPUs, and nodes to support more concurrent statements.
What To Do When Queries Are Slow
Adding more workers or GPUs does not boost the performance of a single statement or query.
To boost the performance of a single statement, start by examining the best practices and ensure the guidelines are followed.
Adding RAM to nodes, using GPUs with more memory, and using faster CPUs or storage can also help in some cases.
Spooling Configuration
\(limitQueryMemoryGB=\frac{\text{Total RAM}-\text{Internal Operations}-\text{Metadata Server}-\text{Server Picker}}{\text{Number of Workers}}\)
\(spoolMemoryGB=limitQueryMemoryGB - 50GB\)
The limitQueryMemoryGB flag is the total memory you’ve allocated for processing queries; it defines how much total system memory is used by each worker. Note that spoolMemoryGB must be set to less than limitQueryMemoryGB.
Example
Setting Spool Memory
The provided examples assume a configuration with 2 TB of RAM, 8 workers running on 2 A100 (80GB) GPUs, and 200 GB allocated for Internal Operations, the Metadata Server, Server Picker, and UI.
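Plugging these numbers into the formulas above (2 TB ≈ 2000 GB):

\(limitQueryMemoryGB=\frac{2000-200}{8}=225\)

\(spoolMemoryGB=225-50=175\)

These are the values used in the configuration files below.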
Configuring the limitQueryMemoryGB using the Worker configuration file:
{
  "cluster": "/home/test_user/sqream_testing_temp/sqreamdb",
  "gpu": 0,
  "licensePath": "home/test_user/SQream/tests/license.enc",
  "machineIP": "127.0.0.1",
  "metadataServerIp": "127.0.0.1",
  "metadataServerPort": 3105,
  "port": 5000,
  "useConfigIP": true,
  "limitQueryMemoryGB": 225
}
Configuring the spoolMemoryGB using the legacy configuration file:
{
"diskSpaceMinFreePercent": 10,
"enableLogDebug": false,
"insertCompressors": 8,
"insertParsers": 8,
"isUnavailableNode": false,
"logBlackList": "webui",
"logDebugLevel": 6,
"nodeInfoLoggingSec": 60,
"useClientLog": true,
"useMetadataServer": true,
"spoolMemoryGB": 175,
"waitForClientSeconds": 18000,
"enablePythonUdfs": true
}
Need help?
Visit SQreamDB Support for additional information.
Configuration Guides
The Configuration Guides page describes the following configuration information:
Configuring SQream
The Configuring SQream page describes the following configuration topics:
Configuration Levels
SQream’s configuration parameters are based on the following hierarchy:
Cluster-Based Configuration
Cluster-based configuration lets you centralize configurations for all workers on the cluster. Only Regular and Cluster flag types can be modified at the cluster level. These modifications are persistent, stored at the metadata level, and applied globally to all workers in the cluster.
Note
While cluster-based configuration was designed for configuring Workers, you can only configure Worker values set to the Regular or Cluster type.
Worker-Based Configuration
Worker-based configuration lets you modify individual workers using a worker configuration file. Worker-based configuration modifications are persistent.
For more information on making configurations from the worker configuration file, see Configuring SQream Using the Previous Configuration Method.
Session-Based Configuration
Session-based configurations are not persistent and are deleted when your session ends. This method enables you to modify all required configurations while avoiding conflicts between flag attributes modified on different devices at different points in time. The SET flag_name command is used to modify flag values on the session level. Any modifications you make with the SET flag_name command apply only to your open session, and are not saved when it ends.
For example, when the query below finishes executing, the configured value is restored to its previous setting:
set spoolMemoryGB=700;
select * from a where date='2021-11-11';
Flag Types
SQream uses three flag types, Cluster, Worker, and Regular. Each of these flag types is associated with one of three hierarchical configuration levels described earlier, making it easier to configure your system.
The highest level in the hierarchy is Cluster, which lets you set configurations across all workers in a given cluster. Modifying cluster values is persistent, meaning that any configurations you set are retained after shutting down your system. Configurations set at the Cluster level take the highest priority and override settings made on the Regular and Worker level. This is known as cluster-based configuration. Note that Cluster-based configuration lets you modify Cluster and Regular flag types. An example of a Cluster flag is persisting your cache directory.
The second level is Worker, which lets you configure individual workers. Modifying Worker values is also persistent. This is known as worker-based configuration. Examples of Worker flags include setting total device memory usage and setting the metadata server connection port.
The lowest level is Regular; modifying the values of Regular flags affects only your current session and is not persistent, meaning the values are automatically restored to their defaults when the session ends. This is known as session-based configuration. Examples of Regular flags include setting your bin size and setting CUDA memory.
To see each flag’s default value, see one of the following:
The Default Value column in the All Configurations section.
The flag’s individual description page, such as Setting CUDA Memory.
Configuration Roles
SQream divides flags into the following roles, each with their own set of permissions:
Administration Flags - can be modified by administrators on a session and cluster basis using the ALTER SYSTEM SET command. Applies to the following flag types:
Regular
Worker
Cluster
Generic Flags - can be modified by standard users on a session basis. Applies to the following flag types:
Regular
Worker
Modification Methods
Modifying Your Configuration Using the Worker Configuration File
You can modify your configuration using the worker configuration file (config.json). Changes that you make to worker configuration files are persistent. Note that you can only set the attributes in your worker configuration file before initializing your SQream worker; while your worker is active, these attributes are read-only.
The following is an example of a worker configuration file:
{
"cluster": "/home/test_user/sqream_testing_temp/sqreamdb",
"gpu": 0,
"licensePath": "home/test_user/SQream/tests/license.enc",
"machineIP": "127.0.0.1",
"metadataServerIp": "127.0.0.1",
"metadataServerPort": 3105,
"port": 5000,
"useConfigIP": true,
"legacyConfigFilePath": "home/SQream_develop/SqrmRT/utils/json/legacy_congif.json"
}
You can access the legacy configuration file from the legacyConfigFilePath parameter shown above. If all (or most) of your workers require the same flag settings, you can point the legacyConfigFilePath attribute of each worker to the same legacy file.
Modifying Your Configuration Using a Legacy Configuration File
You can modify your configuration using a legacy configuration file.
The legacy configuration file provides access to the read/write flags. A link to this file is provided in the legacyConfigFilePath parameter in the worker configuration file.
The following is an example of the legacy configuration file:
{
"developerMode": true,
"reextentUse": false,
"useClientLog": true,
"useMetadataServer": false,
"enablePythonUdfs": true
}
Parameter Values
Command |
Description |
Example |
---|---|---|
|
Used for modifying flag attributes. |
|
|
Used to preset either a specific flag value or all flag values. |
|
|
Used as a wildcard character for flag names. |
|
|
Used to print all flags with the following attributes:
|
|
|
Used to print all information output by the show_conf UF command, in addition to description, usage, data type, default value and range. |
|
|
Used to show a specific flag/all flags stored in the metadata. |
|
|
Used for storing or modifying flag attributes in the metadata. |
|
|
Used to remove a flag or all flag attributes from the metadata. |
|
Command Examples
This section includes the following command examples:
Running a Regular Flag Type Command
The following is an example of running a Regular flag type command:
SET spoolMemoryGB = 11;
executed
Running a Worker Flag Type Command
The following is an example of running a Worker flag type command:
SHOW spoolMemoryGB;
Running a Cluster Flag Type Command
The following is an example of running a Cluster flag type command:
ALTER SYSTEM RESET useMetadataServer;
executed
Showing All Flags in the Catalog Table
SQream uses the sqream_catalog.parameters catalog table for showing all flags, providing the scope (default, cluster, and session), description, default value, and actual value.
The following is the correct syntax for a catalog table query:
SELECT * FROM sqream_catalog.parameters
The following is example output from a catalog table query:
externalTableBlobEstimate, 100, 100, default,
varcharEncoding, ascii, ascii, default, Changes the expected encoding for Varchar columns
useCrcForTextJoinKeys, true, true, default,
hiveStyleImplicitStringCasts, false, false, default,
All Configurations
The following table describes all Generic and Administration configuration flags:
Flag Name |
Access Control |
Modification Type |
Description |
Data Type |
Default Value |
---|---|---|---|---|---|
|
Admin |
Regular |
Sets the custom bin size in the cache to enable high granularity bin control. |
string |
|
|
Generic |
Regular |
Sets how long the cache stores contents before being flushed. |
size_t |
|
|
Generic |
Regular |
Sets the on-disk directory location where the spool saves files. |
string |
Any legal string |
|
Generic |
Regular |
Sets the amount of memory (GB) to be used by Spool on the disk. |
size_t |
|
|
Generic |
Regular |
Sets the number of partitions that the cache is split into. |
size_t |
|
|
Generic |
Regular |
Sets the persistent directory location for the spool to save files on. |
string |
Any legal string |
|
Generic |
Regular |
Sets the amount of data (GB) for the cache to store persistently. |
size_t |
|
|
Generic |
Regular |
Sets the amount of memory (GB) to be used by Spool InMemory. |
size_t |
|
|
Admin |
Regular |
Pads device memory allocations with safety buffers to catch out-of-bounds writes. |
boolean |
|
|
Admin |
Regular |
Sets the runtime to pass only utility function names to the compiler. |
boolean |
|
|
Admin |
Regular |
Sets the custom bin size in the cache to enable high granularity bin control. |
boolean |
|
|
Admin |
Regular |
Sets the hash table size of the CpuReduce. |
uint |
|
|
Admin |
Cluster |
Sets the maximum supported CSV row length. |
uint |
|
|
Admin |
Regular |
Sets the chunk size for copying from CPU to GPU. If set to 0, do not divide. |
uint |
|
|
Admin |
Regular |
Indicates if copying from/to GPU is synchronous. |
boolean |
|
|
Admin |
Worker |
Sets the percentage of total device memory to be used by the instance. |
uint |
|
|
Admin |
Regular |
Enables modifying R&D flags. |
boolean |
|
|
Admin |
Regular |
Activates the Nvidia profiler (nvprof) markers. |
boolean |
|
|
Admin |
Regular |
Enables creating and logging in the clientLogger_debug file. |
boolean |
|
|
Admin |
Regular |
Activates the Nvidia profiler (nvprof) markers. |
boolean |
|
|
Admin |
Regular |
Appends a string at the end of every log line. |
string |
|
|
Admin |
Cluster |
Sets the minimum size in mebibytes of extents for table bulk data. |
uint |
|
|
? |
Regular |
? |
? |
? |
|
Generic |
Regular |
Reorders join to force equijoins and/or equijoins sorted by table size. |
boolean |
|
|
Admin |
Regular |
Monitors all pinned allocations and all memcopies to/from device, and prints a report of pinned allocations that were not memcopied to/from the device using the dump_pinned_misses utility function. |
boolean |
|
|
Admin |
Worker |
Defines the threshold for creating a log recording a slow statement. |
size_t |
|
|
Admin |
Regular |
Increases the chunk size to reduce query speed. |
boolean |
|
|
Admin |
Regular |
Adds rechunker before expensive chunk producer. |
boolean |
|
|
Admin |
Worker |
Periodically examines the progress of running statements and logs statements exceeding the |
boolean |
|
|
Admin |
Regular |
Sets the buffer size. |
uint |
|
|
Generic |
Worker |
Prevents a query from processing more memory than the flag’s value. |
uint |
|
|
Admin |
Worker |
Sets the permitted log-in attempts. |
size_t |
|
|
Generic |
Regular |
Determines the client log level: 0 - L_SYSTEM, 1 - L_FATAL, 2 - L_ERROR, 3 - L_WARN, 4 - L_INFO, 5 - L_DEBUG, 6 - L_TRACE |
uint |
|
|
Admin |
Worker |
Manual setting of reported IP. |
string |
|
|
Generic |
Regular |
Sets the CPU to compress columns with size above (flag’s value) * (row count). |
uint |
|
|
Admin |
Regular |
Sets the maximum percentage CPU RAM that pinned memory can use. |
uint |
|
|
Admin |
Regular |
Sets the size of memory used during a query to trigger aborting the server. |
uint |
|
|
Admin |
Regular |
Sets the size of memory used during a query to trigger aborting the server. |
uint |
|
|
Admin |
Worker |
Sets the port used to connect to the metadata server. SQream recommends using ports above 1024 because ports below 1024 are usually reserved, although there are no strict limitations. Any port number (1 - 65535) can be used. |
uint |
|
|
Admin |
Regular |
Splits large reads to multiple smaller ones and executes them concurrently. |
boolean |
|
|
Admin |
Regular |
Sets the number of workers to handle smaller concurrent reads. |
uint |
|
|
Admin |
Regular |
Sets the implicit cast in orc files, such as int to tinyint and vice versa. |
boolean |
|
|
Generic |
Regular |
Sets the name of the session tag. |
string |
Any legal string |
|
Generic |
Regular |
Sets the amount of memory (GB) to be used by the server for spooling. |
uint |
|
|
Admin |
Regular |
Sets the timeout (seconds) for acquiring object locks before executing statements. |
uint |
|
|
Admin |
Worker |
Activates the machineIP (true). Setting to false ignores the machineIP and automatically assigns a local network IP. This cannot be activated in a cloud scenario (on-premises only). |
boolean |
|
|
Admin |
Regular |
Interprets decimal literals as Double instead of Numeric. Used to preserve legacy behavior in existing customers. |
boolean |
|
|
Admin |
Regular |
Interprets ASCII-only strings as VARCHAR instead of TEXT. Used to preserve legacy behavior in existing customers. |
boolean |
|
|
Admin |
Regular |
Disables the creation of new tables, views, external tables containing Varchar columns, and the creation of user-defined functions with Varchar arguments or a Varchar return value. |
boolean |
|
Configuring LDAP authentication
Lightweight Directory Access Protocol (LDAP) is an authentication management service used with Microsoft Active Directory and other directory services. Once LDAP authentication has been configured for SQream, authorization for all existing and newly added roles must be handled by the LDAP server, except for the initial sqream role created at system deployment, which was granted full control permissions when SQream was first deployed.
Before integrating SQream with LDAP consider the following:
If SQream DB is being installed within an environment where LDAP is already configured, it is best practice to ensure that the newly created SQream role names are consistent with existing LDAP user names.
If SQream DB has been installed and LDAP has not yet been integrated with SQream, it is best practice to ensure that newly created LDAP user names are consistent with existing SQream role names. Previously existing SQream roles that were mistakenly not configured in LDAP, or whose names differ from their LDAP counterparts, will be recreated in SQream as roles that cannot log in, have no permissions, and have no default schema.
Configuring SQream roles
Follow this procedure if you already have LDAP configured for your environment.
Create a new role:

CREATE ROLE <new_role>;

Grant the new role login permission:

GRANT LOGIN TO <new_role>;

Grant the new role CONNECT permission:

GRANT CONNECT ON DATABASE <my_database> TO <new_role>;
You may also wish to rename SQream roles so that they are consistent with existing LDAP user names.
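For example, to prepare a SQream role for a hypothetical LDAP user named jsmith (the role and database names are illustrative):

CREATE ROLE jsmith;
GRANT LOGIN TO jsmith;
GRANT CONNECT ON DATABASE master TO jsmith;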
Configuring LDAP Authentication
Configuration Methods
To configure LDAP authentication for SQream, you may choose one of the following configuration methods:
Method |
Description |
---|---|
Basic method |
A traditional approach to authentication in which the user provides a username and password combination to authenticate with the LDAP server. In this approach, all users are given access to SQream. |
Advanced method |
This approach allows for compartmentalization, which means that users can be grouped into categories, and each category can be assigned or denied access to SQream. This allows administrators to control access to SQream. |
Basic Method
Flag Attributes
To enable basic LDAP authentication, configure the following cluster flag attributes using the ALTER SYSTEM SET command:
Attribute | Description
---|---
authenticationMethod | Configures the authentication method; set to ldap to enable LDAP authentication
ldapIpAddress | Configures the IP address or Fully Qualified Domain Name (FQDN) of your LDAP server, including the protocol (for example, ldaps://)
ldapConnTimeoutSec | Configures the LDAP connection timeout threshold (seconds). Default = 30 seconds
ldapPort | LDAP server port number
ldapAdvancedMode | Configures either the basic or advanced authentication method. Default = basic
ldapPrefix | String to prefix to the user name when forming the DN to bind as, when doing simple bind authentication
ldapSuffix | String to append to the user name when forming the DN to bind as, when doing simple bind authentication
Basic Method Configuration
Only roles with admin privileges or higher may enable LDAP Authentication.
Procedure
Set the authenticationMethod attribute:

ALTER SYSTEM SET authenticationMethod = 'ldap';

Set the ldapIpAddress attribute:

ALTER SYSTEM SET ldapIpAddress = '<ldaps://...>';

Set the ldapPrefix attribute:

ALTER SYSTEM SET ldapPrefix = '<DN_binding_string_prefix>=';

Set the ldapSuffix attribute:

ALTER SYSTEM SET ldapSuffix = '<DN_binding_string_suffix>';

To set the ldapPort attribute (optional), run:

ALTER SYSTEM SET ldapPort = <port_number>;

To set the ldapConnTimeoutSec attribute (optional), run:

ALTER SYSTEM SET ldapConnTimeoutSec = <15>;
Restart all sqreamd servers.
Example
After completing the setup above, we can try to bind to a user by a distinguished name. For example, if the DN of the user is:

CN=ElonMusk,OU=Sqream Users,DC=sqream,DC=loc

we could set the ldapPrefix and ldapSuffix as follows:

ALTER SYSTEM SET ldapPrefix = 'CN=';
ALTER SYSTEM SET ldapSuffix = ',OU=Sqream Users,DC=sqream,DC=loc';

Logging in will then be possible with the username ElonMusk using the sqream client:
./sqream sql --username=ElonMusk --password=sqream123 --databasename=master --port=5000
Advanced Method
Flag Attributes
To enable advanced LDAP authentication, configure the following cluster flag attributes using the ALTER SYSTEM SET command:
Attribute | Description
---|---
authenticationMethod | Configures the authentication method; set to ldap to enable LDAP authentication
ldapIpAddress | Configures the IP address or Fully Qualified Domain Name (FQDN) of your LDAP server, including the protocol (for example, ldaps://)
ldapConnTimeoutSec | Configures the LDAP connection timeout threshold (seconds). Default = 30 seconds
ldapPort | LDAP server port number
ldapAdvancedMode | Set to true to enable the advanced authentication method
ldapBaseDn | Root DN to begin the search for the user in, when doing advanced authentication
ldapBindDn | DN of the user with which to bind to the directory to perform the search, when doing search + bind authentication
ldapBindDnPassword | Password for the user with which to bind to the directory to perform the search, when doing search + bind authentication
ldapSearchAttribute | Attribute to match against the user name in the search, when doing search + bind authentication. If no attribute is specified, a default attribute is used
ldapSearchFilter | Filters to apply to the search
Advanced Method Configuration
Only roles with admin privileges or higher may enable LDAP Authentication.
Procedure
Set the authenticationMethod attribute:

ALTER SYSTEM SET authenticationMethod = 'ldap';

Set the ldapAdvancedMode attribute:

ALTER SYSTEM SET ldapAdvancedMode = true;

Set the ldapIpAddress attribute:

ALTER SYSTEM SET ldapIpAddress = 'ldaps://<IpAddress>';

Set the ldapBindDn attribute:

ALTER SYSTEM SET ldapBindDn = '<binding_user_DN>';

Set the ldapBindDnPassword attribute:

ALTER SYSTEM SET ldapBindDnPassword = '<binding_user_password>';

Set the ldapBaseDn attribute:

ALTER SYSTEM SET ldapBaseDn = '<search_root_DN>';

Set the ldapSearchAttribute attribute:

ALTER SYSTEM SET ldapSearchAttribute = '<search_attribute>';

To set the ldapSearchFilter attribute (optional), run:

ALTER SYSTEM SET ldapSearchFilter = '(<attribute>=<value>)(<attribute2>=<value2>)(…)';

To set the ldapPort attribute (optional), run:

ALTER SYSTEM SET ldapPort = <port_number>;

To set the ldapConnTimeoutSec attribute (optional), run:

ALTER SYSTEM SET ldapConnTimeoutSec = <15>;
Restart all sqreamd servers.
Example
After completing the setup above, we can try to bind to a user by locating it through one of its unique attributes. For example:

User DN = CN=ElonMusk,OU=Sqream Users,DC=sqream,DC=loc

The user has the value elonm for the attribute sAMAccountName.
ALTER SYSTEM SET authenticationMethod = 'ldap';
ALTER SYSTEM SET ldapAdvancedMode = true;
ALTER SYSTEM SET ldapIpAddress = 'ldaps://192.168.10.20';
ALTER SYSTEM SET ldapPort = 5000;
ALTER SYSTEM SET ldapBindDn = 'CN=LDAP admin,OU=network admin,DC=sqream,DC=loc';
ALTER SYSTEM SET ldapBindDnPassword = 'sqream123';
ALTER SYSTEM SET ldapBaseDn = 'OU=Sqream Users,DC=sqream,DC=loc';
ALTER SYSTEM SET ldapSearchAttribute = 'sAMAccountName';
ALTER SYSTEM SET ldapConnTimeoutSec = 30;
ALTER SYSTEM SET ldapSearchFilter = "(memberOf=CN=SqreamGroup,CN=Builtin,DC=sqream,DC=loc)(memberOf=CN=Admins,CN=Builtin,DC=sqream,DC=loc)";
Logging in will then be possible with the username elonm using the sqream client:
./sqream sql --username=elonm --password=<elonm_password> --databasename=master --port=5000
Disabling LDAP Authentication
To disable LDAP authentication and revert to SQream authentication:
Execute the following syntax:
ALTER SYSTEM SET authenticationMethod = 'sqream';
Restart all sqreamd servers.
References
The Reference Guides section provides reference for using SQream DB’s interfaces and SQL features.
SQL Statements and Syntax
This section provides reference for using SQream DB’s SQL statements - DDL commands, DML commands and SQL query syntax.
SQL Syntax Features
SQream DB supports SQL based on the ANSI SQL-92 syntax. This section describes the following:
SQL Statements
The SQL Statements page describes the following commands:
SQream supports commands from ANSI SQL.
Data Definition Commands (DDL)
The following table shows the Data Definition commands:
Command |
Usage |
---|---|
Add a new column to a table |
|
Change the default schema for a role |
|
Change the schema of a table |
|
Change clustering keys in a table |
|
Create a new database |
|
Create a new foreign table in the database |
|
Create a new user defined function in the database |
|
Create a new schema in the database |
|
Create a new table in the database |
|
Create a new table in the database using results from a select query |
|
Create a new view in the database |
|
Drops all clustering keys in a table |
|
Drop a column from a table |
|
Drop a database and all of its objects |
|
Drop a function |
|
Drop a schema |
|
Drop a table and its contents from a database |
|
Drop a view |
|
Rename a column |
|
Rename a table |
|
Rename a schema |
Data Manipulation Commands (DML)
The following table shows the Data Manipulation commands:
Command |
Usage |
---|---|
Create a new table in the database using results from a select query |
|
Delete specific rows from a table |
|
Bulk load CSV data into an existing table |
|
Export a select query or entire table to CSV files |
|
Insert rows into a table |
|
Select rows and column from a table |
|
Delete all rows from a table |
|
Modify the value of certain columns in existing rows without creating a table |
|
Return rows containing literal values |
Utility Commands
The following table shows the Utility commands:
Command |
Usage |
---|---|
Drops a saved query |
|
Executes a previously saved query |
|
Returns a static query plan, which can be used to debug query plans |
|
Lists previously saved query names, one per row. |
|
Recompiles a saved query that has been invalidated due to a schema change |
|
View a user’s license information |
|
View the |
|
View the |
|
View the |
|
Recreate a view after schema changes |
|
View the |
|
Returns a list of active sessions on the current worker |
|
Returns a list of locks from across the cluster |
|
Returns a snapshot of the current query plan, similar to |
|
Returns a single row result containing the saved query string |
|
Returns a list of active sessions across the cluster |
|
Returns the system version for SQream DB |
|
Sets your server to finish compiling all active queries before shutting down according to a user-defined time value |
|
Stops or aborts an active statement |
Workload Management
The following table shows the Workload Management commands:
Command |
Usage |
---|---|
Add a SQream DB worker to a service queue |
|
Remove a SQream DB worker from a service queue |
|
Return a list of service queues and workers |
Access Control Commands
The following table shows the Access Control commands:
Command |
Usage |
---|---|
Applies a change to defaults in the current schema |
|
Applies a change to an existing role |
|
Creates a role, which lets a database administrator control permissions on tables and databases |
|
Removes roles |
|
Returns all permissions granted to a role in table format |
|
Returns the definition of a global role in DDL format |
|
Returns the definition of all global roles in DDL format |
|
Returns the definition of a role’s database in DDL format |
|
Returns the definition of all role databases in DDL format |
|
Returns a list of permissions required to run a statement or query |
|
Grant permissions to a role |
|
Revoke permissions from a role |
|
Rename a role |
SQL Functions
SQream supports functions from ANSI SQL, as well as others for compatibility.
Summary of Functions
Built-In Scalar Functions
For more information about built-in scalar functions, see Built-In Scalar Functions.
Bitwise Operations
The following table shows the bitwise operations functions:
Function |
Description |
---|---|
Bitwise AND |
|
Bitwise NOT |
|
Bitwise OR |
|
Bitwise shift left |
|
Bitwise shift right |
|
Bitwise XOR |
Conditionals
The following table shows the conditionals functions:
Function |
Description |
---|---|
Value is in [ or not within ] the range |
|
Test a conditional expression, and depending on the result, evaluate additional expressions. |
|
Evaluate first non-NULL expression |
|
Value is in [ or not within ] a set of values |
|
Alias for COALESCE with two expressions |
|
Test a |
|
Check for |
Conversion
The following table shows the conversion functions:
Function |
Description |
---|---|
Converts a UNIX Timestamp to |
|
Converts a number to a hexadecimal string representation |
|
Converts a |
|
Returns the ASCII character representation of the supplied integer |
Date and Time
The following table shows the date and time functions:
Function |
Description |
---|---|
Special syntax, equivalent to CURRENT_DATE |
|
Returns the current date as |
|
Equivalent to GETDATE |
|
Extracts a date or time element from a date expression |
|
Adds an interval to a date expression |
|
Calculates the time difference between two date expressions |
|
Calculates the last day of the month of a given date expression |
|
ANSI syntax for extracting date or time element from a date expression |
|
Returns the current timestamp as |
|
Equivalent to GETDATE |
|
Truncates a date element down to a specified date or time element |
Numeric
The following table shows the arithmetic operators:
Operator |
Syntax |
Description |
---|---|---|
|
|
Converts a string to a numeric value. Identical to |
|
|
Adds two expressions together |
|
|
Negates a numeric expression |
|
|
Subtracts |
|
|
Multiplies |
|
|
Divides |
|
|
Modulo of |
For more information about arithmetic operators, see Arithmetic operators.
The following table shows the arithmetic operator functions:
Function |
Description |
---|---|
Calculates the absolute value of an argument |
|
Calculates the inverse cosine of an argument |
|
Calculates the inverse sine of an argument |
|
Calculates the inverse tangent of an argument |
|
Calculates the inverse tangent for a point (y, x) |
|
Calculates the next integer for an argument |
|
Calculates the cosine of an argument |
|
Calculates the cotangent of an argument |
|
Converts a value from radian values to degrees |
|
Calculates the natural exponent for an argument (e^x) |
|
Calculates the largest integer smaller than the argument |
|
Calculates the natural log for an argument |
|
Calculates the 10-based log for an argument |
|
Calculates the modulo (remainder) of two arguments |
|
Returns the constant value for π |
|
Calculates x to the power of y (x^y) |
|
Converts a value from degree values to radians |
|
Rounds an argument down to the nearest integer, or an arbitrary precision |
|
Calculates the sine of an argument |
|
Calculates the square root of an argument (√x) |
|
Raises an argument to the power of 2 (x^2) |
|
Calculates the tangent of an argument |
|
Rounds a number to its integer representation towards 0 |
Strings
The following table shows the string functions:
Function |
Description |
---|---|
Calculates number of characters in an argument |
|
Calculates the position where a string starts inside another string |
|
Concatenates two strings |
|
Calculates a CRC-64 hash of an argument |
|
Decodes or extracts binary data from a textual input string |
|
Matches if a string is the prefix of another string |
|
Returns the first number of characters from an argument |
|
Calculates the length of a string in characters |
|
Tests if a string argument matches a pattern |
|
Converts an argument to a lower-case equivalent |
|
Trims whitespaces from the left side of an argument |
|
Calculates the length of a string in bytes |
|
Calculates the position where a pattern matches a string |
|
Calculates the number of matches of a regular expression match in an argument |
|
Returns the start position of a regular expression match in an argument |
|
Replaces and returns the text column substrings of a regular expression match in an argument |
|
Returns a substring of an argument that matches a regular expression |
|
Repeats a string as many times as specified |
|
Replaces characters in a string |
|
Reverses a string argument |
|
Returns the last number of characters from an argument |
|
Tests if a string argument matches a regular expression pattern |
|
Trims whitespace from the right side of an argument |
|
Returns a substring of an argument |
|
Trims whitespaces from an argument |
|
Converts an argument to an upper-case equivalent |
|
Returns an |
User-Defined Scalar Functions
For more information about user-defined scalar functions, see Scalar SQL UDF.
Aggregate Functions
The following table shows the aggregate functions:
Function |
Aliases |
Description |
---|---|---|
Calculates the average of all of the values |
||
Calculates the Pearson correlation coefficient |
||
Calculates the count of all of the values or only distinct values |
||
Calculates population covariance of values |
||
Calculates sample covariance of values |
||
Returns maximum value of all values |
||
Returns minimum value of all values |
||
Calculates the sum of all of the values or only distinct values |
||
|
Calculates sample standard deviation of values |
|
|
Calculates population standard deviation of values |
|
|
Calculates sample variance of values |
|
|
Calculates population variance of values |
For more information about aggregate functions, see Aggregate Functions.
Window Functions
The following table shows the window functions:
Function |
Description |
---|---|
Calculates the value evaluated at the row that is before the current row within the partition |
|
Calculates the value evaluated at the row that is after the current row within the partition |
|
Calculates the maximum value |
|
Calculates the minimum value |
|
Calculates the sum of all of the values |
|
Calculates the rank of a row |
|
Returns the value in the first row of a window |
|
Returns the value in the last row of a window |
|
Returns the value in a specified |
|
Returns the rank of the current row with no gaps |
|
Returns the relative rank of the current row |
|
Returns the cumulative distribution of rows |
|
Returns an integer ranging between |
For more information about window functions, see Window Functions.
Workload Management Functions
The following table shows the workload management functions:
Function |
Description |
---|---|
Add a SQream DB worker to a service queue |
|
Remove a SQream DB worker from a service queue |
|
Return a list of service queues and workers |
Built-In Scalar Functions
The Built-In Scalar Functions page describes functions that return one value per call:
User-Defined Functions
The following user-defined functions are functions that can be defined and configured by users.
The User-Defined Functions page describes the following:
Aggregate Functions
Overview
Aggregate functions perform calculations based on a set of values and return a single value. Most aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.
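For example, a typical aggregate query groups rows and reduces each group to a single row (a minimal sketch over a hypothetical sales table; the table and column names are illustrative):

SELECT region,
       COUNT(*) AS num_sales,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM sales
GROUP BY region;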
Available Aggregate Functions
The following list shows the available aggregate functions:
Window Functions
Window functions are functions applied over a subset (known as a window) of the rows returned by a SELECT query. This section describes the following:
For more information, see Window Functions in the SQL Syntax Features section.
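Unlike an aggregate with GROUP BY, a window function computes a value for every row over its partition without collapsing the rows. A sketch over the same hypothetical sales table (names are illustrative):

SELECT region,
       amount,
       SUM(amount) OVER (PARTITION BY region) AS region_total,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales;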
Catalog Reference Guide
The Catalog Reference Guide describes the following:
Overview
The SQream database uses a schema called sqream_catalog that contains information about your database’s objects, such as tables, columns, views, and permissions. Some additional catalog tables are used primarily for internal analysis and may differ across SQream versions.
What Information Does the Schema Contain?
The schema includes tables designated and relevant for both external and internal use:
External Tables
The following table shows the data objects contained in the sqream_catalog schema designated for external use:
Database Object |
Table |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Internal Tables
The following table shows the data objects contained in the sqream_catalog schema designated for internal use:
Database Object |
Table |
---|---|
Extents |
Shows |
Chunk columns |
Shows |
Chunks |
Shows |
Delete predicates |
Shows |
Catalog Tables
The sqream_catalog schema includes the following tables:
Clustering Keys
The clustering_keys data object lists the explicit clustering keys defined for tables. If you define more than one clustering key, each key is listed in a separate row, as described in the following table:
Column |
Description |
---|---|
|
Shows the name of the database containing the table. |
|
Shows the ID of the table containing the column. |
|
Shows the name of the schema containing the table. |
|
Shows the name of the table containing the column. |
|
Shows the name of the column used as a clustering key for this table. |
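For example, to list all clustering keys defined in the current database (a minimal sketch):

SELECT * FROM sqream_catalog.clustering_keys;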
Columns
The Columns database object shows the following tables:
Columns
The column data object is used with standard tables and is described in the following table:
Column |
Description |
---|---|
|
Shows the name of the database containing the table. |
|
Shows the name of the schema containing the table. |
|
Shows the ID of the table containing the column. |
|
Shows the name of the table containing the column. |
|
Shows the ordinal number of the column in the table (begins at 0). |
|
Shows the column’s name. |
|
Shows the column’s data type. For more information see Supported Data Types. |
|
Shows the maximum length in bytes. |
|
Shows |
|
Shows the column’s default value. For more information, see Default Value Constraints. |
|
Shows the compression strategy that a user has overridden. |
|
Shows the timestamp displaying when the column was created. |
|
Shows the timestamp displaying when the column was last altered. |
External Table Columns
The external_table_columns data object is used for viewing data from foreign tables.
For more information on foreign tables, see CREATE FOREIGN TABLE.
Databases
The databases data object is used for displaying database information, and is described in the following table:
Column |
Description |
---|---|
|
Shows the database’s unique ID. |
|
Shows the database’s name. |
|
Reserved for internal use. |
|
Reserved for internal use. |
|
Reserved for internal use. |
|
Reserved for internal use. |
|
Reserved for internal use. |
Permissions
The permissions data object is used for displaying permission information, such as roles (also known as grantees), and is described in the following tables:
Permission Types
The permission_types object identifies the permission names existing in the database.
Column |
Description |
---|---|
|
Shows the permission type’s ID. |
|
Shows the name of the permission type. |
Default Permissions
The commands included in the Default Permissions section describe how to check the following default permissions:
Default Table Permissions
The sqream_catalog.table_default_permissions catalog table shows the columns described below:
Column |
Description |
---|---|
|
Shows the database that the default permission rule applies to. |
|
Shows the schema that the rule applies to, or |
|
Shows the role to apply the rule to. |
|
Shows the role that the permission is granted to. |
|
Shows the type of permission granted. |
Default Schema Permissions
The sqream_catalog.schema_default_permissions catalog table shows the columns described below:
Column |
Description |
---|---|
|
Shows the database that the default permission rule applies to. |
|
Shows the role to apply the rule to. |
|
Shows the role that the permission is granted to. |
|
Shows the type of permission granted. |
|
Shows the type of role that is granted permissions. |
For an example of using sqream_catalog.table_default_permissions, see Granting Default Table Permissions.
Table Permissions
The table_permissions data object identifies all permissions granted to tables. Each role-permission combination displays one row.
The following table describes the table_permissions data object:
Column |
Description |
---|---|
|
Shows the name of the database containing the table. |
|
Shows the ID of the table the permission applies to. |
|
Shows the ID of the role granted permissions. |
|
Identifies the permission type. |
Database Permissions
The database_permissions data object identifies all permissions granted to databases. Each role-permission combination displays one row.
The following table describes the database_permissions data object:
Column |
Description |
---|---|
|
Shows the name of the database the permission applies to |
|
Shows the ID of the role granted permissions. |
|
Identifies the permission type. |
Schema Permissions
The schema_permissions data object identifies all permissions granted to schemas. Each role-permission combination displays one row.
The following table describes the schema_permissions data object:
Column |
Description |
---|---|
|
Shows the name of the database containing the schema. |
|
Shows the ID of the schema the permission applies to. |
|
Shows the ID of the role granted permissions. |
|
Identifies the permission type. |
Queries
The savedqueries data object identifies the saved queries in the database, as shown in the following table:
Column |
Description |
---|---|
|
Shows the saved query name. |
|
Shows the number of parameters to be replaced at run-time. |
For more information, see saved_queries.
Roles
The roles data object is used for displaying role information, and is described in the following tables:
Roles
The roles data object identifies the roles in the database, as shown in the following table:
Column |
Description |
---|---|
|
Shows the role’s database-unique ID. |
|
Shows the role’s name. |
|
Identifies whether the role is a superuser ( |
|
Identifies whether the role can be used to log in to SQream ( |
|
Identifies whether the role has a password ( |
Role Memberships
The roles_memberships data object identifies the role memberships in the database, as shown below:
Column |
Description |
---|---|
|
Shows the role ID. |
|
Shows the ID of the parent role that this role inherits from. |
|
Identifies whether permissions are inherited ( |
|
Identifies whether role is admin ( |
Schemas
The schemas data object identifies all the database’s schemas, as shown below:
Column |
Description |
---|---|
|
Shows the schema’s unique ID. |
|
Shows the schema’s name. |
|
Shows the name of the role that owns the schema. |
|
Reserved for internal use. |
Tables
The tables data object is used for displaying table information, and is described in the following tables:
Tables
The tables data object identifies standard SQream tables in the database (as opposed to foreign tables), as shown in the following table:
Column |
Description |
---|---|
|
Shows the name of the database containing the table. |
|
Shows the table’s database-unique ID. |
|
Shows the name of the schema containing the table. |
|
Shows the name of the table. |
|
Identifies whether the |
|
Shows the number of rows in the table. |
|
Relevant for internal use. |
Foreign Tables
The external_tables data object identifies foreign tables in the database, as shown below:
Column |
Description |
---|---|
|
Shows the name of the database containing the table. |
|
Shows the table’s database-unique ID. |
|
Shows the name of the schema containing the table. |
|
Shows the name of the table. |
|
Identifies the foreign data wrapper used. |
|
Identifies the clause used to create the table. |
Views
The views data object is used for displaying views in the database, as shown below:
Column |
Description |
---|---|
|
Shows the view’s database-unique ID. |
|
Shows the name of the schema containing the view. |
|
Shows the name of the view. |
|
Reserved for internal use. |
|
Identifies the |
User Defined Functions
The udf data object is used for displaying UDFs in the database, as shown below:
Column |
Description |
---|---|
|
Shows the name of the database containing the view. |
|
Shows the UDF’s database-unique ID. |
|
Shows the name of the UDF. |
Additional Tables
The Reference Catalog includes additional tables that can be used for performance monitoring and inspection. The definition for these tables described on this page may change across SQream versions.
Extents
The extents storage object identifies storage extents; each storage extent can contain several chunks.
Note
This is an internal table designed for low-level performance troubleshooting.
Column |
Description |
---|---|
|
Shows the name of the database containing the extent. |
|
Shows the ID of the table containing the extent. |
|
Shows the ID of the column containing the extent. |
|
Shows the ID for the extent. |
|
Shows the extent size in megabytes. |
|
Shows the full path to the extent on the file system. |
Chunk Columns
The chunk_columns storage object lists chunk information by column.
Column |
Description |
---|---|
|
Shows the name of the database containing the chunk. |
|
Shows the ID of the table containing the extent. |
|
Shows the ID of the column containing the extent. |
|
Shows the chunk ID. |
|
Shows the extent ID. |
|
Shows the compressed chunk size in bytes. |
|
Shows the uncompressed chunk size in bytes. |
|
Shows the chunk’s actual compression scheme. |
|
Shows the minimum numeric value in the chunk (if one exists). |
|
Shows the maximum numeric value in the chunk (if one exists). |
|
Shows the minimum text value in the chunk (if one exists). |
|
Shows the maximum text value in the chunk (if one exists). |
|
Reserved for internal use. |
Note
This is an internal table designed for low-level performance troubleshooting.
Chunks
The chunks storage object identifies storage chunks.
Column |
Description |
---|---|
|
Shows the name of the database containing the chunk. |
|
Shows the ID of the table containing the chunk. |
|
Shows the ID of the column containing the chunk. |
|
Shows the number of rows in the chunk. |
|
Determines what data to logically delete from the table first, and identifies how much data to delete from the chunk. The value |
Note
This is an internal table designed for low-level performance troubleshooting.
Delete Predicates
The delete_predicates storage object identifies the existing delete predicates that have not been cleaned up. Each DELETE command may result in several entries in this table.
Column |
Description |
---|---|
|
Shows the name of the database containing the predicate. |
|
Shows the ID of the table containing the predicate. |
|
Reserved for internal use, this is a placeholder marker for the highest |
|
Identifies the DELETE predicate. |
Note
This is an internal table designed for low-level performance troubleshooting.
Examples
The Examples page includes the following examples:
Listing All Tables in a Database
master=> SELECT * FROM sqream_catalog.tables;
database_name | table_id | schema_name | table_name | row_count_valid | row_count | rechunker_ignore
--------------+----------+-------------+----------------+-----------------+-----------+-----------------
master | 1 | public | nba | true | 457 | 0
master | 12 | public | cool_dates | true | 5 | 0
master | 13 | public | cool_numbers | true | 9 | 0
master | 27 | public | jabberwocky | true | 8 | 0
Listing All Schemas in a Database
master=> SELECT * FROM sqream_catalog.schemas;
schema_id | schema_name | rechunker_ignore
----------+---------------+-----------------
0 | public | false
1 | secret_schema | false
Listing Columns and Their Types for a Specific Table
SELECT column_name, type_name
FROM sqream_catalog.columns
WHERE table_name='cool_animals';
Listing Delete Predicates
SELECT t.table_name, d.* FROM
sqream_catalog.delete_predicates AS d
INNER JOIN sqream_catalog.tables AS t
ON d.table_id=t.table_id;
Listing Saved Queries
SELECT * FROM sqream_catalog.savedqueries;
For more information, see Saved Queries.
Command line programs
SQream contains several command line programs for using, starting, managing, and configuring SQream DB clusters.
This topic contains the reference for these programs, as well as flags and configuration settings.
Command |
Usage |
---|---|
Built-in SQL client |
Command |
Usage |
---|---|
Start a SQream DB worker |
|
The cluster manager/coordinator that enables scaling SQream DB. |
|
Load balancer end-point |
Command |
Usage |
---|---|
Initialize a cluster and set superusers |
|
Upgrade metadata schemas when upgrading between major versions |
Command |
Usage |
---|---|
Dockerized convenience wrapper for operations |
|
Dockerized installer |
metadata_server
SQream DB’s cluster manager/coordinator is called metadata_server.
In general, you should not need to run metadata_server manually, but it is sometimes useful for testing.
This page serves as a reference for the options and parameters.
Positional command line arguments
$ metadata_server [ <logging path> [ <listen port> ] ]
Argument | Default | Description
---|---|---
Logging path | Current directory | Path to store metadata logs into
Listen port | 3105 | TCP listen port. If used, the log path must be specified beforehand.
Starting metadata server
Starting temporarily
$ nohup metadata_server &
$ MS_PID=$!
Using nohup and & runs the metadata server in the background.
Note
Logs are saved to the current directory, under metadata_server_logs. The default listening port is 3105.
Starting temporarily with non-default port
To use a non-default port, specify the logging path as well.
$ nohup metadata_server /home/rhendricks/metadata_logs 9241 &
$ MS_PID=$!
Using nohup and & runs the metadata server in the background.
Note
Logs are saved to the /home/rhendricks/metadata_logs directory. The listening port is 9241.
Stopping metadata server
To stop metadata server:
$ kill -9 $MS_PID
Tip
It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.
sqreamd
SQream DB’s main worker is called sqreamd.
This page serves as a reference for the options and parameters.
Starting SQream DB
Start SQream DB temporarily
In general, you should not need to run sqreamd manually, but it is sometimes useful for testing.
$ nohup sqreamd -config ~/.sqream/sqream_config.json &
$ SQREAM_PID=$!
Using nohup and & runs SQream DB in the background.
To stop the active worker:
$ kill -9 $SQREAM_PID
Tip
It is safe to stop SQream DB at any time using kill. No partial data or data corruption should occur when using this method to stop the process.
Command line arguments
sqreamd supports the following command line arguments:
Argument |
Default |
Description |
---|---|---|
|
None |
Outputs the version of SQream DB and immediately exits. |
|
|
Specifies the configuration file to use |
|
Don’t use SSL |
When specified, tells SQream DB to listen for SSL connections |
Positional command arguments
sqreamd also supports positional arguments, when not using a configuration file.
This method can be used to temporarily start a SQream DB worker for testing.
$ sqreamd <Storage path> <GPU ordinal> <TCP listen port (unsecured)> <License path>
Argument | Required | Description
---|---|---
Storage path | ✓ | Full path to a valid SQream DB persistent storage
GPU ordinal | ✓ | Number representing the GPU to use. Check GPU ordinals with nvidia-smi -L
TCP listen port (unsecured) | ✓ | TCP port SQream DB should listen on. Recommended: 5000
License path | ✓ | Full path to a SQream DB license file
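For example, starting a temporary worker on GPU 0, listening on port 5000 (the paths are hypothetical):

$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/rhendricks/license.enc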
sqream-console
sqream-console is an interactive shell designed to help manage a dockerized SQream DB installation.
The console itself is a dockerized application.
This page serves as a reference for the options and parameters.
Starting the console
sqream-console can be found in your SQream DB installation, under the name sqream-console.
Start the console by executing it from the shell:
$ ./sqream-console
....................................................................................................................
███████╗ ██████╗ ██████╗ ███████╗ █████╗ ███╗ ███╗ ██████╗ ██████╗ ███╗ ██╗███████╗ ██████╗ ██╗ ███████╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗████╗ ████║ ██╔════╝██╔═══██╗████╗ ██║██╔════╝██╔═══██╗██║ ██╔════╝
███████╗██║ ██║██████╔╝█████╗ ███████║██╔████╔██║ ██║ ██║ ██║██╔██╗ ██║███████╗██║ ██║██║ █████╗
╚════██║██║▄▄ ██║██╔══██╗██╔══╝ ██╔══██║██║╚██╔╝██║ ██║ ██║ ██║██║╚██╗██║╚════██║██║ ██║██║ ██╔══╝
███████║╚██████╔╝██║ ██║███████╗██║ ██║██║ ╚═╝ ██║ ╚██████╗╚██████╔╝██║ ╚████║███████║╚██████╔╝███████╗███████╗
╚══════╝ ╚══▀▀═╝ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═══╝╚══════╝ ╚═════╝ ╚══════╝╚══════╝
....................................................................................................................
Welcome to SQream Console ver 1.7.6, type exit to log-out
usage: sqream [-h] [--settings] {master,worker,client,editor} ...
Run SQream Cluster
optional arguments:
-h, --help show this help message and exit
--settings sqream environment variables settings
subcommands:
sqream services
{master,worker,client,editor}
sub-command help
master start sqream master
worker start sqream worker
client operating sqream client
editor operating sqream statement editor
sqream-console>
The console is now waiting for commands.
The console is a wrapper around a standard Linux shell. It supports commands like ls, cp, etc.
All SQream DB-specific commands start with the keyword sqream.
Operations and flag reference
Commands
Command |
Description |
---|---|
|
Shows the initial usage information |
|
Controls the master node’s operations |
|
Controls workers’ operations |
|
Access to sqream sql |
|
Controls the statement editor’s operations (web UI) |
Master
The master node contains the metadata server and the load balancer.
Syntax
sqream master <flags>
Flag/command |
Description |
---|---|
|
Starts the master node.
The |
|
Stops the master node and all connected workers.
The |
|
Shows a list of all active master nodes and their workers |
|
Sets the port for the load balancer. Defaults to |
|
Sets the port for the metadata server. Defaults to |
Common usage
Start master node
sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108
Start master node on different ports
sqream-console> sqream master --start -p 4105 -m 4108
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 4105,4108
Listing active master nodes and workers
sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038
Stopping all SQream DB workers and master
sqream-console> sqream master --stop --all
shutting down 2 sqream services ...
sqream_editor stopped
sqream_single_host_worker_1 stopped
sqream_single_host_worker_0 stopped
sqream_single_host_master stopped
Workers
Workers are SQream DB daemons that connect to the master node.
Syntax
sqream worker <flags>
| Flag/command | Description |
|---|---|
| --start | Starts worker nodes. See options table below. |
| --stop | Stops the specified worker by name. |
Start options are specified consecutively, separated by spaces.
| Option | Description |
|---|---|
| <n> (the value after --start) | Specifies the number of workers to start |
| -j | Specifies configuration files to apply to each worker. When launching multiple workers, specify one file per worker, separated by spaces. |
| -p | Sets the ports to listen on. When launching multiple workers, specify one port per worker, separated by spaces. Defaults to 5000 - 5000+n. |
| -g | Sets the GPU ordinal to assign to each worker. When launching multiple workers, specify one GPU ordinal per worker, separated by spaces. Defaults to automatic allocation. |
| -m | Sets the spool memory per node in gigabytes. |
| | Sets the hostname for the master node. |
| | Sets the port for the master node. |
| | For testing only: Starts a worker without connecting to the master node. |
Common usage
Start 2 workers
After starting the master node, start workers:
sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1
Stop a single worker
To stop a single worker, find its name first:
sqream-console> sqream master --list
container name: sqream_single_host_worker_1, container id: de9b8aff0a9c
container name: sqream_single_host_worker_0, container id: c919e8fb78c8
container name: sqream_single_host_master, container id: ea7eef80e038
Then, issue a stop command:
sqream-console> sqream worker --stop sqream_single_host_worker_1
stopped sqream_single_host_worker_1
Start workers with a different spool size
If no spool size is specified, the RAM is equally distributed among workers. Sometimes a system engineer may wish to specify the spool size manually.
This example starts two workers, with a spool size of 50GB per node:
sqream-console> sqream worker --start 2 -m 50
Starting multiple workers on non-dedicated GPUs
By default, SQream DB assigns one worker per GPU. However, a system engineer may wish to assign multiple workers per GPU, if the workload permits it.
This example starts 4 workers on 2 GPUs (two invocations, each starting 2 workers on one GPU), with 50GB of spool memory per worker:
sqream-console> sqream worker --start 2 -g 0 -m 50
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 0
sqream-console> sqream worker --start 2 -g 1 -m 50
started sqream_single_host_worker_2 on port 5002, allocated gpu: 1
started sqream_single_host_worker_3 on port 5003, allocated gpu: 1
Overriding default configuration files
It is possible to override default configuration settings by listing a configuration file for every worker.
This example starts 2 workers on the same GPU, with modified configuration files:
sqream-console> sqream worker --start 2 -g 0 -j /etc/sqream/configfile.json /etc/sqream/configfile2.json
Client
The client operation runs sqream sql in interactive mode.
Note
The dockerized client is useful for testing and experimentation. It is not the recommended method for executing analytic queries. See more about connecting a third party tool to SQream DB for data analysis.
Syntax
sqream client <flags>
| Flag/command | Description |
|---|---|
| --master | Connects to the master node via the load balancer |
| --worker | Connects to a worker directly |
| | Specifies the hostname to connect to |
| -p | Specifies the port to connect to |
| -u | Specifies the role’s username to use |
| -w | Specifies the password to use for the role |
| -d | Specifies the database name for the connection. Defaults to master |
Common usage
Start a client
Connect to the default master database through the load balancer:
sqream-console> sqream client --master -u sqream -w sqream
Interactive client mode
To quit, use ^D or \q.
master=> _
Start a client to a specific worker
Connect to the database raviga directly through a worker on port 5000:
sqream-console> sqream client --worker -u sqream -w sqream -p 5000 -d raviga
Interactive client mode
To quit, use ^D or \q.
raviga=> _
Editor
The editor operation runs the web UI for the SQream DB Statement Editor.
The editor can be used to run queries from a browser.
Syntax
sqream editor <flags>
| Flag/command | Description |
|---|---|
| --start | Starts the statement editor |
| --stop | Shuts down the statement editor |
| | Specifies a different port for the editor. Defaults to 3000 |
Common usage
Start the editor UI
sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000
Stop the editor UI
sqream-console> sqream editor --stop
sqream_editor stopped
Using the console to start SQream DB
The console is used to start and stop SQream DB components in a dockerized environment.
Starting a SQream DB cluster for the first time
To start a SQream DB cluster, start the master node, followed by workers.
The example below starts 2 workers, running on 2 dedicated GPUs.
sqream-console> sqream master --start
starting master server in single_host mode ...
sqream_single_host_master is up and listening on ports: 3105,3108
sqream-console> sqream worker --start 2
started sqream_single_host_worker_0 on port 5000, allocated gpu: 0
started sqream_single_host_worker_1 on port 5001, allocated gpu: 1
sqream-console> sqream editor --start
access sqream statement editor through Chrome http://192.168.0.100:3000
SQream DB is now listening on port 3108 for any incoming statements. A user can also access the web editor (running on port 3000 on the SQream DB machine) to connect and run queries.
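To verify that the cluster accepts connections, you can open a client from the same console, using the same default credentials shown in the client examples above:

sqream-console> sqream client --master -u sqream -w sqream
Interactive client mode
To quit, use ^D or \q.

master=> _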
sqream-installer

sqream-installer is an application that prepares and configures a dockerized SQream DB installation.
This page serves as a reference for the options and parameters.
Operations and flag reference
Command line flags
| Flag | Description |
|---|---|
| -i | Loads the docker images for installation |
| -k | Loads new licenses from the license subfolder |
| | Validate licenses |
| | Force overwrite any existing installation and data directories currently in use |
| -c | Specifies a path to read and store configuration files in |
| -v | Specifies a path to the storage cluster. The path is created if it does not exist. |
| -l | Specifies a path to store system startup logs |
| -d | Specifies a path to expose to SQream DB workers. To expose several paths, repeat the usage of this flag. |
| -s | Shows system settings |
| | Reset the system configuration. This flag can’t be combined with other flags. |
Usage
Install SQream DB for the first time
This example assumes the license package tarball has been placed in the license subfolder.
The path where SQream DB will store data is /home/rhendricks/sqream_storage. Logs will be stored in /var/log/sqream.
Source CSV, Parquet, and ORC files can be accessed from /home/rhendricks/source_data. All other directory paths are hidden from the Docker container.
# ./sqream-install -i -k -v /home/rhendricks/sqream_storage -l /var/log/sqream -c /etc/sqream -d /home/rhendricks/source_data
Note
Installation commands should be run with sudo or root access.
Modify exposed directories
To expose more directory paths for SQream DB to read and write data from, re-run the installer with additional directory flags.
# ./sqream-install -d /home/rhendricks/more_source_data
There is no need to specify the initial installation flags - only the modified exposed directory paths flag.
Install a new license package
This example assumes the license package tarball has been placed in the license subfolder.
# ./sqream-install -k
View system settings
This information may be useful to identify problems accessing directory paths, or locating where data is stored.
# ./sqream-install -s
SQREAM_CONSOLE_TAG=1.7.4
SQREAM_TAG=2020.1
SQREAM_EDITOR_TAG=3.1.0
license_worker_0=[...]
license_worker_1=[...]
license_worker_2=[...]
license_worker_3=[...]
SQREAM_VOLUME=/home/rhendricks/sqream_storage
SQREAM_DATA_INGEST=/home/rhendricks/source_data
SQREAM_CONFIG_DIR=/etc/sqream/
LICENSE_VALID=true
SQREAM_LOG_DIR=/var/log/sqream/
SQREAM_USER=sqream
SQREAM_HOME=/home/sqream
SQREAM_ENV_PATH=/home/sqream/.sqream/env_file
PROCESSOR=x86_64
METADATA_PORT=3105
PICKER_PORT=3108
NUM_OF_GPUS=8
CUDA_VERSION=10.1
NVIDIA_SMI_PATH=/usr/bin/nvidia-smi
DOCKER_PATH=/usr/bin/docker
NVIDIA_DRIVER=418
SQREAM_MODE=single_host
Upgrading to a new version of SQream DB
When upgrading to a new version with Docker, most settings don’t need to be modified.
The upgrade process replaces the existing docker images with new ones.
Obtain the new tarball, and untar it to an accessible location. Enter the newly extracted directory.
Install the new images
# ./sqream-install -i
The upgrade process will check for running SQream DB processes. If any are found running, the installer will ask to stop them in order to continue the upgrade process. Once all services are stopped, the new version will be loaded.
After the upgrade, open sqream-console and restart the desired services.
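For example, a typical end-to-end upgrade sequence (assuming the new tarball has already been extracted and you are in its directory) might look like this:

# ./sqream-install -i
$ ./sqream-console
sqream-console> sqream master --start
sqream-console> sqream worker --start 2
sqream-console> sqream editor --start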
Server Picker

SQream DB’s load balancer is called server_picker.
This page serves as a reference for the options and parameters.
Positional command line arguments
$ server_picker <Metadata server address> <Metadata server port> [ <TCP listen port> [ <SSL listen port> ] ]
| Argument | Default | Description |
|---|---|---|
| Metadata server address | | IP or hostname of an active metadata server |
| Metadata server port | | TCP port of an active metadata server |
| TCP listen port | | TCP port for server picker to listen on |
| SSL listen port | | SSL port for server picker to listen on |
Starting server picker
Starting temporarily
In general, you should not need to run server_picker manually, but it is sometimes useful for testing.
Assuming we have a metadata server listening on the localhost, on port 3105:
$ nohup server_picker 127.0.0.1 3105 &
$ SP_PID=$!
Using nohup and & sends server picker to run in the background.
Starting temporarily with non-default port
Tell server picker to listen on port 2255 for unsecured connections, and port 2266 for SSL connections.
$ nohup server_picker 127.0.0.1 3105 2255 2266 &
$ SP_PID=$!
Using nohup and & sends server picker to run in the background.
Stopping server picker
$ kill -9 $SP_PID
Tip
It is safe to stop any SQream DB component at any time using kill. No partial data or data corruption should occur when using this method to stop the process.
SqreamStorage
You can use the SqreamStorage program to create a new storage cluster.
This page serves as a reference for the options and parameters.
Running SqreamStorage
The SqreamStorage program is located in the bin directory of your SQream installation.
Command Line Arguments
The SqreamStorage program supports the following command line arguments:
| Argument | Shorthand | Description |
|---|---|---|
| --create-cluster | -C | Creates a storage cluster at a specified path |
| --cluster-root | -r | Specifies the cluster path. The path must not already exist. |
Example
The following example shows how to create a new storage cluster at /home/rhendricks/raviga_database:
$ SqreamStorage --create-cluster --cluster-root /home/rhendricks/raviga_database
Setting cluster version to: 26
Alternatively, you can write this in shorthand as SqreamStorage -C -r /home/rhendricks/raviga_database. A message is displayed confirming that your cluster has been created.
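The shorthand form produces the same confirmation:

$ SqreamStorage -C -r /home/rhendricks/raviga_database
Setting cluster version to: 26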
Sqream SQL CLI Reference
SQream DB comes with a built-in client for executing SQL statements either interactively or from the command-line.
This page serves as a reference for the options and parameters. Learn more about using SQream DB SQL with the CLI in the first steps tutorial.
Installing Sqream SQL
If you have a SQream DB installation on your server, sqream sql can be found in the bin directory of your SQream DB installation, under the name sqream.
Note
If you installed SQream DB via Docker, the command is named sqream-client sql, and can be found in the same location as the console.
Changed in version 2020.1: As of version 2020.1, ClientCmd has been renamed to sqream sql.
To run sqream sql on any other Linux host:

Download the sqream sql tarball package from the Client Drivers page.

Untar the package:

$ tar xf sqream-sql-v2020.1.1_stable.x86_64.tar.gz

Start the client:

$ cd sqream-sql-v2020.1.1_stable.x86_64
$ ./sqream sql --port=5000 --username=jdoe --databasename=master
Password:

Interactive client mode
To quit, use ^D or \q.

master=> _
Troubleshooting Sqream SQL Installation
Upon running sqream sql for the first time, you may get the following error: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory.
Solving this error requires installing the ncurses or libtinfo libraries, depending on your operating system.
Ubuntu:
Install libtinfo:

$ sudo apt-get install -y libtinfo

Depending on your Ubuntu version, you may need to create a symbolic link to the newer libtinfo that was installed.
For example, if libtinfo was installed as /lib/x86_64-linux-gnu/libtinfo.so.6.2:

$ sudo ln -s /lib/x86_64-linux-gnu/libtinfo.so.6.2 /lib/x86_64-linux-gnu/libtinfo.so.5
CentOS / RHEL:
Install ncurses:

$ sudo yum install -y ncurses-libs

Depending on your RHEL version, you may need to create a symbolic link to the newer libtinfo that was installed.
For example, if libtinfo was installed as /usr/lib64/libtinfo.so.6:

$ sudo ln -s /usr/lib64/libtinfo.so.6 /usr/lib64/libtinfo.so.5
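To verify that the library now resolves, you can inspect the client binary's shared library dependencies (this assumes the sqream binary is in your current directory):

$ ldd ./sqream | grep libtinfo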
Using Sqream SQL
By default, sqream sql runs in interactive mode. You can issue commands or SQL statements.
Running Commands Interactively (SQL shell)
When starting sqream sql, after entering your password, you are presented with the SQL shell.
To exit the shell, type \q or Ctrl-d.
$ sqream sql --port=5000 --username=jdoe --databasename=master
Password:
Interactive client mode
To quit, use ^D or \q.
master=> _
The database name shown means you are now ready to run statements and queries.
Statements and queries are standard SQL, followed by a semicolon (;). Statement results are usually formatted as a valid CSV, followed by the number of rows and the elapsed time for that statement.
master=> SELECT TOP 5 * FROM nba;
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.001185s
Note
Null values are represented as \N.
When writing long statements and queries, it may be beneficial to use line-breaks.
The prompt for a multi-line statement will change from => to ., to alert users to the change. The statement will not execute until a semicolon is used.
$ sqream sql --port=5000 --username=mjordan -d master
Password:
Interactive client mode
To quit, use ^D or \q.
master=> SELECT "Age",
. AVG("Salary")
. FROM NBA
. GROUP BY 1
. ORDER BY 2 ASC
. LIMIT 5
. ;
38,1840041
19,1930440
23,2034746
21,2067379
36,2238119
5 rows
time: 0.009320s
Executing Batch Scripts (-f)
To run an SQL script, use the -f <filename> argument.
For example:
$ sqream sql --port=5000 --username=jdoe -d master -f sql_script.sql --results-only
Tip
Output can be saved to a file by using redirection (>).
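For example, the following runs a script and redirects its results to a file (the file names are illustrative):

$ sqream sql --port=5000 --username=jdoe -d master -f sql_script.sql --results-only > results.csv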
Executing Commands Immediately (-c)
To run a statement from the console, use the -c <statement> argument.
For example:
$ sqream sql --port=5000 --username=jdoe -d nba -c "SELECT TOP 5 * FROM nba"
Avery Bradley ,Boston Celtics ,0,PG,25,6-2 ,180,Texas ,7730337
Jae Crowder ,Boston Celtics ,99,SF,25,6-6 ,235,Marquette ,6796117
John Holland ,Boston Celtics ,30,SG,27,6-5 ,205,Boston University ,\N
R.J. Hunter ,Boston Celtics ,28,SG,22,6-5 ,185,Georgia State ,1148640
Jonas Jerebko ,Boston Celtics ,8,PF,29,6-10,231,\N,5000000
5 rows
time: 0.202618s
Tip
Remove the timing and row count by passing the --results-only parameter.
Examples
Starting a Regular Interactive Shell
Connect to local server 127.0.0.1 on port 5000, to the default built-in database, master:
$ sqream sql --port=5000 --username=mjordan -d master
Password:
Interactive client mode
To quit, use ^D or \q.
master=>_
Connect to local server 127.0.0.1 via the built-in load balancer on port 3108, to the default built-in database, master:
$ sqream sql --port=3108 --clustered --username=mjordan -d master
Password:
Interactive client mode
To quit, use ^D or \q.
master=>_
Executing Statements in an Interactive Shell
Note that all SQL commands end with a semicolon.
Creating a new database and switching over to it without reconnecting:
$ sqream sql --port=3105 --clustered --username=oldmcd -d master
Password:
Interactive client mode
To quit, use ^D or \q.
master=> create database farm;
executed
time: 0.003811s
master=> \c farm
farm=>
farm=> create table animals(id int not null, name text(30) not null, is_angry bool not null);
executed
time: 0.011940s
farm=> insert into animals values(1,'goat',false);
executed
time: 0.000405s
farm=> insert into animals values(4,'bull',true) ;
executed
time: 0.049338s
farm=> select * from animals;
1,goat ,0
4,bull ,1
2 rows
time: 0.029299s
Executing SQL Statements from the Command Line
$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals WHERE is_angry = true"
4,bull ,1
1 row
time: 0.095941s
Controlling the Client Output
Two parameters control the display of results from the client:

--results-only - removes row counts and timing information
--delimiter - changes the record delimiter
Exporting SQL Query Results to CSV
Using the --results-only flag removes the row counts and timing.
$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --results-only > file.csv
$ cat file.csv
1,goat ,0
2,sow ,0
3,chicken ,0
4,bull ,1
Changing a CSV to a TSV
The --delimiter parameter accepts any printable character.
Tip
To insert a tab, use Ctrl-V followed by Tab ↹ in Bash.
$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --delimiter ' ' > file.tsv
$ cat file.tsv
1 goat 0
2 sow 0
3 chicken 0
4 bull 1
Executing a Series of Statements From a File
Assuming a file containing SQL statements (separated by semicolons):
$ cat some_queries.sql
CREATE TABLE calm_farm_animals
( id INT IDENTITY(0, 1), name TEXT(30)
);
INSERT INTO calm_farm_animals (name)
SELECT name FROM animals WHERE is_angry = false;
$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -f some_queries.sql
executed
time: 0.018289s
executed
time: 0.090697s
Connecting Using Environment Variables
You can save connection parameters as environment variables:
$ export SQREAM_USER=sqream;
$ export SQREAM_DATABASE=farm;
$ sqream sql --port=3105 --clustered --username=$SQREAM_USER -d $SQREAM_DATABASE
Connecting to a Specific Queue
When using the dynamic workload manager, connect to the etl queue instead of the default sqream queue:
$ sqream sql --port=3105 --clustered --username=mjordan -d master --service=etl
Password:
Interactive client mode
To quit, use ^D or \q.
master=>_
Operations and Flag References
Command Line Arguments
Sqream SQL supports the following command line arguments:
| Argument | Default | Description |
|---|---|---|
| -c <statement> | None | Changes the mode of operation to single-command, non-interactive. Use this argument to run a statement and immediately exit. |
| -f <filename> | None | Changes the mode of operation to multi-command, non-interactive. Use this argument to run a sequence of statements from an external file and immediately exit. |
| --host | | Address of the SQream DB worker. |
| --port | | Sets the connection port. |
| -d, --databasename | None | Specifies the database name for queries and statements in this session. |
| --username | None | Username to connect to the specified database. |
| --password | None | Specifies the password using the command line argument. If not specified, the client will prompt the user for the password. |
| --clustered | False | When used, the client connects to the load balancer. |
| --service | sqream | Service name (queue) that statements will file into. |
| --results-only | False | Outputs results only, without timing information and row counts. |
| | False | When set, prevents command history from being saved. |
| --delimiter | , | Specifies the field separator. |
Tip
Run $ sqream sql --help to see a full list of arguments.
Supported Record Delimiters
The supported record delimiters are printable ASCII values (32-126).

Recommended delimiters for use are: , (comma), | (pipe), and the tab character.

The following characters are not supported: \, N, -, :, ", \n, \r, ., lower-case Latin letters, and digits (0-9).
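For example, exporting query results with a pipe delimiter, one of the recommended characters (the output file name is illustrative):

$ sqream sql --port=3105 --clustered --username=oldmcd -d farm -c "SELECT * FROM animals" --delimiter '|' --results-only > file.psv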
Meta-Commands
Meta-commands in Sqream SQL start with a backslash (\).
Note
Meta-commands do not end with a semicolon.
| Command | Example | Description |
|---|---|---|
| \q | master=> \q | Quits the client (same as Ctrl-d) |
| \c <database> | master=> \c fox | Changes the current connection to an alternate database |
Basic Commands
Moving Around the Command Line
| Command | Description |
|---|---|
| Ctrl-a | Goes to the beginning of the command line. |
| Ctrl-e | Goes to the end of the command line. |
| Ctrl-u | Deletes from the cursor to the beginning of the command line. |
| Ctrl-k | Deletes from the cursor to the end of the command line. |
| Ctrl-w | Deletes from the cursor to the beginning of a word. |
| Ctrl-y | Pastes a word or text that was cut using one of the deletion shortcuts (such as the one above) after the cursor. |
| Alt-b | Moves back one word (or goes to the beginning of the word where the cursor is). |
| Alt-f | Moves forward one word (or goes to the end of the word where the cursor is). |
| Alt-d | Deletes to the end of a word starting at the cursor. Deletes the whole word if the cursor is at the beginning of that word. |
| Alt-c | Capitalizes letters in a word starting at the cursor. Capitalizes the whole word if the cursor is at the beginning of that word. |
| Alt-u | Capitalizes from the cursor to the end of the word. |
| Alt-l | Makes lowercase from the cursor to the end of the word. |
| Ctrl-f | Moves forward one character. |
| Ctrl-b | Moves backward one character. |
| Ctrl-h | Deletes characters located before the cursor. |
| Ctrl-t | Swaps a character at the cursor with the previous character. |
Searching
| Command | Description |
|---|---|
| Ctrl-r | Searches the history backward. |
| Ctrl-g | Escapes from history-searching mode. |
| Ctrl-p | Searches the previous command in history. |
| Ctrl-n | Searches the next command in history. |
upgrade_storage

upgrade_storage is used to upgrade metadata schemas when upgrading between major versions.
This page serves as a reference for the options and parameters.
Running upgrade_storage
upgrade_storage can be found in the bin directory of your SQream DB installation.
Command line arguments and options
| Parameter | Parameter Type | Description |
|---|---|---|
| <storage path> | Argument | Full path to a valid storage cluster. |
| --storage_version | Option | Displays your current storage version. |
| --check_predicates=0 | Option | Allows the upgrade process to proceed even if there are predicates marked for deletion. |
Syntax
$ upgrade_storage <storage path> [--check_predicates=0]
$ upgrade_storage <storage path> [--storage_version]
Results and error codes
| Result | Message | Description |
|---|---|---|
| Success | | Storage has been successfully upgraded |
| Success | | Storage doesn’t need an upgrade |
| Failure: can’t read storage | | Check permissions, and ensure no SQream DB workers or metadata_server are running when performing this operation. |
Examples
Upgrade SQream DB’s storage cluster
$ ./upgrade_storage /home/rhendricks/raviga_database
get_rocksdb_version path{/home/rhendricks/raviga_database}
current storage version 23
upgrade_v24
upgrade_storage to 24
upgrade_storage to 24 - Done
upgrade_v25
upgrade_storage to 25
upgrade_storage to 25 - Done
upgrade_v26
upgrade_storage to 26
upgrade_storage to 26 - Done
validate_rocksdb
storage has been upgraded successfully to version 26
This message confirms that the cluster has been upgraded correctly.
SQL Feature Checklist
To understand which ANSI SQL and other SQL features SQream DB supports, use the tables below.
Data Types and Values
Read more about supported data types.
| Item | Supported | Further information |
|---|---|---|
| BOOL | Yes | Boolean values |
| TINYINT | Yes | Unsigned 1 byte integer (0 - 255) |
| SMALLINT | Yes | 2 byte integer (-32,768 - 32,767) |
| INT | Yes | 4 byte integer (-2,147,483,648 - 2,147,483,647) |
| BIGINT | Yes | 8 byte integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) |
| REAL | Yes | 4 byte floating point |
| DOUBLE | Yes | 8 byte floating point |
| NUMERIC | Yes | Fixed-point numbers. |
| TEXT | Yes | Variable length string - UTF-8 encoded |
| DATE | Yes | Date |
| DATETIME | Yes | Date and time |
| | Yes | |
| TIME | No | Can be stored as a text string or as part of a DATETIME |
Constraints
| Item | Supported | Further information |
|---|---|---|
| Not null | Yes | |
| Default values | Yes | |
| AUTO INCREMENT (IDENTITY) | Yes (different name) | |
Transactions
SQream DB treats each statement as an auto-commit transaction. Each transaction is isolated from other transactions with serializable isolation.
If a statement fails, the entire transaction is canceled and rolled back. The database is unchanged.
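As an illustrative sketch using the farm database from the CLI examples above, each statement below is its own transaction and commits automatically; no explicit BEGIN or COMMIT is issued:

farm=> insert into animals values (5,'duck',false);
executed
farm=> select * from animals where id = 5;
5,duck ,0
1 row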
Indexes
SQream DB has a range-index collected on all columns as part of the metadata collection process.
SQream DB does not support explicit indexing, but does support clustering keys.
Read more about clustering keys and our metadata system.
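For illustration, a clustering key can be declared when creating a table; this is a minimal sketch with hypothetical table and column names:

CREATE TABLE events (ts DATETIME, user_id BIGINT) CLUSTER BY user_id;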
Schema Changes
| Item | Supported | Further information |
|---|---|---|
| ALTER TABLE | Yes | ALTER TABLE - Add column, alter column, drop column, rename column, rename table, modify clustering keys |
| Rename database | No | |
| Rename table | Yes | |
| Rename column | Yes | |
| Add column | Yes | |
| Remove column | Yes | |
| Alter column data type | No | |
| Add / modify clustering keys | Yes | |
| Drop clustering keys | Yes | |
| Add / Remove constraints | No | |
| Rename schema | Yes | |
| Drop schema | Yes | |
| Alter default schema per user | Yes | |
Statements
| Item | Supported | Further information |
|---|---|---|
| SELECT | Yes | |
| CREATE TABLE | Yes | |
| CREATE FOREIGN / EXTERNAL TABLE | Yes | |
| DELETE | Yes | |
| INSERT | Yes | |
| TRUNCATE | Yes | |
| UPDATE | Yes | |
| VALUES | Yes | |
Clauses
| Item | Supported | Further information |
|---|---|---|
| | Yes | |
| | No | |
| | Yes | |
| | Yes | |
| | Yes | |
Table Expressions
| Item | Supported | Further information |
|---|---|---|
| Tables, Views | Yes | |
| Aliases | Yes | |
| | Yes | |
| Table expression subqueries | Yes | |
| Scalar subqueries | No | |
Scalar Expressions
Read more about Scalar expressions.
| Item | Supported | Further information |
|---|---|---|
| Common functions | Yes | |
| Comparison operators | Yes | |
| Boolean operators | Yes | |
| Conditional expressions | Yes | |
| Conditional functions | Yes | |
| Pattern matching | Yes | |
| REGEX POSIX pattern matching | Yes | |
| | No | |
| | Partial | Literal values only |
| Bitwise arithmetic | Yes | |
Permissions
Read more about Access Control in SQream DB.
| Item | Supported | Further information |
|---|---|---|
| Roles as users and groups | Yes | |
| Object default permissions | Yes | |
| Column / Row based permissions | No | |
| Object ownership | No | |
Extra Functionality
| Item | Supported | Further information |
|---|---|---|
| Information schema | Yes | |
| Views | Yes | |
| Window functions | Yes | |
| CTEs | Yes | |
| Saved queries, Saved queries with parameters | Yes | |
| Sequences | Yes | |
Data Type Guides
This section describes the following:
Converting and Casting Types
SQream supports explicit and implicit casting and type conversion. The system may automatically add implicit casts when combining different data types in the same expression. In many cases the details are unimportant, but they can affect the results of a query; when necessary, an explicit cast can be used to override the automatic cast added by SQream DB.
For example, the ANSI standard defines a SUM() aggregation over an INT column as an INT. However, when dealing with large amounts of data this could cause an overflow.
You can rectify this by casting the value to a larger data type, as shown below:

SUM(some_int_column :: BIGINT)
SQream supports the following three data conversion types:

CAST(<value> AS <data type>), to convert a value from one type to another. For example, CAST('1997-01-01' AS DATE), CAST(3.45 AS SMALLINT), CAST(some_column AS TEXT).

<value> :: <data type>, a shorthand for the CAST syntax. For example, '1997-01-01' :: DATE, 3.45 :: SMALLINT, (3+5) :: BIGINT.

Conversion functions that convert from a specific value that is not an SQL type, such as FROM_UNIXTS and FROM_UNIXTSMS; see the SQL functions reference.
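Both forms can be mixed in a single statement; for example:

SELECT CAST('1997-01-01' AS DATE), 3.45 :: SMALLINT, (3+5) :: BIGINT;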
Supported Casts
| From \ To | BOOL | TINYINT/SMALLINT/INT/BIGINT | REAL/FLOAT | NUMERIC | DATE/DATETIME | VARCHAR/TEXT |
|---|---|---|---|---|---|---|
| BOOL | N/A | ✓ | ✗ | ✗ | ✗ | ✓ |
| TINYINT/SMALLINT/INT/BIGINT | ✓ | N/A | ✓ | ✓ | ✗ | ✓ |
| REAL/FLOAT | ✗ | ✓ | N/A | ✓ | ✗ | ✓ |
| NUMERIC | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ |
| DATE/DATETIME | ✗ | ✗ | ✗ | ✗ | N/A | ✓ |
| VARCHAR/TEXT | ✓ | ✓ | ✓ | ✓ | ✓ | N/A |
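For example, reading from the matrix above, BOOL casts to the integer types but not directly to a floating-point type; a sketch of one workaround is to cast through INT:

SELECT true :: INT;          -- supported; boolean true is output as 1
SELECT true :: INT :: REAL;  -- BOOL to REAL directly is unsupported, so cast through INT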
Supported Data Types
The following table describes SQream’s supported data types.
| Name | Description | Data Size (Not Null, Uncompressed) | Example | Alias |
|---|---|---|---|---|
| BOOL | Boolean values (true, false) | 1 byte | | |
| TINYINT | Unsigned integer (0 - 255) | 1 byte | | NA |
| SMALLINT | Integer (-32,768 - 32,767) | 2 bytes | | NA |
| INT | Integer (-2,147,483,648 - 2,147,483,647) | 4 bytes | | |
| BIGINT | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes | | |
| REAL | Floating point (inexact) | 4 bytes | | NA |
| DOUBLE | Floating point (inexact) | 8 bytes | | |
| TEXT | Variable length string - UTF-8 unicode | Up to 4*n bytes | | |
| NUMERIC | 38 digits | 16 bytes | | DECIMAL |
| DATE | Date | 4 bytes | | NA |
| DATETIME | Date and time pairing in UTC | 8 bytes | | TIMESTAMP, DATETIME2 |
Note
SQream compresses all columns and types. The data size noted is the maximum data size allocation for uncompressed data.
Supported Casts
The Supported Casts section describes supported casts for the following types:
Numeric
The Numeric data type (also known as Decimal) is recommended for values that tend to occur as exact decimals, such as in finance. While Numeric has a fixed precision of 38, higher than REAL (9) or DOUBLE (17), it runs calculations more slowly. For operations that require faster performance, using Floating Point is recommended.
The correct syntax for Numeric is numeric(p, s), where p is the total number of digits (38 maximum), and s is the total number of decimal digits.
Numeric Examples
The following is an example of the Numeric syntax:
$ create or replace table t(x numeric(20, 10), y numeric(38, 38));
$ insert into t values(1234567890.1234567890, 0.12324567890123456789012345678901234567);
$ select x + y from t;
The following table shows information relevant to the Numeric data type:
| Description | Data Size (Not Null, Uncompressed) | Example |
|---|---|---|
| 38 digits | 16 bytes | |
Numeric supports the following operations:
All join types.
All aggregation types (not including Window functions).
Scalar functions (not including some trigonometric and logarithmic functions).
Boolean
The following table describes the Boolean data type.
| Values | Syntax | Data Size (Not Null, Uncompressed) |
|---|---|---|
| true, false | When loading from CSV, 0 is interpreted as false and 1 as true | 1 byte, but resulting average data sizes may be lower after compression. |
Boolean Examples
The following is an example of the Boolean syntax:
CREATE TABLE animals (name TEXT, is_angry BOOL);
INSERT INTO animals VALUES ('fox',true), ('cat',true), ('kiwi',false);
SELECT name, CASE WHEN is_angry THEN 'Is really angry!' else 'Is not angry' END FROM animals;
The following is an example of the correct output:
"fox","Is really angry!"
"cat","Is really angry!"
"kiwi","Is not angry"
Boolean Casts and Conversions
The following table shows the possible Boolean value conversions:
| Type | Details |
|---|---|
| TINYINT / SMALLINT / INT / BIGINT | true becomes 1, false becomes 0 |
| TEXT | |
Integer
Integer data types are designed to store whole numbers.
For more information about identity sequences (sometimes called auto-increment or auto-numbers), see Identity.
Integer Types
The following table describes the Integer types.
| Name | Details | Data Size (Not Null, Uncompressed) | Example |
|---|---|---|---|
| TINYINT | Unsigned integer (0 - 255) | 1 byte | |
| SMALLINT | Integer (-32,768 - 32,767) | 2 bytes | |
| INT | Integer (-2,147,483,648 - 2,147,483,647) | 4 bytes | |
| BIGINT | Integer (-9,223,372,036,854,775,808 - 9,223,372,036,854,775,807) | 8 bytes | |
The following table describes the Integer data type.
| Syntax | Data Size (Not Null, Uncompressed) |
|---|---|
| An integer can be entered as a regular literal | Integer types range between 1, 2, 4, and 8 bytes - but resulting average data sizes could be lower after compression. |
Integer Examples
The following is an example of the Integer syntax:
CREATE TABLE cool_numbers (a INT NOT NULL, b TINYINT, c SMALLINT, d BIGINT);
INSERT INTO cool_numbers VALUES (1,2,3,4), (-5, 127, 32000, 45000000000);
SELECT * FROM cool_numbers;
The following is an example of the correct output:
1,2,3,4
-5,127,32000,45000000000
Integer Casts and Conversions
The following table shows the possible Integer value conversions:
| Type | Details |
|---|---|
| | |
| | |
Floating Point
The Floating Point data types (REAL and DOUBLE) store extremely close value approximations, and are therefore recommended for values that tend to be inexact, such as scientific notation. While Floating Point generally runs faster than Numeric, it has a lower precision of 9 (REAL) or 17 (DOUBLE) compared to Numeric’s 38. For operations that require a higher level of precision, using Numeric is recommended.
The floating point representation is based on IEEE 754.
Floating Point Types
The following table describes the Floating Point data types.
| Name | Details | Data Size (Not Null, Uncompressed) | Example |
|---|---|---|---|
| REAL | Single precision floating point (inexact) | 4 bytes | |
| DOUBLE | Double precision floating point (inexact) | 8 bytes | |
The following table shows information relevant to the Floating Point data types.
| Aliases | Syntax | Data Size (Not Null, Uncompressed) |
|---|---|---|
| FLOAT | A double precision floating point can be entered as a regular literal | Floating point types are either 4 or 8 bytes, but size could be lower after compression. |
Floating Point Examples
The following are examples of the Floating Point syntax:
CREATE TABLE cool_numbers (a REAL NOT NULL, b DOUBLE);
INSERT INTO cool_numbers VALUES (1,2), (3.14159265358979, 2.718281828459);
SELECT * FROM cool_numbers;
1.0,2.0
3.1415927,2.718281828459
Note
Most SQL clients control display precision of floating point numbers, and values may appear differently in some clients.
Floating Point Casts and Conversions
The following table shows the possible Floating Point value conversions:
| Type | Details |
|---|---|
| | |
| | |
Note
As shown in the above examples, casting real to int rounds down.
String
TEXT is designed for storing text or strings of characters.
SQream stores TEXT values as UTF-8 representations.
Length
When using TEXT, specifying a size is optional. If not specified, the text field carries no constraints. To limit the size of the input, use TEXT(n), where n is the permitted number of characters.
The following apply to setting the String type length:

If the data exceeds the column length limit on INSERT or COPY operations, SQream DB will return an error.

When casting or converting, the string has to fit in the target. For example, 'Kiwis are weird birds' :: TEXT(5) will return an error. Use SUBSTRING to truncate the length of the string, as shown below.
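A minimal sketch of that workaround:

SELECT SUBSTRING('Kiwis are weird birds', 1, 5) :: TEXT(5);  -- returns 'Kiwis'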
Syntax
String types can be written with standard SQL string literals, which are enclosed with single quotes, such as 'Kiwi bird'. To include a single quote in the string, double the quote character, as in 'Kiwi bird''s wings are tiny'. String literals can also be dollar-quoted with the dollar sign $; for example, $$Kiwi bird's wings are tiny$$ is the same as 'Kiwi bird''s wings are tiny'.
Size
TEXT(n) can occupy up to 4*n bytes. However, the size of strings is variable and is compressed by SQream.
String Examples
The following is an example of the String syntax:
CREATE TABLE cool_strings (a TEXT NOT NULL, b TEXT);
INSERT INTO cool_strings VALUES ('hello world', 'Hello to kiwi birds specifically');
INSERT INTO cool_strings VALUES ('This is ASCII only', 'But this column can contain 中文文字');
SELECT * FROM cool_strings;
The following is an example of the correct output:
hello world ,Hello to kiwi birds specifically
This is ASCII only,But this column can contain 中文文字
String Casts and Conversions
The following table shows the possible String value conversions:
| Type | Details |
|---|---|
| | |
| | |
| | |
| DATE / DATETIME | Requires a supported format, such as '1997-01-01' |
Date
DATE is a type designed for storing year, month, and day. DATETIME is a type designed for storing year, month, day, hour, minute, seconds, and milliseconds in UTC with 1 millisecond precision.
Date Types
The following table describes the Date types:
| Name | Details | Data Size (Not Null, Uncompressed) | Example |
|---|---|---|---|
| DATE | Date | 4 bytes | |
| DATETIME | Date and time pairing in UTC | 8 bytes | |
Aliases
DATETIME is also known as TIMESTAMP or DATETIME2.
Syntax
DATE values are formatted as string literals.
The following is an example of the DATE syntax:
'1955-11-05'
date '1955-11-05'
DATETIME values are formatted as string literals conforming to ISO 8601.
The following is an example of the DATETIME syntax:
'1955-11-05 01:26:00'
SQream attempts to guess if the string literal is a date or datetime based on context, for example when used in date-specific functions.
Size
A DATE column is 4 bytes in length, while a DATETIME column is 8 bytes in length.
However, the size of these values is compressed by SQream DB.
Date Examples
The following is an example of the Date syntax:
CREATE TABLE important_dates (a DATE, b DATETIME);
INSERT INTO important_dates VALUES ('1997-01-01', '1955-11-05 01:24');
SELECT * FROM important_dates;
The following is an example of the correct output:
1997-01-01,1955-11-05 01:24:00.0
The following is an example of the Datetime syntax:
SELECT a :: DATETIME, b :: DATE FROM important_dates;
The following is an example of the correct output:
1997-01-01 00:00:00.0,1955-11-05
Warning
Some client applications may alter the DATETIME value by modifying the timezone.
Date Casts and Conversions
The following table shows the possible DATE and DATETIME value conversions:

| Type | Details |
|---|---|
| TEXT | |
Release Notes
| Version | Release Date |
|---|---|
| 4.1 | March 01, 2023 |
| 4.0 | January 25, 2023 |
Release Notes 4.0
The 4.0 Release Notes describe the following releases:
Release Notes 4.0
SQream is introducing a new version release system that follows the more commonly used Major.Minor versioning schema. The newly released 4.0 version is a minor version upgrade and does not require considerable preparation.
The 4.0 release notes were released on 01/25/2023 and describe the following:
New Features
Re-enabling an enhanced version of the License Storage Capacity feature
Lightweight Directory Access Protocol (LDAP) may be used to authenticate SQream roles
Physical deletion performance enhancement by supporting file systems with parallelism capabilities
Storage Version
The storage version presently in effect is version 45.
SQream Studio Updates and Improvements
When creating a New Role, you may now create a group role by selecting Set as a group role.
When editing an Existing Role, you are no longer obligated to update the role’s password.
Known Issues
Percentile is not supported for Window functions.
Version 4.0 Resolved Issues
| SQ No. | Description |
|---|---|
| SQ-10544 | SQream Studio dashboard periodic update enhancement |
| SQ-11296 | Slow catalog queries |
| SQ-11772 | Slow query performance when using |
| SQ-12318 | JDBC |
| SQ-12364 | |
| SQ-12446 | SQream Studio group role modification issue |
| SQ-12468 | Internal compiler error |
| SQ-12580 | Server Picker GPU dependency |
| SQ-12598 | Executing |
| SQ-12652 | SQream Studio result panel adjustment |
| SQ-13055 | NULL issue when executing query with pysqream |
Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective from Version 2022.1.3 (September 2022).
The TEXT data type is replacing VARCHAR and NVARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
End of Support
No End of Support changes were made.
Upgrading to version 4.0
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 4.1
SQream is introducing a new version release system that follows the more commonly used Major.Minor.Patch versioning schema. The newly released 4.1 version is a minor version upgrade and does not require considerable preparation.
The 4.1 release notes were released on 03/01/2023 and describe the following:
New Features
Lightweight Directory Access Protocol (LDAP) management enhancement
A new brute-force attack protection mechanism locks out user accounts for 15 minutes following 5 consecutive failed login attempts
Newly Released Connector Drivers
JDBC 4.5.7 .jar file
Storage Version
The storage version presently in effect is version 45.
SQream Studio Updates and Improvements
SQream Studio v5.5.4 has been released.
Known Issues
Percentile is not supported for Window functions.
Version 4.1 Resolved Issues
| SQ No. | Description |
|---|---|
| SQ-11287 | Function definition SQL UDF parenthesis issue |
| SQ-11296 | Slow catalog queries |
| SQ-12255 | Text column additional characters when using |
| SQ-12510 | Encryption memory issues |
| SQ-13219 | JDBC |
Configuration Changes
No configuration changes
Naming Changes
No naming changes
Deprecated Features
► Square Brackets []
The [], which are frequently used to delimit identifiers such as column names, table names, and other database objects, will soon be deprecated to facilitate the use of the ARRAY data type.

Support for [] for delimiting database object identifiers ends on June 1st, 2023.

To delimit database object identifiers, you will be able to use double quotes "".
► VARCHAR
The VARCHAR data type is deprecated to improve the core functionalities of the platform and to align with the constantly evolving ecosystem requirements.

Support for the VARCHAR data type ends on September 30th, 2023.

VARCHAR is no longer supported for new customers, effective from Version 2022.1.3.

The TEXT data type is replacing the VARCHAR and NVARCHAR data types.
End of Support
No End of Support changes were made.
Upgrading to v4.1
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Copy the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1
The 2022.1 Release Notes describe the following releases:
Release Notes 2022.1.7
The 2022.1.7 release notes were released on 12/15/2022 and describe the following:
New Features
Ingesting data from JSON files.
ZLIB compression performance enhancements.
Storage Version
The storage version presently in effect is version 43.
Known Issues
Percentile is not supported for Window functions.
Version 2022.1.7 resolved Issues
| SQ No. | Description |
|---|---|
| SQ-11523 | |
| SQ-11811 | Missing metadata optimization when joining |
| SQ-12178 | SQreamNet does not support the |
Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective from Version 2022.1.3 (September 2022).
The TEXT data type is replacing VARCHAR and NVARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
End of Support
No End of Support changes were made.
Upgrading to v2022.1.7
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.6
The 2022.1.6 release notes were released on 12/11/2022 and describe the following:
New Features
.Net Driver now supports .NET version 6 or newer.
Storage Version
The storage version presently in effect is version 42.
Known Issues
Percentile is not supported for Window functions.
Version 2022.1.6 resolved Issues
| SQ No. | Description |
|---|---|
| SQ-10160 | Spotfire casting issues when reading SQream data |
| SQ-11295 | |
| SQ-11940, SQ-11926, SQ-11874 | Known encryption issues |
| SQ-11975 | Internal runtime error |
| SQ-12019 | Using |
| SQ-12089 | |
| SQ-12117 | Running TCPH-21 results in out of memory |
| SQ-12204 | Possible issue when trying to INSERT Unicode data using .Net client |
Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective from Version 2022.1.3 (September 2022).
The TEXT data type is replacing VARCHAR and NVARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
End of Support
No End of Support changes were made.
Upgrading to v2022.1.6
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.5
The 2022.1.5 release notes were released on 11/02/2022 and describe the following:
New Features
The 2022.1.5 Release Notes include the following new features:
keys_evaluate utility function enhancement - adds the problematic chunk ID to the function’s output report

Automatically close database client connections that have been open for 24 hours without any active statements

release_defunct_locks utility function enhancement - receives a new optional input parameter to specify a timeout; for more details see Lock Related Issues

Metadata scale-up process improvement through RocksDB configuration improvements
Storage Version
The storage version presently in effect is version 42.
Known Issues
A recently discovered issue affects the encryption feature; at this time, SQream recommends avoiding this feature. A fix will be introduced in the near future.
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1.5:
| SQ No. | Description |
|---|---|
| SQ-11081 | Tableau connections are not getting closed |
| SQ-11473 | SQream Command Line Interface connectivity issues |
| SQ-11551 | SQream Studio Logs pages filtering issues |
| SQ-11631 | Log related configuration flags are not working as expected |
| SQ-11745 | Missing validation of sufficient GPU memory |
| SQ-11792 | CUME_DIST function causes query execution errors |
| SQ-11905 | Casting GetDate to text returns DATE with 0s in the time part or no time part at all |
| SQ-12580 | Server Picker and Meta Data server may not be deployed on servers without GPU |
| SQ-12690 | Worker thread increase |
| SQ-13775 | Worker down issue |
| SQ-13947 | Non-Unicode character query execution error |
Operations and Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective from Version 2022.1.3 (September 2022).
The TEXT data type is replacing VARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
End of Support
No End of Support changes were made.
Upgrading to v2022.1.5
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.4
The 2022.1.4 release notes were released on 10/11/2022 and describe the following:
Version Content
The 2022.1.4 Release Notes describes the following:
Security enhancement - Disable Python UDFs by default.
Storage Version
The storage version presently in effect is version 42.
Known Issues
No relevant Known Issues.
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1.4:
| SQ No. | Description |
|---|---|
| SQ-11782 | Alter default permissions to grant update results in error |
| SQ-11740 | A correlated subquery is blocked when an UPDATE query has a NOT EXISTS WHERE clause |
| SQ-11686, SQ-11584 | CUDA malloc error |
| SQ-10602 | Group by clause error |
| SQ-9813 | When executing COPY FROM a Parquet file that contains date values earlier than 1970, values are changed to 1970. |
Operations and Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective from Version 2022.1.3 (September 2022).
The TEXT data type is replacing VARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
End of Support
No End of Support changes were made.
Upgrading to v2022.1.4
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.3
The 2022.1.3 release notes were released on 9/20/2022 and describe the following:
Version Content
The 2022.1.3 Release Notes describes the following:
Optimize the delete operation by removing redundant calls.
Support LIKE condition for filtering metadata.
Migration tool for converting VARCHAR columns into TEXT columns.
Support sub-queries in the UPDATE condition.
Storage Version
The storage version presently in effect is version 42.
Known Issues
The following table lists the issues that are known limitations in Version 2022.1.3:
| SQ No. | Description |
|---|---|
| SQ-11677 | UPDATE or DELETE using a sub-query that includes ‘%’ (modulo) is crashing SQreamDB worker |
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1.3:
| SQ No. | Description |
|---|---|
| SQ-11487 | COPY FROM with offset = 0 (which is an unsupported option) is stuck until the query timeout. |
| SQ-11373 | SQL statement fails after changing the foreign table the statement tries to query. |
| SQ-11320 | Locked users are not being released on system reset. |
| SQ-11310 | Using “create table like” on foreign tables results in flat compression of the created table. |
| SQ-11287 | SQL User Defined Function fails when the function definition contains parentheses |
| SQ-11187 | FLAT compression is wrongly chosen when dealing with data sets starting with all-nulls |
| SQ-10892 | Update - enhanced error message when trying to run update on foreign table. |
Operations and Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
SQream is declaring end of support for the VARCHAR data type. This decision results from SQream’s effort to enhance its core functionalities and to keep pace with ever-changing ecosystem requirements.
VARCHAR is no longer supported for new customers - effective immediately.
The TEXT data type is replacing VARCHAR - SQream will maintain VARCHAR data type support until 09/30/2023.
As part of release 2022.1.3, SQream provides an automated and secured migration tool to help customers with the conversion from VARCHAR to TEXT; please contact SQream’s delivery team for further information.
End of Support
No End of Support changes were made.
Upgrading to v2022.1.3
Generate a back-up of the metadata by running the following command:
$ select backup_metadata('out_path');
Tip
SQream recommends storing the generated back-up locally in case needed.
SQream runs the Garbage Collector and creates a clean backup tarball package.
Shut down all SQream services.
Extract the recently created back-up file.
Replace your current metadata with the metadata you stored in the back-up file.
Navigate to the new SQream package bin folder.
Run the following command:
$ ./upgrade_storage <levelDB path>
Note
Upgrading from a major version to another major version requires you to follow the Upgrade Storage step. This is described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.2
The 2022.1.2 release notes were released on 8/24/2022 and describe the following:
Version Content
The 2022.1.2 Release Notes describes the following:
Automatic schema identification.
Optimized queries on external Parquet tables.
Storage Version
The storage version presently in effect is version 41.
New Features
The 2022.1.2 Release Notes include the following new features:
Parquet Read Optimization
Querying Parquet foreign tables has been optimized and is now up to 20x faster than in previous versions.
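For illustration, a hedged sketch of querying a Parquet foreign table; the wrapper name (`parquet_fdw`), the `LOCATION` option key, and all object names below are assumptions modeled on SQream's connector conventions:

-- Hypothetical Parquet-backed foreign table; reads benefit from the
-- optimization described above.
CREATE FOREIGN TABLE sales_parquet (
  sale_id BIGINT,
  amount DOUBLE
)
WRAPPER parquet_fdw
OPTIONS (LOCATION = '/data/sales/*.parquet');

SELECT COUNT(*) FROM sales_parquet WHERE amount > 100;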
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1.2:
| SQ No. | Description |
|---|---|
| SQ-10892 | An incorrect error message was displayed when users ran the UPDATE statement on a foreign table. |
| SQ-11273 | Clustering optimization only occurred when copying data from CSV files. |
Operations and Configuration Changes
No configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
No features were deprecated for Version 2022.1.2.
End of Support
The End of Support section is not relevant to Version 2022.1.2.
Upgrading to v2022.1.2
1. Generate a backup of the metadata by running the following command:

$ select backup_metadata('out_path');

Tip
SQream recommends storing the generated backup locally in case it is needed.

SQream runs the Garbage Collector and creates a clean backup tarball package.

2. Shut down all SQream services.

3. Extract the recently created backup file.

4. Replace your current metadata with the metadata you stored in the backup file.

5. Navigate to the new SQream package bin folder.

6. Run the following command:

$ ./upgrade_storage <levelDB path>
Note
Upgrading from one major version to another requires you to follow the Upgrade Storage step, described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1.1
The 2022.1.1 release notes were released on 7/19/2022 and describe the following:
Version Content
The 2022.1.1 Release Notes describe the following:
Enhanced security features
For more information, see SQream Acceleration Studio 5.4.7.
Storage Version
The storage version presently in effect is version 40.
New Features
The 2022.1.1 Release Notes include the following new features:
Password Security Compliance
In compliance with GDPR standards, SQream now requires a strong password policy when accessing the CLI or Studio.
For more information, see Password Policy.
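For illustration, a minimal sketch of creating a login role with a policy-compliant password; the role name and password are hypothetical, and the statements follow SQream's role-management syntax (see Password Policy for the exact strength rules):

-- Hypothetical role; the password is chosen to satisfy a strong-password
-- policy (length, mixed case, digits, special characters).
CREATE ROLE analyst1;
GRANT LOGIN TO analyst1;
GRANT PASSWORD 'Str0ng!Passw0rd' TO analyst1;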
Known Issues
There were no known issues in Version 2022.1.1.
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1.1:
| SQ No. | Description |
|---|---|
| SQ-6419 | An internal compiler error occurred when casting Numeric literals in an aggregation function. |
| SQ-10873 | Inserting 100K bytes into a text column resulted in an unclear error message. |
| SQ-10955 | Unneeded reads were occurring when filtering by date. |
Operations and Configuration Changes
The `login_max_retries` configuration flag is required for adjusting the permitted log-in attempts. For more information, see Adjusting the Permitted Log-In Attempts.
Naming Changes
No relevant naming changes were made.
Deprecated Features
In SQream Acceleration Studio 5.4.7, the Configuration section has been temporarily disabled and will be enabled at a later date. In addition, the Log Lines tab in the Log section has been removed.
End of Support
The End of Support section is not relevant to Version 2022.1.1.
Upgrading to v2022.1.1
1. Generate a backup of the metadata by running the following command:

$ select backup_metadata('out_path');

Tip
SQream recommends storing the generated backup locally in case it is needed.

SQream runs the Garbage Collector and creates a clean backup tarball package.

2. Shut down all SQream services.

3. Extract the recently created backup file.

4. Replace your current metadata with the metadata you stored in the backup file.

5. Navigate to the new SQream package bin folder.

6. Run the following command:

$ ./upgrade_storage <levelDB path>
Note
Upgrading from one major version to another requires you to follow the Upgrade Storage step, described in Step 7 of the Upgrading SQream Version procedure.
Release Notes 2022.1
The 2022.1 release notes were released on 7/19/2022 and describe the following:
Version Content
The 2022.1 Release Notes describe the following:
Enhanced security features.
New data manipulation command.
Additional data ingestion format.
Storage Version
The storage version presently in effect is version 40.
New Features
The 2022.1 Release Notes include the following new features:
Data Encryption
SQream now supports data encryption mechanisms in accordance with General Data Protection Regulation (GDPR) standards.
Using the data encryption feature may degrade performance by up to 10%.
For more information, see Data Encryption.
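For illustration only, a hedged sketch of a table with an encrypted column; the column-level `ENCRYPT` attribute and all names below are assumptions, so see Data Encryption for the authoritative syntax:

-- Hypothetical table; ENCRYPT (assumed syntax) marks a column whose
-- data should be stored encrypted at rest.
CREATE TABLE customers (
  id BIGINT NOT NULL,
  ssn TEXT ENCRYPT,
  city TEXT
);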
Update Feature
SQream now supports the DML Update feature, which is used for modifying the value of certain columns in existing rows.
For more information, see UPDATE.
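For example, a minimal UPDATE sketch using hypothetical table and column names:

-- Zero out the price of discontinued products.
UPDATE products
SET price = 0
WHERE status = 'discontinued';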
Avro Ingestion
SQream now supports ingesting data from Avro files.
For more information, see Inserting Data from Avro.
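For illustration, a hedged ingestion sketch; the foreign-data-wrapper name (`avro_fdw`), the `LOCATION` option key, and all object names are assumptions modeled on SQream's other connectors, so see Inserting Data from Avro for the authoritative syntax:

-- Hypothetical: expose an Avro file as a foreign table, then copy it
-- into a regular table.
CREATE FOREIGN TABLE events_avro (
  event_id BIGINT,
  payload TEXT
)
WRAPPER avro_fdw
OPTIONS (LOCATION = '/data/events/*.avro');

INSERT INTO events SELECT * FROM events_avro;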
Known Issues
The following table lists the known issues for Version 2022.1:
| SQ No. | Description |
|---|---|
| SQ-7732 | Reading numeric columns from an external Parquet file generated an error. |
| SQ-9889 | Running a query including Thai characters generated an internal runtime error. |
| SQ-10071 | An error occurred on subqueries with a TEXT and VARCHAR equality condition. |
| SQ-10191 | The |
| SQ-10629 | Inserting data into a table significantly slowed down running queries. |
| SQ-10659 | Using a comment generated a compile error. |
Resolved Issues
The following table lists the issues that were resolved in Version 2022.1:
| SQ No. | Description |
|---|---|
| SQ-10111 | Reading numeric columns from an external Parquet file generated an error. |
Operations and Configuration Changes
No relevant operations and configuration changes were made.
Naming Changes
No relevant naming changes were made.
Deprecated Features
In SQream version 2022.1, the `VARCHAR` data type has been deprecated and replaced with `TEXT`. SQream will maintain `VARCHAR` support in all previous versions until the migration to `TEXT` is complete, at which point it will be deprecated in all earlier versions. SQream also provides an automated and secure tool to facilitate and simplify the migration from `VARCHAR` to `TEXT`.
If you are using an earlier version of SQream, see the Using Legacy String Literals configuration flag.
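For illustration, a hedged sketch of a manual rebuild-and-copy conversion for a single table, using hypothetical names; the automated migration tool mentioned above is the recommended route:

-- Recreate the table with TEXT in place of VARCHAR, copy the data,
-- then swap the tables.
CREATE TABLE customers_new (
  id BIGINT NOT NULL,
  name TEXT  -- was VARCHAR(100) in the old table
);

INSERT INTO customers_new SELECT id, name FROM customers;

DROP TABLE customers;
ALTER TABLE customers_new RENAME TO customers;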
End of Support
The End of Support section is not relevant to Version 2022.1.
Upgrading to v2022.1
1. Generate a backup of the metadata by running the following command:

$ select backup_metadata('out_path', 'single_file');

Tip
SQream recommends storing the generated backup locally in case it is needed.

SQream runs the Garbage Collector and creates a clean backup tarball package.

2. Shut down all SQream services.

3. Extract the recently created backup file.

4. Replace your current metadata with the metadata you stored in the backup file.

5. Navigate to the new SQream package bin folder.

6. Run the following command:

$ ./upgrade_storage <levelDB path>
Note
Upgrading from one major version to another requires you to follow the Upgrade Storage step, described in Step 7 of the Upgrading SQream Version procedure.
Troubleshooting
The Troubleshooting page describes solutions to the following issues:
Remedying Slow Queries
This page describes how to troubleshoot the causes of slow queries.
Slow queries may be the result of various factors, including inefficient query practices, suboptimal table designs, or issues with system resources. If you’re experiencing sluggish query performance, it’s essential to diagnose and address the underlying causes promptly.
- Step 1: A single query is slow
If a query isn’t performing as you expect, follow the Query best practices part of the Optimization and Best Practices guide.
If all queries are slow, continue to step 2.
- Step 2: All queries on a specific table are slow
If all queries on a specific table aren’t performing as you expect, follow the Table design best practices part of the Optimization and Best Practices guide.
Check for active delete predicates in the table (see the sketch after step 8). Consult the Deleting Data guide for more information.
If the problem spans all tables, continue to step 3.
- Step 3: Check that all workers are up
Use `SELECT show_cluster_nodes();` to list the active cluster workers. If the worker list is incomplete, locate and start the missing worker(s).
If all workers are up, continue to step 4.
- Step 4: Check that all workers are performing well
Identify whether a specific worker is slower than the others by running the same query on different workers (e.g., by connecting directly to a worker or through a service queue).
If a specific worker is slower than the others, investigate performance issues on its host using standard monitoring tools (e.g., `top`), and restart the SQream DB workers on the problematic host.
If all workers are performing well, continue to step 5.
- Step 5: Check if the workload is balanced across all workers
Run the same query several times and check that it appears across multiple workers (use `SELECT show_server_status();` to monitor).
If some workers have a heavier workload, check the service queue usage; refer to the Workload Manager guide.
If the workload is balanced, continue to step 6.
- Step 6: Check if there are long-running statements
Identify any currently running statements (use `SELECT show_server_status();` to monitor).
If there are more statements than available resources, some statements may be in an `In queue` mode.
If a statement has been running for too long and is blocking the queue, consider stopping it (use `SELECT stop_statement(<statement id>);`; see the sketch after step 8).
If the statement does not stop correctly, contact SQream Support.
If there are no long-running statements or this does not help, continue to step 7.
- Step 7: Check if there are active locks
Use `SELECT show_locks();` to list any outstanding locks.
If a statement is locking some objects, consider waiting for that statement to end or stopping it.
If the locks don't free up after a statement completes, refer to the Concurrency and Locks guide.
If performance does not improve after the locks are released, continue to step 8.
- Step 8: Check free memory across hosts
Check free memory across the hosts by running `free -th` from the terminal.
If a machine has less than 5% free memory, consider lowering the `limitQueryMemoryGB` and `spoolMemoryGB` settings; refer to the Configuring the Spooling Feature guide.
If a machine has a lot of free memory, consider increasing the `limitQueryMemoryGB` and `spoolMemoryGB` settings.
If performance does not improve, contact SQream Support.
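For convenience, a hedged sketch of the monitoring-and-intervention utilities named in steps 2, 5, 6, and 7 above; the statement ID (128) is hypothetical, and the delete-predicates catalog view name is an assumption based on the Deleting Data guide:

-- Step 2: check for accumulated delete predicates (assumed view name).
SELECT * FROM sqream_catalog.delete_predicates;

-- Steps 5-6: monitor running statements across workers.
SELECT show_server_status();

-- Step 6: stop a statement that is blocking the queue (hypothetical ID).
SELECT stop_statement(128);

-- Step 7: list outstanding locks.
SELECT show_locks();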
Resolving Common Issues
The Resolving Common Issues page describes how to resolve the following common issues:
Troubleshooting Cluster Setup and Configuration
Note any errors - Make a note of any error you see, or check the logs for errors you might have missed.
If SQream DB can’t start, start SQream DB on a new storage cluster, with default settings. If it still can’t start, there could be a driver or hardware issue. Contact SQream support.
Reproduce the issue with a standalone SQream DB - starting up a temporary, standalone SQream DB can isolate the issue to a configuration issue, network issue, or similar.
Reproduce on a minimal example - Start a standalone SQream DB on a clean storage cluster and try to replicate the issue if possible.
Troubleshooting Connectivity Issues
Verify the correct login credentials - username, password, and database name.
Verify the host name and port.
Try connecting directly to a SQream DB worker, rather than via the load balancer.
Verify that the driver version you’re using is supported by the SQream DB version. Driver versions often get updated together with major SQream DB releases.
Try connecting directly with the built in SQL client. If you can connect with the local SQL client, check network availability and firewall settings.
Troubleshooting Query Performance
Use SHOW_NODE_INFO to examine which building blocks consume time in a statement (an example call is sketched after this list). If the query has finished, but the results are not yet materialized in the client, it could point to a problem in the application’s data buffering or a network throughput issue. Alternatively, you may also retrieve the query execution plan output using SQreamDB Studio.
If a problem occurs through a 3rd party client, try reproducing it directly with the built in SQL client. If the performance is better in the local client, it could point to a problem in the application or network connection.
Consult the Optimization and Best Practices guide to learn how to optimize queries and table structures.
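For example, assuming a currently running statement whose ID is 128 (hypothetical):

-- Examine how long each building block (node) of statement 128 takes.
SELECT show_node_info(128);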
Troubleshooting Query Behavior
Consult the SQL Statements and Syntax reference to verify if a statement or syntax behaves correctly. SQream DB may have some differences in behavior when compared to other databases.
If a problem occurs through a 3rd party client, try reproducing it directly with the built in SQL client. If the problem still occurs, file an issue with SQream support.
File an issue with SQream support
To file an issue, follow our Gathering Information for SQream Support guide.
Examining Logs
See the Collecting Logs and Metadata Database section of the Gathering Information for SQream Support guide for information about collecting logs for support.
Identifying Configuration Issues
Starting a SQream DB temporarily (not as part of a cluster, with default settings) can be helpful in identifying configuration issues.
Example:
$ sqreamd /home/rhendricks/raviga_database 0 5000 /home/sqream/.sqream/license.enc
Tip
Using `nohup` and `&` sends SQream DB to run in the background. It is safe to stop SQream DB at any time using `kill`; no partial data loss or corruption should occur when using this method to stop the process.

$ kill -9 $SQREAM_PID
Gathering Information for SQream Support
Getting Support and Reporting Bugs
When contacting SQream Support, we recommend reporting the following information:
What is the problem encountered?
What was the expected outcome?
How can SQream reproduce the issue?
When possible, please attach as many of the following:
Error messages or result outputs
DDL and queries that reproduce the issue
Screen captures if relevant
How SQream Debugs Issues
Reproduce
If we are able to easily reproduce your issue in our testing lab, this greatly improves the speed at which we can fix it.
Reproducing an issue consists of understanding:
What was SQream DB doing at the time?
How is the SQream DB cluster configured?
How does the schema look?
What is the query or statement that exposed the problem?
Were there any external factors? (e.g. Network disconnection, hardware failure, etc.)
See the Collecting a Reproducible Example of a Problematic Statement section ahead for information about collecting a full reproducible example.
Logs
The logs produced by SQream DB contain a lot of information that may be useful for debugging.
Look for error messages in the log and the offending statements. SQream’s support staff are experienced in correlating logs to workloads, and finding possible problems.
See the Collecting Logs and Metadata Database section ahead for information about collecting a set of logs that can be analyzed by SQream support.
Fix
Once we have a fix, it can be issued as a hotfix to an existing version or included as part of an upcoming major release.
Your SQream account manager will keep you up-to-date about the status of the issue.
Collecting a Reproducible Example of a Problematic Statement
SQream DB contains an SQL utility that can help SQream support reproduce a problem with a query or statement.
This utility compiles and executes a statement, and collects the relevant data in a small database which can be used to recreate and investigate the issue.
SQL Syntax
SELECT EXPORT_REPRODUCIBLE_SAMPLE(output_path, query_stmt [, ... ]);

output_path ::= filepath
Parameters
| Parameter | Description |
|---|---|
| output_path | Path for the output archive. The output file will be a tarball. |
| query_stmt [, ...] | Statements to analyze. |
Example
SELECT EXPORT_REPRODUCIBLE_SAMPLE('/home/rhendricks', 'SELECT * FROM t', $$SELECT "Name", "Team" FROM nba$$);
Collecting Logs and Metadata Database
SQream DB comes bundled with a data collection utility and an SQL utility intended for collecting logs and additional information that can help SQream support drill down into possible issues.
See more information in the Collect logs from your cluster section of the Logging guide.
Examples
Write an archive to `/home/rhendricks`, containing log files:

SELECT REPORT_COLLECTION('/home/rhendricks', 'log');
Write an archive to `/home/rhendricks`, containing log files and the metadata database:

SELECT REPORT_COLLECTION('/home/rhendricks', 'db_and_log');
Using the Command Line Utility:
$ ./bin/report_collection /home/rhendricks/sqream_storage /home/rhendricks db_and_log
Glossary
The following table describes terms used throughout this documentation:

| Term | Description |
|---|---|
| Authentication | The process of verifying identity by validating a user or role identity using a username and password. |
| Authorization | Defines the set of actions that an authenticated role can perform after gaining access to the system. |
| Catalog | A set of views containing metadata information about objects in a database. |
| Cluster | A SQream deployment containing several workers running on one or more nodes. |
| Custom connector | Used for running direct queries when SQream is integrated with Power BI. |
| Direct query | A Power BI data extraction method that retrieves data from a remote source instead of from a local repository. |
| Import | A Power BI data extraction method that retrieves data to a local repository to be visualized at a later point. |
| Metadata | SQream’s internal storage containing details about database objects. |
| Node | A machine used to run SQream workers. |
| Role | A group or a user. For more information, see SQream Studio. |
| Storage cluster | The directory where SQream stores data. |
| Worker | A SQream application that responds to statements. Several workers running on one or more nodes form a cluster. |