Sizing

Concurrency and Scaling in SQreamDB

A SQreamDB cluster executes one statement per Worker process, and multiple Workers can operate concurrently to run multiple statements at the same time. Utility functions with minimal resource requirements, such as SHOW_SERVER_STATUS, are executed regardless of the workload.

Minimum Resource Required Per Worker:

Component                       CPU Cores                 RAM (GB)                    Local Storage (GB)
Worker                          8                         128                         10
Metadata Server                 16 per 100 Workers        20 per 1 trillion rows      10
SQreamDB Acceleration Studio    16                        16                          50
Server Picker                   1                         2                           -

Workers dedicated to lightweight queries, such as COPY TO and clean-up operations, require only 64 GB of RAM.
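
To see how the per-Worker figures add up for a whole cluster, the following sketch totals the minimum resources from the table above. It is illustrative only: the helper name is made up, and it assumes a single Acceleration Studio and a single Server Picker instance.

# Illustrative sizing helper based on the table above. The name
# minimum_cluster_resources is an example, not part of SQreamDB.
import math

def minimum_cluster_resources(num_workers: int, trillion_rows: int) -> dict:
    cpu_cores = (
        8 * num_workers                      # Workers: 8 cores each
        + 16 * math.ceil(num_workers / 100)  # Metadata Server: 16 cores per 100 Workers
        + 16                                 # Acceleration Studio
        + 1                                  # Server Picker
    )
    ram_gb = (
        128 * num_workers                    # Workers: 128 GB each
        + 20 * trillion_rows                 # Metadata Server: 20 GB per trillion rows
        + 16                                 # Acceleration Studio
        + 2                                  # Server Picker
    )
    local_storage_gb = (
        10 * num_workers                     # Workers: 10 GB each
        + 10                                 # Metadata Server
        + 50                                 # Acceleration Studio (no figure listed for Server Picker)
    )
    return {"cpu_cores": cpu_cores, "ram_gb": ram_gb, "local_storage_gb": local_storage_gb}

print(minimum_cluster_resources(num_workers=9, trillion_rows=1))
# {'cpu_cores': 105, 'ram_gb': 1190, 'local_storage_gb': 150}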

Maximum Workers Per GPU:

GPU                         Workers
NVIDIA Tesla T4 (16GB)      1
NVIDIA Tesla V100 (32GB)    2
NVIDIA Tesla A100 (40GB)    3
NVIDIA Tesla A100 (80GB)    6
NVIDIA Tesla H100 (80GB)    6

Tip

Is your GPU not on the list? Visit SQreamDB Support for additional information.
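
Because each Worker occupies a GPU slot, the table above caps how many Workers a given set of GPUs can host. The sketch below is illustrative; the dictionary simply restates the limits above, and the helper name is made up.

# Illustrative helper: total Workers supported by a GPU inventory, using the
# per-GPU limits from the table above. Names are examples, not SQreamDB APIs.
MAX_WORKERS_PER_GPU = {
    "NVIDIA Tesla T4 (16GB)": 1,
    "NVIDIA Tesla V100 (32GB)": 2,
    "NVIDIA Tesla A100 (40GB)": 3,
    "NVIDIA Tesla A100 (80GB)": 6,
    "NVIDIA Tesla H100 (80GB)": 6,
}

def max_workers(gpu_inventory: dict) -> int:
    return sum(MAX_WORKERS_PER_GPU[model] * count
               for model, count in gpu_inventory.items())

# Three A100 (40GB) GPUs can host up to 9 Workers, as in the spooling example below.
print(max_workers({"NVIDIA Tesla A100 (40GB)": 3}))  # 9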

Scaling When Data Sizes Grow

For many statements, SQreamDB scales linearly as storage is added and queries run over larger data sets. It uses optimized ‘brute force’ algorithms and implementations that do not suffer from sudden performance cliffs at larger data sizes.

Scaling When Queries Are Queuing

SQreamDB scales well by adding more workers, GPUs, and nodes to support more concurrent statements.

What To Do When Queries Are Slow

Adding more workers or GPUs does not boost the performance of a single statement or query.

To boost the performance of a single statement, start by examining the best practices and ensure the guidelines are followed.

Adding more RAM to nodes, using GPUs with more memory, and using faster CPUs or storage can also sometimes help.

Spooling Configuration

\(limitQueryMemoryGB=\frac{\text{Total RAM}-\text{Internal Operations}-\text{Metadata Server}-\text{Server Picker}}{\text{Number of Workers}}\)

\(spoolMemoryGB=limitQueryMemoryGB-50\text{ GB}\)

SQreamDB recommends setting the spoolMemoryGB flag to 90% of the limitQueryMemoryGB flag. The limitQueryMemoryGB flag is the total memory allocated for processing queries and defines how much total system memory each Worker may use. Note that spoolMemoryGB must be set lower than limitQueryMemoryGB.
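
The two formulas translate directly into a short calculation. The sketch below is illustrative: the function names are not SQreamDB flags or APIs, and every figure passed to them is a placeholder to be replaced with values from your own deployment.

# Minimal sketch of the spooling formulas above; names and values are illustrative.
def limit_query_memory_gb(total_ram_gb: float, internal_ops_gb: float,
                          metadata_server_gb: float, server_picker_gb: float,
                          num_workers: int) -> float:
    # Per-Worker query-memory limit (the limitQueryMemoryGB flag).
    return (total_ram_gb - internal_ops_gb - metadata_server_gb
            - server_picker_gb) / num_workers

def spool_memory_gb(limit_gb: float) -> float:
    # Spool memory (the spoolMemoryGB flag): 50 GB below the per-Worker limit.
    return limit_gb - 50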

Example

Setting Spool Memory

The following examples are for 2 TB of RAM and 9 Workers running on 3 A100 (40GB) GPUs:
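
Assuming, purely for illustration, that internal operations, the metadata server, and the server picker together reserve about 190 GB of the 2 TB, the formulas above produce the values used in the configuration files below:

\(limitQueryMemoryGB=\frac{2000-190}{9}\approx 201\)

\(spoolMemoryGB=201-50=151\)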

Configuring the limitQueryMemoryGB using the Worker configuration file:

{
    "cluster": "/home/test_user/sqream_testing_temp/sqreamdb",
    "gpu": 0,
    "licensePath": "home/test_user/SQream/tests/license.enc",
    "machineIP": "127.0.0.1",
    "metadataServerIp": "127.0.0.1",
    "metadataServerPort": 3105,
    "port": 5000,
    "useConfigIP": true,
    "limitQueryMemoryGB": 201
}

Configuring the spoolMemoryGB using the legacy configuration file:

{
        "diskSpaceMinFreePercent": 10,
        "enableLogDebug": false,
        "insertCompressors": 8,
        "insertParsers": 8,
        "isUnavailableNode": false,
        "logBlackList": "webui",
        "logDebugLevel": 6,
        "nodeInfoLoggingSec": 60,
        "useClientLog": true,
        "useMetadataServer": true,
        "spoolMemoryGB": 151,
        "waitForClientSeconds": 18000,
        "enablePythonUdfs": true
}
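
In this example, spoolMemoryGB (151) is set below limitQueryMemoryGB (201), as required above.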

Need help?

Visit SQreamDB Support for additional information.