Internals and architecture

SQream DB internals

SQream DB is built from several components, which should look familiar to anyone who knows other RDBMSs.

Statement compiler

The statement compiler is written in Haskell, in a modern style with many micropasses and stages.
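The micropass style can be illustrated with a small sketch. The real compiler is written in Haskell; this example uses Python only to show the idea that each pass performs one small, self-contained rewrite of the tree, and the compiler is an ordered sequence of such passes. The expression types (`Lit`, `Col`, `Add`) and the three passes are hypothetical and are not SQream DB's actual passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical miniature expression tree; not SQream DB's real AST.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]
Pass = Callable[[Expr], Expr]


def normalize_names(e: Expr) -> Expr:
    """Micropass: fold column names to lower case."""
    if isinstance(e, Col):
        return Col(e.name.lower())
    if isinstance(e, Add):
        return Add(normalize_names(e.left), normalize_names(e.right))
    return e


def fold_constants(e: Expr) -> Expr:
    """Micropass: replace the sum of two literals with a single literal."""
    if isinstance(e, Add):
        left, right = fold_constants(e.left), fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return e


def drop_add_zero(e: Expr) -> Expr:
    """Micropass: rewrite x + 0 and 0 + x to x."""
    if isinstance(e, Add):
        left, right = drop_add_zero(e.left), drop_add_zero(e.right)
        if left == Lit(0):
            return right
        if right == Lit(0):
            return left
        return Add(left, right)
    return e


# The hypothetical "compiler" is just an ordered list of small passes.
PIPELINE: List[Pass] = [normalize_names, fold_constants, drop_add_zero]


def compile_expr(e: Expr, passes: List[Pass] = PIPELINE) -> Expr:
    for run_pass in passes:
        e = run_pass(e)
    return e


print(compile_expr(Add(Col("Price"), Add(Lit(2), Lit(-2)))))  # Col(name='price')
```

Pass order matters here: `drop_add_zero` only finds the `Lit(0)` because `fold_constants` ran first. Keeping each pass tiny makes such dependencies easy to see and test in isolation.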

Concurrency and concurrency control

SQream DB has extensive built-in concurrency, centered on message passing and queues.

SQream DB uses worker pools and other techniques to tune the level of concurrency, and it has a lock-based concurrency control system.

This concurrency control does not affect SELECT queries. Inserts only interact with the locks when an operation such as a DELETE or a DDL statement is running.
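The rules above can be expressed as a small lock-compatibility check. This is a sketch under stated assumptions: the lock modes (`"shared"` for INSERT, `"exclusive"` for DELETE and DDL), the `Op` enum, and the `must_wait` helper are hypothetical, and the granularity of SQream DB's real locks is not described here. It only encodes what the text states: SELECT never touches the locks, inserts do not block each other, and inserts wait when a DELETE or DDL operation is running.

```python
from enum import Enum, auto
from typing import Iterable, Optional


class Op(Enum):
    SELECT = auto()
    INSERT = auto()
    DELETE = auto()
    DDL = auto()


# Hypothetical lock modes. SELECT takes no lock at all; inserts share a lock
# with each other; DELETE and DDL are assumed to need an exclusive lock.
LOCK_MODE = {
    Op.SELECT: None,
    Op.INSERT: "shared",
    Op.DELETE: "exclusive",
    Op.DDL: "exclusive",
}


def conflicts(a: Op, b: Op) -> bool:
    mode_a: Optional[str] = LOCK_MODE[a]
    mode_b: Optional[str] = LOCK_MODE[b]
    if mode_a is None or mode_b is None:
        return False  # a lock-free SELECT never waits and never blocks
    return mode_a == "exclusive" or mode_b == "exclusive"


def must_wait(new_op: Op, running: Iterable[Op]) -> bool:
    """True if new_op conflicts with any statement already running on the same object."""
    return any(conflicts(new_op, r) for r in running)


print(must_wait(Op.SELECT, [Op.DELETE]))  # False
print(must_wait(Op.INSERT, [Op.INSERT]))  # False
print(must_wait(Op.INSERT, [Op.DDL]))     # True
```

Whether two DELETE statements conflict with each other is also an assumption of this sketch; the text does not say.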

Bulk data is not passed around in messages. Rather, the memory is shared between threads.
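The combination of queues, a worker pool, and shared bulk memory can be sketched as follows. SQream DB implements this in C++; this Python version is only illustrative, and `ChunkRef` and `parallel_sum` are hypothetical names. Messages on the task queue carry only offsets into a shared buffer, and each worker reads its slice in place through a `memoryview`, so no bulk data travels through the queue. Because of Python's global interpreter lock, the threads here show the structure rather than real parallel speedup.

```python
import array
import queue
import threading
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkRef:
    """Message payload: where to find the data, not the data itself."""
    start: int
    stop: int


def parallel_sum(values: List[int], chunk_size: int, n_workers: int = 4) -> int:
    # The bulk data lives in a single buffer that every thread reads in place.
    shared = memoryview(array.array("q", values))
    tasks = queue.Queue()    # messages: ChunkRef, or None to stop a worker
    results = queue.Queue()  # messages: partial sums

    def worker() -> None:
        while True:
            ref = tasks.get()
            if ref is None:
                return
            # Slicing a memoryview creates a view; no bulk data is copied.
            results.put(sum(shared[ref.start:ref.stop]))

    pool = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in pool:
        t.start()
    for start in range(0, len(values), chunk_size):
        tasks.put(ChunkRef(start, min(start + chunk_size, len(values))))
    for _ in pool:
        tasks.put(None)  # one stop message per worker
    for t in pool:
        t.join()

    total = 0
    while not results.empty():
        total += results.get()
    return total


print(parallel_sum(list(range(1_000)), chunk_size=100))  # 499500
```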

Transactions

SQream DB has serializable transactions, with some limitations:

Storage

Storage is split into two layers: a metadata layer and a lightweight but powerful append-only bulk data layer.

Metadata layer

The metadata layer is built on LevelDB and makes use of many of its features. It holds the metadata database, which stores catalog information.

LevelDB also provides some basic database-style features, such as snapshots and atomic batches of multiple updates, which the metadata layer relies on.
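The two LevelDB features mentioned above, snapshots and multi-update batches, can be illustrated with an in-memory toy store. In LevelDB's own C++ API these correspond to `WriteBatch` (applied with `DB::Write`) and `DB::GetSnapshot`; the Python class below is a hypothetical stand-in that mimics their semantics and is not how SQream DB or LevelDB is implemented. The catalog keys such as `"table:t1"` are also made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class ToyMetadataStore:
    """Hypothetical stand-in mimicking two LevelDB behaviours:
    multi-update batches and point-in-time snapshots."""

    def __init__(self) -> None:
        self._version = 0
        # key -> list of (version, value); a value of None marks a deletion
        self._history: Dict[str, List[Tuple[int, Optional[str]]]] = {}

    def write_batch(self, puts: Dict[str, str], deletes: Iterable[str] = ()) -> None:
        # Every update in the batch gets the same new version number, so any
        # snapshot sees either all of the batch or none of it.
        self._version += 1
        for key, value in puts.items():
            self._history.setdefault(key, []).append((self._version, value))
        for key in deletes:
            self._history.setdefault(key, []).append((self._version, None))

    def snapshot(self) -> int:
        return self._version

    def get(self, key: str, snapshot: Optional[int] = None) -> Optional[str]:
        limit = self._version if snapshot is None else snapshot
        value = None
        for version, v in self._history.get(key, []):
            if version <= limit:
                value = v
        return value


store = ToyMetadataStore()
store.write_batch({"table:t1": "cols=a,b"})
snap = store.snapshot()
store.write_batch({"table:t1": "cols=a,b,c", "table:t2": "cols=x"})
print(store.get("table:t1", snap))  # cols=a,b
print(store.get("table:t2", snap))  # None
print(store.get("table:t1"))        # cols=a,b,c
```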

Bulk data layer

The bulk data layer is a lightweight, append-only layer that is heavily focused on raw table-scan performance.

Data is stored in extent files, which contain compressed chunks; each chunk holds data for a single column.

A chunk is the smallest storage unit and holds roughly 1 to 10 million rows.
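The column-wise layout can be sketched as follows. Several details are assumptions made for illustration only: values are 64-bit integers, `zlib` stands in for whatever compression SQream DB actually applies, each extent is modeled as a plain list holding chunks of one column, and the hypothetical `chunks_per_extent` parameter controls how many chunks an extent groups. The demo uses 10 rows per chunk instead of the real 1 to 10 million.

```python
import struct
import zlib
from typing import Dict, List


def compress_chunk(values: List[int]) -> bytes:
    """Pack one column's values as little-endian int64 and compress them."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return zlib.compress(raw)


def decode_chunk(chunk: bytes) -> List[int]:
    raw = zlib.decompress(chunk)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def build_extents(
    table: Dict[str, List[int]], rows_per_chunk: int, chunks_per_extent: int
) -> Dict[str, List[List[bytes]]]:
    """Split every column separately into compressed chunks, then group the
    chunks of each column into extents (here, plain lists of chunks)."""
    extents: Dict[str, List[List[bytes]]] = {}
    for column, values in table.items():
        chunks = [
            compress_chunk(values[i:i + rows_per_chunk])
            for i in range(0, len(values), rows_per_chunk)
        ]
        extents[column] = [
            chunks[i:i + chunks_per_extent]
            for i in range(0, len(chunks), chunks_per_extent)
        ]
    return extents


table = {"id": list(range(25)), "qty": [i % 3 for i in range(25)]}
extents = build_extents(table, rows_per_chunk=10, chunks_per_extent=2)
print([len(e) for e in extents["id"]])    # [2, 1]
print(decode_chunk(extents["id"][1][0]))  # [20, 21, 22, 23, 24]
```

Because each chunk holds a single column, a scan that needs only `qty` can decompress the `qty` chunks and skip `id` entirely.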

SQream DB also runs a background storage reorganization process to keep query performance good after data has been inserted. This process makes small, fast inserts possible while still keeping the data arranged for maximum query performance.
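One plausible form of such reorganization is merging the many small chunks produced by small inserts into fewer, larger chunks. The text does not say exactly what SQream DB's process does, so the merge rule below is an assumption, and `reorganize` and `target_rows` are hypothetical names. The sketch works on uncompressed lists of values, preserves row order, and never splits an input chunk.

```python
from typing import List


def reorganize(chunks: List[List[int]], target_rows: int) -> List[List[int]]:
    """Merge consecutive small chunks so each output chunk holds at most
    target_rows rows, unless a single input chunk is already larger.
    Row order is preserved and no input chunk is split."""
    merged: List[List[int]] = []
    current: List[int] = []
    for chunk in chunks:
        if current and len(current) + len(chunk) > target_rows:
            merged.append(current)
            current = []
        current.extend(chunk)  # current is always a fresh list owned here
    if current:
        merged.append(current)
    return merged


# Four small inserts become two well-sized chunks.
print(reorganize([[1, 2], [3], [4, 5, 6], [7]], target_rows=4))  # [[1, 2, 3], [4, 5, 6, 7]]
```

Running this in the background lets each insert write a small chunk quickly, while scans later see fewer, fuller chunks.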

Building blocks

The heavy lifting in SQream DB is done by single-purpose C++/CUDA building blocks.

These blocks are deliberately not smart: they must be told exactly what to do.

Most of the intelligence for piecing them together lives in the statement compiler.
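The division of labor between simple blocks and a smart compiler can be sketched like this. The real blocks are C++/CUDA kernels; here each is a small Python function that does one thing and receives every parameter explicitly. `filter_gt`, `project`, `sum_column`, `compile_query`, and `execute` are hypothetical names, and the "compiler" handles only the fixed query shape `SELECT SUM(x) FROM t WHERE y > c`. All decisions about which blocks run, in what order, and with what arguments live in `compile_query`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Columns = Dict[str, List[int]]


# Single-purpose blocks: each does one thing, with every parameter spelled out.
def filter_gt(cols: Columns, column: str, threshold: int) -> Columns:
    keep = [i for i, v in enumerate(cols[column]) if v > threshold]
    return {name: [vals[i] for i in keep] for name, vals in cols.items()}


def project(cols: Columns, names: Sequence[str]) -> Columns:
    return {name: cols[name] for name in names}


def sum_column(cols: Columns, column: str) -> Columns:
    return {"sum_" + column: [sum(cols[column])]}


Step = Tuple[Callable[..., Columns], Dict[str, object]]


def compile_query(sum_of: str, where_col: str, gt: int) -> List[Step]:
    """Toy compiler for SELECT SUM(sum_of) FROM t WHERE where_col > gt:
    it chooses the blocks, their order, and their exact arguments."""
    return [
        (filter_gt, {"column": where_col, "threshold": gt}),
        (project, {"names": [sum_of]}),
        (sum_column, {"column": sum_of}),
    ]


def execute(plan: List[Step], cols: Columns) -> Columns:
    # The executor just runs the instructions; it makes no decisions itself.
    for block, args in plan:
        cols = block(cols, **args)
    return cols


print(execute(compile_query("x", "y", 15), {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]}))  # {'sum_x': [9]}
```

Keeping the blocks free of decision logic means each one can be optimized and tested on its own, while planning changes only touch the compiler.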

Columnar

GPU usage