Pre-Installation Configuration
Before installing SQreamDB, it is essential that you tune your system for better performance and stability.
Basic Input/Output System Settings
The first step in pre-installation configuration is to adjust the basic input/output system (BIOS) settings.
The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor has a different set of settings and variables. It is safe to skip any and all of the configuration steps, but this may impact performance.
If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.
| Item | Setting | Rationale |
|---|---|---|
| Management console access | Connected | Connection to out-of-band (OOB) management is required to preserve continuous network uptime. |
| All drives | Connected and displayed on the RAID interface | Prerequisite for cluster or OS installation. |
| RAID volumes | Configured according to project guidelines; a reboot is required for changes to take effect. | Clustered to increase logical volume size and provide redundancy. |
| Fan speed / thermal configuration | Dell fan speed: High Maximum (specified minimum setting: 60). HPE thermal configuration: Increased Cooling. | NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance. |
| Power regulator or iDRAC power unit policy | HPE: HP Static High Performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled. | Other power profiles (such as "Balanced") throttle the CPU and diminish performance. Throttling may also cause GPU failure. |
| System Profile, Power Profile, or Performance Profile | High Performance | The Performance profile potentially increases performance by maximizing processor frequency and disabling certain power-saving features such as C-states. Use this setting for environments that are not sensitive to power consumption. |
| Power Cap Policy or Dynamic Power Capping | Disabled | This setting may appear alongside the power profile or power regulator settings above, and allows disabling system ROM power calibration during the boot process. Power regulator settings are named differently in the BIOS and in iLO/iDRAC. |
| Intel Turbo Boost | Enabled | Intel Turbo Boost overclocks the processor to boost the performance of CPU-bound operations. Overclocking may introduce computational jitter due to changes in the processor's turbo frequency; this causes brief pauses in processor operation, introducing uncertainty into application processing time. Turbo operation is a function of power consumption, processor temperature, and the number of active cores. |
| Intel Virtualization Technology (VT-d) | Disabled | VT-d is optimal for running VMs. However, when running Linux natively, disabling VT-d can boost performance by up to 10%. |
| Logical Processor | HPE: enable Hyper-Threading. Dell: enable Logical Processor. | Hyper-Threading doubles the number of logical processors, which may improve performance by roughly 5-10% for CPU-bound operations. |
| Processor C-States (minimum processor idle power core state) | Disabled | Processor C-States reduce server power consumption when the system is idle. They cause slower cold starts when the system transitions from idle to load, and may reduce query performance by up to 15%. |
| HPE: Energy/Performance Bias | Maximum Performance | Configures processor subsystems for high performance and low latency. Other power profiles (such as "Balanced") throttle the CPU and may diminish performance. Use this setting for environments that are not sensitive to power consumption. |
| HPE: DIMM voltage | Optimized for Performance | Setting a higher voltage for DIMMs may increase performance. |
| Memory Operating Mode | Optimizer Mode; Node Interleaving disabled; Auto memory operating voltage | Optimizer Mode tunes memory for performance; other modes may improve reliability but reduce performance. Node Interleaving should be disabled because enabling it interleaves memory between memory nodes, which harms NUMA-aware applications such as SQreamDB. |
| HPE: Memory power savings mode | Maximum Performance | Configures several memory parameters to optimize the performance of the memory subsystems. The default setting is Balanced. |
| HPE: ACPI SLIT | Enabled | ACPI SLIT describes the relative access times between processors, memory, and I/O subsystems. Operating systems can use this data to improve performance by allocating resources and workloads more efficiently. |
| QPI Snoop | Cluster on Die or Home Snoop | QPI (QuickPath Interconnect) Snoop lets you configure snoop modes that affect the QPI interconnect; changing this setting may improve the performance of certain workloads. The default, Home Snoop, provides high memory bandwidth in an average NUMA environment. Cluster on Die may provide increased memory bandwidth in highly optimized NUMA workloads. Early Snoop may decrease memory latency but may result in lower overall bandwidth than the other modes. |
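A few of these BIOS-dependent settings can be spot-checked from a booted Linux system. The sketch below is read-only and assumes a Linux host with `lscpu` available; the `intel_pstate` sysfs path is driver-specific and may not exist on every server.

```shell
#!/usr/bin/env bash
# Spot-check a few BIOS-dependent settings from Linux (read-only).

# Hyper-Threading: more logical CPUs than physical cores implies SMT is on.
threads=$(lscpu -p=CPU | grep -vc '^#')
cores=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)
echo "logical CPUs: $threads, physical cores: $cores"
[ "$threads" -gt "$cores" ] && echo "SMT: enabled" || echo "SMT: disabled"

# Turbo Boost (intel_pstate driver only; the path is absent on other drivers).
if [ -r /sys/devices/system/cpu/intel_pstate/no_turbo ]; then
  [ "$(cat /sys/devices/system/cpu/intel_pstate/no_turbo)" = 0 ] \
    && echo "Turbo Boost: enabled" || echo "Turbo Boost: disabled"
fi

# NUMA zone reclaim (the kernel-parameter section of this guide sets it to 0).
if command -v sysctl >/dev/null; then
  echo "vm.zone_reclaim_mode = $(sysctl -n vm.zone_reclaim_mode)"
fi
```

This only confirms what the running kernel sees; the authoritative values remain in the BIOS setup utility.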
Installing the Operating System
Before You Begin
Your system must have at least 200 gigabytes of free space on the root (/) mount.

For a multi-node cluster, you must have external shared storage provided by systems like General Parallel File System (GPFS), Weka, or VAST.
Once the BIOS settings have been set, install the operating system.

A SQreamDB installation requires RHEL 8.8 or 8.9. Verify the exact RHEL 8 version with your storage vendor to avoid driver incompatibility.
Installation
Select a language (English recommended).
From Software Selection, select Minimal and check the Development Tools group checkbox.
Selecting the Development Tools group installs the following tools:
autoconf
automake
binutils
bison
flex
gcc
gcc-c++
gettext
libtool
make
patch
pkgconfig
redhat-rpm-config
rpm-build
rpm-sign
Continue the installation.
Set up the necessary drives and users as prompted by the installation process.
When the installation completes, the system boots into the OS shell.
Configuring the Operating System
When configuring the operating system, several basic settings related to creating a new server are required. Configuring these as part of your basic set-up increases your server’s security and usability.
Creating a sqream User

The sqream user must have the same UID and GID across all servers in your cluster.

If the sqream user does not have the same UID and GID across all servers and no critical data is stored under /home/sqream, it is recommended to delete the sqream user and sqream group from your servers, and then create new ones with the same ID, using the following commands:

sudo userdel sqream
sudo groupdel sqream
sudo rm /var/spool/mail/sqream

Before adding a user with a specific UID and GID, verify that such IDs do not already exist.

The steps below show how to create a sqream user with an example ID of 1111.
Verify that UID 1111 does not already exist:

cat /etc/passwd | grep 1111

Verify that GID 1111 does not already exist:

cat /etc/group | grep 1111

Add a user with an identical UID on all cluster nodes:

sudo useradd -u 1111 sqream
Add the sqream user to the wheel group:

sudo usermod -aG wheel sqream

You can remove the sqream user from the wheel group when the installation and configuration are complete.

Set a password for the sqream user:

sudo passwd sqream
Log out and log back in as sqream.

If you deleted the sqream user and recreated it with a new ID, you must change the ownership of /home/sqream to avoid permission errors:

sudo chown -R sqream:sqream /home/sqream
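To confirm that the IDs really match before proceeding, a small helper can compare a user's actual UID and GID against the expected values on each node. `check_user_ids` is a name invented here for illustration, not a system tool.

```shell
#!/usr/bin/env bash
# check_user_ids USER EXPECTED_UID EXPECTED_GID
# Prints OK when the user's UID/GID match the expected values,
# a diagnostic (and non-zero exit) otherwise.
check_user_ids() {
  local user=$1 want_uid=$2 want_gid=$3 uid gid
  uid=$(id -u "$user" 2>/dev/null) || { echo "MISSING: user $user"; return 1; }
  gid=$(id -g "$user")
  if [ "$uid" = "$want_uid" ] && [ "$gid" = "$want_gid" ]; then
    echo "OK: $user uid=$uid gid=$gid"
  else
    echo "MISMATCH: $user uid=$uid gid=$gid (expected $want_uid/$want_gid)"
    return 1
  fi
}

# Example with the root account, which always has UID/GID 0:
check_user_ids root 0 0
```

Run `check_user_ids sqream 1111 1111` on every node (for example via ssh in a loop) and investigate any MISMATCH before continuing the installation.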
Setting Up A Locale
SQreamDB enables you to set up a locale for your location. To view your current time zone, run the timedatectl command; to list all available time zones, run timedatectl list-timezones.

Set the language of the locale:

sudo localectl set-locale LANG=en_US.UTF-8
Installing Required Software
Installing EPEL Repository
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
Enabling Additional Red Hat Repositories
Enabling additional Red Hat repositories is essential to install the required packages in the subsequent procedures.
sudo subscription-manager release --set=8.9
sudo subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-baseos-rpms
Installing Required Packages
sudo dnf install chrony pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc net-tools wget jq libffi-devel xz-devel ncurses-compat-libs libnsl gdbm-devel tk-devel sqlite-devel readline-devel texinfo
Installing Recommended Tools
sudo dnf install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils dos2unix
For SQreamDB version 4.4 or newer, install Python 3.9.13.
Download the Python 3.9.13 source code tarball into the /home/sqream directory:

wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tar.xz
Extract the Python 3.9.13 source code into your current directory:
tar -xf Python-3.9.13.tar.xz
Navigate to the Python 3.9.13 directory:
cd Python-3.9.13
Run the ./configure script:

./configure --enable-loadable-sqlite-extensions
Build the software:
make -j30
Install the software:
sudo make install
Verify that Python 3.9.13 has been installed:
python3 --version
Installing NodeJS
NodeJS is necessary only when the UI runs on the same server as SQreamDB. Otherwise, you can skip this step.
Download the NodeJS tarball into the /home/sqream directory and extract it:

wget https://nodejs.org/dist/v16.20.0/node-v16.20.0-linux-x64.tar.xz
tar -xf node-v16.20.0-linux-x64.tar.xz
Move the node-v16.20.0-linux-x64 directory to /usr/local:
sudo mv node-v16.20.0-linux-x64 /usr/local
Navigate to the /usr/bin directory:

cd /usr/bin
Create symbolic links to the node, npm, and npx binaries:

sudo ln -s ../local/node-v16.20.0-linux-x64/bin/node node
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npm npm
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npx npx
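The symlinks above can also be created with a small reusable helper. `link_node_tools` and its arguments are illustrative names invented here, not part of the NodeJS distribution.

```shell
#!/usr/bin/env bash
# link_node_tools DIST_DIR BIN_DIR
# Symlinks node, npm, and npx from a NodeJS distribution into a bin directory.
link_node_tools() {
  local dist=$1 bin=$2 tool
  for tool in node npm npx; do
    ln -sf "$dist/bin/$tool" "$bin/$tool"
  done
}

# For the layout used in this guide (run as root):
#   link_node_tools /usr/local/node-v16.20.0-linux-x64 /usr/bin
```

Using `-f` makes the helper idempotent: re-running it replaces stale links after a NodeJS upgrade.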
Install the pm2 process manager:

sudo npm install pm2 -g
cd /usr/bin
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/pm2 pm2
If installing the pm2 process manager fails, install it offline:
On a machine with internet access, install the following:
nodejs
npm
pm2
Archive the pm2 module:

cd /usr/local/node-v16.20.0-linux-x64/lib/node_modules
tar -czvf pm2_x86.tar.gz pm2

Copy the pm2_x86.tar.gz file to the server without internet access and extract it.
Move the pm2 folder to the /usr/local/node-v16.20.0-linux-x64/lib/node_modules directory:

sudo mv pm2 /usr/local/node-v16.20.0-linux-x64/lib/node_modules
Navigate back to the /usr/bin directory:

cd /usr/bin
Create a symbolic link to the pm2 service:

sudo ln -s /usr/local/node-v16.20.0-linux-x64/lib/node_modules/pm2/bin/pm2 pm2
Verify that the installation was successful without using sudo:

pm2 list
Verify that the node version is correct:
node --version
Configuring Chrony for RHEL8 Only
Start the Chrony service:
sudo systemctl start chronyd
Enable the Chrony service to start automatically at boot time:
sudo systemctl enable chronyd
Check the status of the Chrony service:
sudo systemctl status chronyd
Configuring the Server to Boot Without Linux GUI
We recommend that you configure your server to boot without a Linux GUI by running the following command:
sudo systemctl set-default multi-user.target
Running this command sets the server to boot into multi-user (text) mode, without the GUI.
Configuring the Security Limits
The security limits refer to the number of open files, processes, and other per-user resource limits.

sudo bash
cat <<EOF >> /etc/security/limits.conf
sqream soft nproc 1000000
sqream hard nproc 1000000
sqream soft nofile 1000000
sqream hard nofile 1000000
root soft nproc 1000000
root hard nproc 1000000
root soft nofile 1000000
root hard nofile 1000000
sqream soft core unlimited
sqream hard core unlimited
EOF
Configuring the Kernel Parameters
Append the following kernel parameters, one per line, to the /etc/sysctl.conf file:

cat <<EOF | sudo tee -a /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.swappiness = 10
vm.vfs_cache_pressure = 200
vm.zone_reclaim_mode = 0
EOF
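The appended values take effect only after running `sudo sysctl -p` or rebooting. The read-only sketch below compares the documented targets against the running kernel; `check_sysctl_targets` is an invented helper, and its target list simply mirrors the parameters above.

```shell
#!/usr/bin/env bash
# Compare documented sysctl targets against the running kernel (read-only).
check_sysctl_targets() {
  local entry key want got
  for entry in \
    vm.dirty_background_ratio=5 \
    vm.dirty_ratio=10 \
    vm.swappiness=10 \
    vm.vfs_cache_pressure=200 \
    vm.zone_reclaim_mode=0
  do
    key=${entry%%=*}
    want=${entry#*=}
    got=$(sysctl -n "$key" 2>/dev/null) || got="unavailable"
    if [ "$got" = "$want" ]; then
      echo "$key = $got (matches documented value)"
    else
      echo "$key = $got (documented: $want)"
    fi
  done
}
check_sysctl_targets
```

A differing value is not necessarily an error, but it indicates the tuning above has not been applied yet.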
Check the current maximum value of fs.file-max:

sysctl -n fs.file-max
Configuring the Firewall
The example in this section shows the open ports for four sqreamd sessions. If more than four are required, open additional ports as needed. Port 8080 in the example below is the new UI port.

The ports listed below are required; the same logic applies to all additional SQreamDB Worker ports.
| Port | Use |
|---|---|
| 8080 | UI port |
| 443 | UI over HTTPS (requires nginx installation) |
| 3105 | SQreamDB metadataserver service |
| 3108 | SQreamDB serverpicker service |
| 3109 | SQreamDB serverpicker service over SSL |
| 5000 | SQreamDB first Worker default port |
| 5100 | SQreamDB first Worker over SSL default port |
| 5001 | SQreamDB second Worker default port |
| 5101 | SQreamDB second Worker over SSL default port |
Start the firewalld service:

systemctl start firewalld
Add the following ports to the permanent firewall:

firewall-cmd --zone=public --permanent --add-port=8080/tcp
firewall-cmd --zone=public --permanent --add-port=3105/tcp
firewall-cmd --zone=public --permanent --add-port=3108/tcp
firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
firewall-cmd --permanent --list-all
Reload the firewall:
firewall-cmd --reload
Enable firewalld on boot:

systemctl enable firewalld
If you do not need the firewall, you can disable it:

sudo systemctl stop firewalld
sudo systemctl disable firewalld
Disabling SELinux
We recommend disabling SELinux.
Show the status of SELinux:

sudo sestatus

If the output is not disabled, edit the /etc/selinux/config file:

sudo vim /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled.

The change takes effect only after rebooting the server. To disable SELinux immediately, without rebooting, run the following command:

sudo setenforce 0
Configuring the /etc/hosts File

Edit the /etc/hosts file:

sudo vim /etc/hosts

Map the localhost and each cluster server:

127.0.0.1 localhost
<server1 ip> <server_name>
<server2 ip> <server_name>
Installing the NVIDIA CUDA Driver
After configuring your operating system, you must install the NVIDIA CUDA driver.
Warning
If your Linux GUI runs on the server, it must be stopped before installing the CUDA drivers.
Before You Begin
Verify that the NVIDIA card has been installed and is detected by the system:
lspci | grep -i nvidia
Verify that gcc has been installed:

gcc --version

If gcc has not been installed, install it on RHEL:

sudo yum install -y gcc
Updating the Kernel Headers
Update the kernel headers on RHEL:
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Make sure that kernel-devel and kernel-headers match the installed kernel:

uname -r
rpm -qa | grep kernel-devel-$(uname -r)
rpm -qa | grep kernel-headers-$(uname -r)
Disabling Nouveau
Disable Nouveau, the default open-source NVIDIA driver shipped with the operating system.
Check if the Nouveau driver has been loaded:
lsmod | grep nouveau
If the Nouveau driver has been loaded, the command above generates output. If the Nouveau driver has not been loaded, you may skip steps 2 and 3.
Blacklist the Nouveau driver to disable it:
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
Regenerate the kernel initramfs:

sudo dracut --force
Reboot the server:
sudo reboot
Installing the CUDA Driver
The current recommendation is for CUDA 12.3.2.
For questions related to which driver to install, contact SqreamDB support.
Installing the CUDA Driver from the Repository
Installing the CUDA driver from the Repository is the recommended installation method.
Install the EPEL repository if it is not already installed:

sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
(Optional) Install the CUDA dependencies from the epel repository:

sudo yum install dkms libvdpau

Installing the CUDA dependencies from the epel repository is only required when installing from a runfile.

Download and install the required local repository:
RHEL 8.8/8.9 CUDA 12.3.2 repository (Intel) installation (required for H/L series GPU models):

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
sudo dnf localinstall cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms
Tuning Up NVIDIA Performance
The following procedures apply exclusively to Intel-based servers.
Tune Up NVIDIA Performance when Driver Installed from the Repository
Check the service status:
sudo systemctl status nvidia-persistenced
If the service exists, it will be stopped by default.
Start the service:
sudo systemctl start nvidia-persistenced
Verify that no errors have occurred:
sudo systemctl status nvidia-persistenced
Enable the service to start up on boot:
sudo systemctl enable nvidia-persistenced
For H100/A100, add the following lines:
nvidia-persistenced
Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):
nvidia-smi
Tune Up NVIDIA Performance when Driver Installed from the Runfile
Change the permissions on the /etc/rc.local file to executable:

sudo chmod +x /etc/rc.local

Edit the /etc/rc.local file:

sudo vim /etc/rc.local
Add the following line for H100/A100:

nvidia-persistenced

Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):

nvidia-smi
Enabling Core Dumps
While this procedure is optional, SQreamDB recommends enabling core dumps. Note that the default abrt format is not gdb compatible; for SQreamDB support to be able to analyze your core dumps, they must be gdb compatible.
Checking the abrtd Status

Check whether abrtd is running:

sudo ps -ef | grep abrt
If abrtd is running, disable and stop it:

for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service; do
  sudo systemctl disable $i
  sudo systemctl stop $i
done
Setting the Limits
Check the current core file size limit:

ulimit -c

If the output is 0, add the following lines to the /etc/security/limits.conf file:

* soft core unlimited
* hard core unlimited
To apply the limit changes, log out and log back in.
Creating the Core Dump Directory
Because a core dump file may be as large as the total RAM on the server, verify that you have sufficient disk space. In the example below, core dumps are written to the /tmp/core_dumps directory. If necessary, replace this path according to your own environment and disk space.
Create the /tmp/core_dumps directory:

mkdir /tmp/core_dumps
Set the ownership of the /tmp/core_dumps directory:

sudo chown sqream:sqream /tmp/core_dumps
Grant read, write, and execute permissions to all users:
sudo chmod -R 777 /tmp/core_dumps
Setting the Output Directory in the /etc/sysctl.conf File

Open the /etc/sysctl.conf file in the Vim text editor:

sudo vim /etc/sysctl.conf
Add the following to the bottom of the file:

kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2
To apply the changes without rebooting the server, run the following:
sudo sysctl -p
Check that the core output directory points to the following:
sudo cat /proc/sys/kernel/core_pattern
The following shows the correct generated output:
/tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
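The `%` placeholders in the pattern are expanded by the kernel at crash time: `%e` executable name, `%s` signal number, `%u` UID, `%g` GID, `%p` PID, and `%t` timestamp. The sketch below mimics that expansion in shell purely for illustration; `expand_core_pattern` and the example crash values are invented, not a system tool.

```shell
#!/usr/bin/env bash
# expand_core_pattern PATTERN EXE SIG UID GID PID TIMESTAMP
# Mimics the kernel's core_pattern expansion for a hypothetical crash.
expand_core_pattern() {
  local p=$1
  p=${p//%e/$2}; p=${p//%s/$3}; p=${p//%u/$4}
  p=${p//%g/$5}; p=${p//%p/$6}; p=${p//%t/$7}
  echo "$p"
}

# A hypothetical sqreamd crash on SIGABRT (signal 6) as UID/GID 1111:
expand_core_pattern "/tmp/core_dumps/core-%e-%s-%u-%g-%p-%t" \
  sqreamd 6 1111 1111 4242 1700000000
# -> /tmp/core_dumps/core-sqreamd-6-1111-1111-4242-1700000000
```

Seeing the expanded filename makes it easier to recognize which process and signal produced a given core file in /tmp/core_dumps.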
Verifying that the Core Dumps Work
You can verify that the core dumps work only after installing and running SQreamDB. This causes the server to crash and a new core.xxx
file to be included in the folder that is written in /etc/sysctl.conf
.
Stop and restart all SQreamDB services.
Connect to SQreamDB with ClientCmd and run the following command:
select abort_server();
Verify Your SQreamDB Installation
Verify that the sqream user exists and has the same ID on all cluster servers:

id sqream

Verify that the storage is mounted on all cluster servers:

mount

Verify that the NVIDIA driver is properly installed:

nvidia-smi
Verify that the kernel file-handle allocation is greater than or equal to 2097152:

sysctl -n fs.file-max
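This check can be scripted so that it fails loudly when the limit is too low. The sketch below is read-only; the 2097152 threshold comes from the step above.

```shell
#!/usr/bin/env bash
# Verify the kernel file-handle limit meets the documented minimum.
required=2097152
current=$(sysctl -n fs.file-max 2>/dev/null || cat /proc/sys/fs/file-max)
if [ "$current" -ge "$required" ]; then
  echo "fs.file-max OK ($current >= $required)"
else
  echo "fs.file-max TOO LOW ($current < $required)"
fi
```

A "TOO LOW" result means the kernel-parameter tuning earlier in this guide has not been applied on this server.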
Verify the limits (run this command as the sqream user):

ulimit -c -u -n

Desired output:

core file size (blocks, -c) unlimited
max user processes (-u) 1000000
open files (-n) 1000000
Troubleshooting Core Dumping
This section describes the troubleshooting procedure to be followed if all parameters have been configured correctly, but the cores have not been created.
Reboot the server.
Verify that you have folder permissions:
sudo chmod -R 777 /tmp/core_dumps
Verify that the limits have been set correctly:
ulimit -c
If all parameters have been configured correctly, the correct output is:
core file size (blocks, -c) unlimited
If all parameters have been configured correctly but running ulimit -c still outputs 0, edit the /etc/profile file:

sudo vim /etc/profile
Search for the following line and disable it using the # symbol:

ulimit -S -c 0 > /dev/null 2>&1
Log out and log back in.
Verify the change:

ulimit -a
If the line is not found in /etc/profile, do the following:

Edit the /etc/init.d/functions file:

sudo vim /etc/init.d/functions

Search for the following line, disable it using the # symbol, and reboot the server:

ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1