Pre-Installation Configuration

Before installing SQreamDB, it is essential that you tune your system for better performance and stability.

Basic Input/Output System Settings 

The first step in the pre-installation configuration is to adjust the basic input/output system (BIOS) settings.

The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor has a different set of settings and variables. It is safe to skip any and all of the configuration steps, but this may impact performance.

If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.

Item	Setting	Rationale
Management console access	Connected	A connection to out-of-band (OOB) management is required to preserve continuous network uptime.
All drives	Connected and displayed on RAID interface	Prerequisite for cluster or OS installation.
RAID volumes	Configured according to project guidelines. The server must be rebooted for the configuration to take effect.	Clustered to increase logical volume and provide redundancy.
Fan speed or thermal configuration	Dell fan speed: High Maximum. Specified minimum setting: 60. HPe thermal configuration: Increased cooling.	NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance.
Power regulator or iDRAC power unit policy	HPe: HP static high performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled.	Other power profiles (such as “balanced”) throttle the CPU and diminish performance. Throttling may also cause GPU failure.
System Profile, Power Profile, or Performance Profile	High Performance	The Performance profile provides potentially increased performance by maximizing processor frequency and disabling certain power-saving features, such as C-states. Use this setting for environments that are not sensitive to power consumption.
Power Cap Policy or Dynamic power capping	Disabled	Other power profiles (like “balanced”) throttle the CPU and may diminish performance or cause GPU failure. This setting may appear together with the above (Power profile or Power regulator). This setting allows disabling system ROM power calibration during the boot process. Power regulator settings are named differently in BIOS and iLO/iDRAC.
Intel Turbo Boost	Enabled	Intel Turbo Boost enables overclocking the processor to boost CPU-bound operation performance. Overclocking may risk computational jitter due to changes in the processor’s turbo frequency. This causes brief pauses in processor operation, introducing uncertainty into application processing time. Turbo operation is a function of power consumption, processor temperature, and the number of active cores.
Intel Virtualization Technology (VT-d)	Disable	VT-d is optimal for running VMs. However, when running Linux natively, disabling VT-d boosts performance by up to 10%.
Logical Processor	HPe: Enable Hyperthreading Dell: Enable Logical Processor	Hyperthreading doubles the number of logical processors, which may improve performance by ~5-10% for CPU-bound operations.
Processor C-States (Minimum processor idle power core state)	Disable	Processor C-States reduce server power when the system is in an idle state. This causes slower cold-starts when the system transitions from an idle to a load state, and may reduce query performance by up to 15%.
HPe: Energy/Performance bias	Maximum performance	Configures processor sub-systems for high-performance and low-latency. Other power profiles (like “balanced”) throttle the CPU and may diminish performance. Use this setting for environments that are not sensitive to power consumption.
HPe: DIMM voltage	Optimized for Performance	Setting a higher voltage for DIMMs may increase performance.
Memory Operating Mode	Optimizer Mode, Disable Node Interleaving, Auto Memory Operating Voltage	Memory Operating Mode is tuned for performance in Optimizer mode. Other modes may improve reliability, but reduce performance. Node Interleaving should be disabled because enabling it interleaves the memory between memory nodes, which harms NUMA-aware applications such as SQreamDB.
HPe: Memory power savings mode	Maximum performance	This setting configures several memory parameters to optimize the performance of memory sub-systems. The default setting is Balanced.
HPe ACPI SLIT	Enabled	ACPI SLIT sets the relative access times between processors and memory and I/O sub-systems. ACPI SLIT enables operating systems to use this data to improve performance by more efficiently allocating resources and workloads.
QPI Snoop	Cluster on Die or Home Snoop	QPI (QuickPath Interconnect) Snoop lets you configure different Snoop modes that impact the QPI interconnect. Changing this setting may improve the performance of certain workloads. The default setting of Home Snoop provides high memory bandwidth in an average NUMA environment. Cluster on Die may provide increased memory bandwidth in highly optimized NUMA workloads. Early Snoop may decrease memory latency, but may result in lower overall bandwidth compared to other modes.

Installing the Operating System 

Before You Begin

Your system must have at least 200 gigabytes of free space on the root / mount.
For a multi-node cluster, you must have external shared storage provided by systems like General Parallel File System (GPFS), Weka, or VAST.
Once the BIOS settings have been set, you must install the operating system.
Make sure you use a supported OS version, as listed in the release notes of the SQreamDB version you are installing.
Verify the exact RHEL8 version with your storage vendor to avoid driver incompatibility.

Installation

Select a language (English recommended).
From Software Selection, select Minimal and check the Development Tools group checkbox.

Selecting the Development Tools group installs the following tools:
- autoconf
- automake
- binutils
- bison
- flex
- gcc
- gcc-c++
- gettext
- libtool
- make
- patch
- pkgconfig
- redhat-rpm-config
- rpm-build
- rpm-sign
Continue the installation.
Set up the necessary drives and users as per the installation process.

The OS shell is booted up.

Configuring the Operating System 

When configuring the operating system, several basic settings related to creating a new server are required. Configuring these as part of your basic set-up increases your server’s security and usability.

Creating a `sqream` User

The sqream user must have the same UID and GID across all servers in your cluster.

If the sqream user does not have the same UID and GID across all servers and there is no critical data stored under /home/sqream, it is recommended to delete the sqream user and sqream group from your servers using the following commands, and then create new ones with the same ID:

sudo userdel sqream
sudo rm /var/spool/mail/sqream

Before adding a user with a specific UID and GID, it is crucial to verify that such IDs do not already exist.

The steps below guide you through creating a sqream user with an example ID of 1111.

Verify that a 1111 UID does not already exist:
```
cat /etc/passwd |grep 1111
```
Verify that a 1111 GID does not already exist:
```
cat /etc/group |grep 1111
```
Add a user with an identical UID on all cluster nodes:
```
sudo useradd -u 1111 sqream
```
Add the sqream user to the wheel group:
```
sudo usermod -aG wheel sqream
```
You can remove the sqream user from the wheel group when the installation and configuration are complete.
Set a password for the sqream user:
```
sudo passwd sqream
```
Log out and log back in as sqream.
If you deleted the sqream user and recreated it with a new ID, you must change the ownership of /home/sqream to the new user in order to avoid permission errors.
```
sudo chown -R sqream:sqream /home/sqream
```
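The `grep 1111` checks above match those digits anywhere in a line, so an unrelated entry such as UID 11110, or a home directory path containing 1111, is also reported. The following is a minimal sketch of a stricter check. It assumes the standard colon-separated layout of /etc/passwd and /etc/group, in which the numeric ID is the third field. The helper name `id_in_use` is hypothetical and not part of SQreamDB.

```shell
# Hypothetical helper: succeeds when numeric ID $1 appears as the third
# colon-separated field of the passwd- or group-format file $2.
id_in_use() {
    awk -F: -v id="$1" '$3 == id { found = 1 } END { exit !found }' "$2"
}

# Report whether the example ID 1111 is taken as a UID or as a GID.
for db in /etc/passwd /etc/group; do
    if id_in_use 1111 "$db"; then
        echo "ID 1111 is already used in $db"
    else
        echo "ID 1111 is free in $db"
    fi
done
```

Run the check on every cluster node before creating the user, because the ID must be free on all of them.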

Setting Up A Locale

SQreamDB enables you to set up a locale using your own location. To list the available time zones, run the timedatectl list-timezones command.

Set the language of the locale:

sudo localectl set-locale LANG=en_US.UTF-8

Installing Required Software

Installing EPEL Repository 

sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

Enabling Additional Red Hat Repositories

Enabling additional Red Hat repositories is essential to install the required packages in the subsequent procedures.

sudo subscription-manager release --set=8.9
sudo subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-baseos-rpms

Installing Required Packages 

sudo dnf install chrony pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc net-tools wget jq libffi-devel xz-devel ncurses-compat-libs libnsl gdbm-devel tk-devel sqlite-devel readline-devel texinfo

Installing Recommended Tools 

sudo dnf install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils dos2unix

For SQreamDB version 4.4 or newer, install Python 3.9.13.

Download the Python 3.9.13 source code tarball file from the following URL into the /home/sqream directory:
```
wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tar.xz
```
Extract the Python 3.9.13 source code into your current directory:
```
tar -xf Python-3.9.13.tar.xz
```
Navigate to the Python 3.9.13 directory:
```
cd Python-3.9.13
```

Run the ./configure script:

./configure --enable-loadable-sqlite-extensions

Build the software:
```
make -j30
```
Install the software:
```
sudo make install
```
Verify that Python 3.9.13 has been installed:
```
python3 --version
```
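The `make -j30` step above starts 30 parallel compile jobs regardless of how many cores the server has. The following small sketch derives the job count from the machine instead. It assumes the GNU coreutils `nproc` command is available. The helper name `make_jobs` and the cap of 30 are illustrative choices, not SQreamDB requirements.

```shell
# Hypothetical helper: print the number of online CPUs, capped at $1.
make_jobs() {
    cap="$1"
    cpus=$(nproc)
    if [ "$cpus" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$cpus"
    fi
}

# Print the build command that would be run from the Python-3.9.13 directory.
echo "make -j$(make_jobs 30)"
```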

Installing NodeJS 

NodeJS is necessary only when the UI runs on the same server as SQreamDB. If not, you can skip this step.

Download the NodeJS binary tarball from the following URL into the /home/sqream directory and extract it:

wget https://nodejs.org/dist/v16.20.0/node-v16.20.0-linux-x64.tar.xz
tar -xf node-v16.20.0-linux-x64.tar.xz

Move the node-v16.20.0-linux-x64 directory to the /usr/local directory:
```
sudo mv  node-v16.20.0-linux-x64 /usr/local
```
Navigate to the /usr/bin/ directory:
```
cd /usr/bin
```
Create a symbolic link named node that points to the /usr/local/node-v16.20.0-linux-x64/bin/node binary:
```
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/node node
```
Create a symbolic link named npm that points to the /usr/local/node-v16.20.0-linux-x64/bin/npm binary:
```
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npm npm
```
Create a symbolic link named npx that points to the /usr/local/node-v16.20.0-linux-x64/bin/npx binary:
```
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npx npx
```

Install the pm2 process manager:

sudo npm install pm2 -g
cd /usr/bin
sudo ln -s ../local/node-v16.20.0-linux-x64/bin/pm2 pm2

If installing the pm2 process manager fails, install it offline:

On a machine with internet access, install the following:
- nodejs
- npm
- pm2
Archive the pm2 module from the node_modules directory:
cd /usr/local/node-v16.20.0-linux-x64/lib/node_modules
tar -czvf pm2_x86.tar.gz pm2
Copy the pm2_x86.tar.gz file to the server without internet access and extract it.
Move the pm2 folder to the /usr/local/node-v16.20.0-linux-x64/lib/node_modules directory:
sudo mv pm2 /usr/local/node-v16.20.0-linux-x64/lib/node_modules
Navigate back to the /usr/bin directory:
cd /usr/bin
Create a symbolic link to the pm2 binary:
sudo ln -s /usr/local/node-v16.20.0-linux-x64/lib/node_modules/pm2/bin/pm2 pm2
Verify that the installation was successful by running pm2 without sudo:
pm2 list
Verify that the node version is correct:
node --version
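The node, npm, and npx symbolic links in this section all follow the same pattern, so they can be created in one loop. The following sketch assumes GNU `ln` and uses absolute link targets rather than the relative ones shown above. The helper name `link_node_tools` is hypothetical. Because of the `-f` option, an existing entry with the same name in the target directory is replaced.

```shell
# Hypothetical helper: for every tool name after the first two arguments,
# create (or replace) a symbolic link <bin_dir>/<tool> -> <node_dir>/bin/<tool>.
link_node_tools() {
    node_dir="$1"
    bin_dir="$2"
    shift 2
    for tool in "$@"; do
        ln -sfn "$node_dir/bin/$tool" "$bin_dir/$tool"
    done
}

# Example call, to be run from a root shell on the server:
#   link_node_tools /usr/local/node-v16.20.0-linux-x64 /usr/bin node npm npx
```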

Configuring Chrony for RHEL8 Only

Start the Chrony service:
```
sudo systemctl start chronyd
```
Enable the Chrony service to start automatically at boot time:
```
sudo systemctl enable chronyd
```
Check the status of the Chrony service:
```
sudo systemctl status chronyd
```

Configuring the Server to Boot Without Linux GUI

We recommend that you configure your server to boot without a Linux GUI by running the following command:

sudo systemctl set-default multi-user.target

Running this command activates the NO-UI server mode.

Configuring the Security Limits

The security limits refer to the number of open files, processes, etc.

sudo bash

echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nroot soft nproc 1000000\nroot hard nproc 1000000\nroot soft nofile 1000000\nroot hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
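The single `echo -e` line above packs ten limits.conf entries into one string, which is hard to review. The following sketch prints the nproc and nofile entries one per line so they can be inspected before being appended. The helper name `limits_lines` is hypothetical. The two `core unlimited` entries for sqream are not generated here and still need to be added as shown above.

```shell
# Hypothetical helper: print soft and hard nproc/nofile entries with the
# value $1 for every user name given after it.
limits_lines() {
    value="$1"
    shift
    for user in "$@"; do
        for item in nproc nofile; do
            for kind in soft hard; do
                printf '%s %s %s %s\n' "$user" "$kind" "$item" "$value"
            done
        done
    done
}

# Preview the eight entries for the sqream and root users.
limits_lines 1000000 sqream root
```

Once the output looks right, it could be appended with `limits_lines 1000000 sqream root | sudo tee -a /etc/security/limits.conf`.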

Configuring the Kernel Parameters

Append the following kernel parameters to the /etc/sysctl.conf file, one parameter per line (run the command as root):

echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.vfs_cache_pressure = 200 \n vm.zone_reclaim_mode = 0 \n" >> /etc/sysctl.conf

Check the value of the fs.file-max parameter:
```
sysctl -n fs.file-max
```
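Appending with `>>` adds the same parameters again each time the command is rerun. The following sketch adds a parameter only when the file does not already define it. The helper name `append_param_once` is hypothetical, and the sketch assumes one `key = value` pair per line, as in /etc/sysctl.conf. An existing entry is left untouched, even if its value differs.

```shell
# Hypothetical helper: append "key = value" to file $1 unless a line
# already assigns that key. $2 is the key, $3 the value.
append_param_once() {
    file="$1"
    key="$2"
    value="$3"
    # Escape dots so that the key is matched literally by grep.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        echo "$key is already set in $file; leaving it unchanged"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example calls, to be run from a root shell on the server:
#   append_param_once /etc/sysctl.conf vm.swappiness 10
#   append_param_once /etc/sysctl.conf vm.zone_reclaim_mode 0
```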

Configuring the Firewall

The example in this section shows the open ports for four sqreamd sessions. If more than four are required, open the required ports as needed. Port 8080 in the example below is a new UI port.

The ports listed below are required, and the same logic applies to all additional SQreamDB Worker ports.

Port	Use
8080	UI port
443	UI over HTTPS (requires nginx installation)
3105	SQreamDB metadataserver service
3108	SQreamDB serverpicker service
3109	SQreamDB serverpicker service over SSL
5000	SQreamDB first worker default port
5100	SQreamDB first worker over SSL default port
5001	SQreamDB second worker default port
5101	SQreamDB second worker over SSL default port

Start the firewalld service:
```
systemctl start firewalld
```

Add the following ports to the permanent firewall:

firewall-cmd --zone=public --permanent --add-port=8080/tcp
firewall-cmd --zone=public --permanent --add-port=3105/tcp
firewall-cmd --zone=public --permanent --add-port=3108/tcp
firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
firewall-cmd --permanent --list-all

Reload the firewall:
```
firewall-cmd --reload
```

Enable firewalld on boot:

systemctl enable firewalld

If you do not need the firewall, you can disable it:

sudo systemctl stop firewalld
sudo systemctl disable firewalld
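The firewall example in this section opens ports for four workers (5000-5003 and 5100-5103). The following sketch computes the ranges for a different worker count. It assumes that additional workers use consecutive ports starting at 5000 and 5100, as the port table suggests. The helper name `port_spec` and the worker count of 6 are illustrative. The commands are printed rather than executed so they can be reviewed first.

```shell
# Hypothetical helper: print a firewalld port specification covering
# $2 consecutive TCP ports starting at base port $1.
port_spec() {
    base="$1"
    count="$2"
    if [ "$count" -le 1 ]; then
        echo "${base}/tcp"
    else
        echo "${base}-$((base + count - 1))/tcp"
    fi
}

# Print the firewall-cmd commands for six workers (plain and SSL ports).
workers=6
for base in 5000 5100; do
    echo "firewall-cmd --zone=public --permanent --add-port=$(port_spec "$base" "$workers")"
done
```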

Disabling SELinux

Disabling SELinux is a recommended action.

Show the status of SELinux:
```
sudo sestatus
```
If the output does not show that SELinux is disabled, edit the /etc/selinux/config file:
```
sudo vim /etc/selinux/config
```
Change SELINUX=enforcing to SELINUX=disabled.

The above changes will only take effect after rebooting the server.

To stop SELinux from enforcing its policy immediately, without waiting for the reboot, switch it to permissive mode by running the following command:
```
sudo setenforce 0
```
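The manual edit of /etc/selinux/config described above can also be scripted. The following sketch assumes GNU sed, the default on RHEL, for the `-i` option. The helper name `set_selinux_disabled` is hypothetical. The pattern only matches the `SELINUX=` line, so the `SELINUXTYPE=` line is left as it is.

```shell
# Hypothetical helper: rewrite the SELINUX= line of config file $1 in place.
# Assumes GNU sed for the -i (in-place) option.
set_selinux_disabled() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Example call, to be run from a root shell on the server:
#   set_selinux_disabled /etc/selinux/config
```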

Configuring the `/etc/hosts` File

Edit the /etc/hosts file:
```
sudo vim /etc/hosts
```

Add your local host and each cluster server:

127.0.0.1 localhost
<server1 ip>      <server1_name>
<server2 ip>      <server2_name>

Installing the NVIDIA CUDA Driver 

After configuring your operating system, you must install the NVIDIA CUDA driver.

Warning

If your Linux GUI runs on the server, it must be stopped before installing the CUDA drivers.

Before You Begin

Verify that the NVIDIA card has been installed and is detected by the system:
```
lspci | grep -i nvidia
```
Verify that gcc has been installed:
```
gcc --version
```
If gcc has not been installed, install it for RHEL:
```
sudo yum install -y gcc
```

Updating the Kernel Headers

Update the kernel headers on RHEL:

sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

Make sure that kernel-devel and kernel-headers match the installed kernel:

uname -r
rpm -qa |grep kernel-devel-$(uname -r)
rpm -qa |grep kernel-headers-$(uname -r)

Disabling Nouveau

Disable Nouveau, which is the default operating system driver.

Check if the Nouveau driver has been loaded:
```
lsmod | grep nouveau
```
If the Nouveau driver has been loaded, the command above generates output. If the Nouveau driver has not been loaded, you may skip the blacklisting and initramfs regeneration steps below.

Blacklist the Nouveau driver to disable it:

cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

Regenerate the kernel initramfs:

sudo dracut --force

Reboot the server:

sudo reboot

Installing the CUDA Driver

The current recommendation is for CUDA 12.3.2.

For questions related to which driver to install, contact SQreamDB support.

Installing the CUDA Driver from the Repository 

Installing the CUDA driver from the Repository is the recommended installation method.

Install the EPEL repository, which provides the CUDA dependencies:

sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

(Optional) Install the CUDA dependencies from the epel repository:
```
sudo yum install dkms libvdpau
```
Installing the CUDA dependencies from the epel repository is only required when installing from the runfile.

Download and install the required local repository:

RHEL8.8/8.9 CUDA 12.3.2 repository (Intel) installation (required for H/L Series GPU models):

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
sudo dnf localinstall cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm

sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms

Tuning Up NVIDIA Performance 

The following procedures exclusively relate to Intel.

Tune Up NVIDIA Performance when Driver Installed from the Repository 

Check the service status:
```
sudo systemctl status nvidia-persistenced
```
If the service exists, it will be stopped by default.

Start the service:

sudo systemctl start nvidia-persistenced

Verify that no errors have occurred:

sudo systemctl status nvidia-persistenced

Enable the service to start up on boot:

sudo systemctl enable nvidia-persistenced

For H100/A100, add the following lines:
```
nvidia-persistenced
```
Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):
```
nvidia-smi
```

Tune Up NVIDIA Performance when Driver Installed from the Runfile 

Change the permissions on the rc.local file to executable:
sudo chmod +x /etc/rc.local
Edit the /etc/rc.local file:
sudo vim /etc/rc.local
Add the following lines:
- For H100/A100:
  nvidia-persistenced
Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):
```
nvidia-smi
```

Enabling Core Dumps 

While this procedure is optional, SQreamDB recommends that core dumps be enabled. Note that the default abrt format is not gdb compatible, and that for SQreamDB support to be able to analyze your core dumps, they must be gdb compatible.

Checking the `abrtd` Status 

Check if abrtd is running:
```
sudo ps -ef |grep abrt
```

If abrtd is running, stop it:

for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done

Setting the Limits 

Check the current core file size limit:
```
ulimit -c
```

If the output is 0, add the following lines to the /etc/security/limits.conf file:

*          soft     core           unlimited
*          hard     core           unlimited

To apply the limit changes, log out and log back in.

Creating the Core Dump Directory 

Because the core dump file may be the size of the total RAM on the server, verify that you have sufficient disk space. In this example, core dumps are written to the /tmp/core_dumps directory. If necessary, replace the path according to your own environment and disk space.

Make the /tmp/core_dumps directory:
```
mkdir /tmp/core_dumps
```
Set the ownership of the /tmp/core_dumps directory:
```
sudo chown sqream:sqream /tmp/core_dumps
```
Grant read, write, and execute permissions to all users:
```
sudo chmod -R 777 /tmp/core_dumps
```

Setting the Output Directory on the `/etc/sysctl.conf` File 

Open the /etc/sysctl.conf file in the Vim text editor:
```
sudo vim /etc/sysctl.conf
```

Add the following to the bottom of the file:

kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2

To apply the changes without rebooting the server, run the following:

sudo sysctl -p

Check that the core output pattern has been applied:

sudo cat /proc/sys/kernel/core_pattern

The following shows the correct generated output:

/tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
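The core_pattern value above builds each core file name from fields supplied by the kernel. According to the Linux core(5) manual page, %e is the executable name, %s is the number of the signal that caused the dump, %u and %g are the real UID and GID of the process, %p is its PID, and %t is the dump time in seconds since the epoch. The following sketch shows the resulting file name for one set of sample values. The helper name `core_name` is hypothetical and every value passed to it is made up for illustration.

```shell
# Hypothetical helper: show the file name that the pattern
# /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t would produce for given values.
core_name() {
    exe="$1"; sig="$2"; uid="$3"; gid="$4"; pid="$5"; epoch="$6"
    printf '/tmp/core_dumps/core-%s-%s-%s-%s-%s-%s\n' \
        "$exe" "$sig" "$uid" "$gid" "$pid" "$epoch"
}

# Made-up sample: process "sqreamd", signal 6 (SIGABRT), UID and GID 1111.
core_name sqreamd 6 1111 1111 4242 1700000000
```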

Verifying that the Core Dumps Work 

You can verify that the core dumps work only after installing and running SQreamDB. The procedure below deliberately crashes the server, which writes a new core file to the directory configured in /etc/sysctl.conf.

Stop and restart all SQreamDB services.
Connect to SQreamDB with ClientCmd and run the following command:

select abort_server();

Verify Your SQreamDB Installation 

Verify that the sqream user exists and has the same ID on all cluster servers.

id sqream

Verify that the storage is mounted on all cluster servers:
```
mount
```
Verify that the NVIDIA driver is properly installed:
```
nvidia-smi
```
Verify that the kernel file-handles allocation is greater than or equal to 2097152:
```
sysctl -n fs.file-max
```

Verify limits (run this command as a sqream user):

ulimit -c -u -n

Desired output:
core file size (blocks, -c) unlimited
max user processes (-u) 1000000
open files (-n) 1000000
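The limit checks in this section compare observed values against fixed minimums (1000000 for processes and open files, 2097152 for fs.file-max). The following sketch automates that comparison. The helper name `check_min` is hypothetical. The example calls are shown as comments because they only give meaningful results on a configured server, and `ulimit -u` requires bash.

```shell
# Hypothetical helper: compare an observed limit ($2) with a required
# minimum ($3); "unlimited" always passes. Returns non-zero on failure.
check_min() {
    label="$1"
    actual="$2"
    required="$3"
    if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
        echo "OK   $label = $actual"
    else
        echo "FAIL $label = $actual (required >= $required)"
        return 1
    fi
}

# Example calls, to be run in bash as the sqream user on a configured server:
#   check_min "open files" "$(ulimit -n)" 1000000
#   check_min "max user processes" "$(ulimit -u)" 1000000
#   check_min "fs.file-max" "$(sysctl -n fs.file-max)" 2097152
```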

Troubleshooting Core Dumping 

This section describes the troubleshooting procedure to be followed if all parameters have been configured correctly, but the cores have not been created.

Reboot the server.
Make sure that the core dump folder is writable by all users:
```
sudo chmod -R 777 /tmp/core_dumps
```
Verify that the limits have been set correctly:
```
ulimit -c
```
If all parameters have been configured correctly, the correct output is:
```
core file size          (blocks, -c) unlimited
```
If all parameters have been configured correctly, but running ulimit -c outputs 0, run the following:
```
sudo vim /etc/profile
```
Search for the following line and disable it using the # symbol:
```
ulimit -S -c 0 > /dev/null 2>&1
```
Log out and log back in.
Run the ulimit -a command and check the core file size value:
```
ulimit -a
```
If the line is not found in /etc/profile, do the following:
1. Run the following command:
```
sudo vim /etc/init.d/functions
```
2. Search for the following line, disable it using the # symbol, and reboot the server:
```
ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1
```
