Pre-Installation Configuration
Before installing SQreamDB, it is essential that you tune your system for better performance and stability.
Basic Input/Output System Settings
The first step in pre-installation configuration is adjusting the basic input/output system (BIOS) settings.
The BIOS settings may have a variety of names, or may not exist on your system. Each system vendor has a different set of settings and variables. It is safe to skip any and all of the configuration steps, but this may impact performance.
If any doubt arises, consult the documentation for your server or your hardware vendor for the correct way to apply the settings.
| Item | Setting | Rationale |
|---|---|---|
| Management console access | Connected | Connection to out-of-band (OOB) management is required to preserve continuous network uptime. |
| All drives | Connected and displayed on the RAID interface | Prerequisite for cluster or OS installation. |
| RAID volumes | Configured according to project guidelines; a reboot is required for changes to take effect | Clustered to increase logical volume size and provide redundancy. |
| Fan speed / thermal configuration | Dell: fan speed High Maximum, specified minimum setting 60. HPE: thermal configuration Increased Cooling | NVIDIA Tesla GPUs are passively cooled and require high airflow to operate at full performance. |
| Power regulator or iDRAC power unit policy | HPE: HP Static High Performance mode enabled. Dell: iDRAC power unit policy (power cap policy) disabled | Other power profiles (such as "Balanced") throttle the CPU and diminish performance. Throttling may also cause GPU failure. |
| System profile, power profile, or performance profile | High Performance | The Performance profile potentially increases performance by maximizing processor frequency and disabling certain power-saving features such as C-states. Use this setting for environments that are not sensitive to power consumption. |
| Power cap policy or dynamic power capping | Disabled | This setting may appear together with the power profile or power regulator setting above, and power regulator settings are named differently in the BIOS and in iLO/iDRAC. Disabling it turns off system ROM power calibration during the boot process, preventing throttling that may diminish performance or cause GPU failure. |
| Intel Turbo Boost | Enabled | Intel Turbo Boost overclocks the processor to boost the performance of CPU-bound operations. Note that overclocking risks computational jitter: changes in the processor's turbo frequency cause brief pauses in processor operation, introducing uncertainty into application processing time. Turbo operation is a function of power consumption, processor temperature, and the number of active cores. |
| Intel Virtualization Technology (VT-d) | Disabled | VT-d is optimal for running VMs; however, when running Linux natively, disabling VT-d can boost performance by up to 10%. |
| Logical processor | HPE: enable Hyper-Threading. Dell: enable Logical Processor | Hyper-Threading doubles the number of logical processors, which may improve performance by roughly 5-10% for CPU-bound operations. |
| Processor C-states (minimum processor idle power core state) | Disabled | Processor C-states reduce server power when the system is idle. They cause slower cold starts when the system transitions from idle to load, and may reduce query performance by up to 15%. |
| HPE: energy/performance bias | Maximum Performance | Configures the processor subsystems for high performance and low latency. Other power profiles (such as "Balanced") throttle the CPU and may diminish performance. Use this setting for environments that are not sensitive to power consumption. |
| HPE: DIMM voltage | Optimized for Performance | Setting a higher voltage for DIMMs may increase performance. |
| Memory operating mode | Optimizer Mode; Node Interleaving disabled; Auto memory operating voltage | Optimizer Mode tunes the memory operating mode for performance; other modes may improve reliability but reduce performance. Node Interleaving must be disabled, because enabling it interleaves memory between memory nodes, which harms NUMA-aware applications such as SQreamDB. |
| HPE: memory power savings mode | Maximum Performance | Configures several memory parameters to optimize the performance of the memory subsystems. The default setting is Balanced. |
| HPE: ACPI SLIT | Enabled | ACPI SLIT describes the relative access times between processors, memory, and I/O subsystems. Operating systems can use this data to improve performance by allocating resources and workloads more efficiently. |
| QPI Snoop | Cluster on Die or Home Snoop | QPI (QuickPath Interconnect) Snoop lets you configure Snoop modes that impact the QPI interconnect; changing this setting may improve the performance of certain workloads. The default, Home Snoop, provides high memory bandwidth in an average NUMA environment. Cluster on Die may provide increased memory bandwidth in highly optimized NUMA workloads. Early Snoop may decrease memory latency, but may result in lower overall bandwidth than the other modes. |
Installing the Operating System
Before You Begin
Your system must have at least 200 GB of free space on the root mount. For a multi-node cluster, you must have external shared storage provided by systems such as General Parallel File System (GPFS), Weka, or VAST.
Once the BIOS settings have been set, you must install the operating system.
Make sure you use a supported OS version, as listed in the release notes of the version you are installing.
Verify the exact RHEL 8 minor version with your storage vendor to avoid driver incompatibility.
Installation
1. Select a language (English is recommended).
2. From Software Selection, select Minimal and check the Development Tools group checkbox. Selecting the Development Tools group installs the following tools:
autoconf
automake
binutils
bison
flex
gcc
gcc-c++
gettext
libtool
make
patch
pkgconfig
redhat-rpm-config
rpm-build
rpm-sign
3. Continue the installation.
4. Set up the necessary drives and users as prompted by the installation process.

When the installation completes, the system boots into the OS shell.
Configuring the Operating System
Configuring the operating system requires several basic settings related to setting up a new server. Applying them as part of your basic setup increases your server's security and usability.
Creating a sqream User
The sqream user must have the same UID and GID across all servers in your cluster.
If the `sqream` user does not have the same UID and GID across all servers, and no critical data is stored under `/home/sqream`, we recommend deleting the `sqream` user and group from your servers and recreating them with matching IDs:

```
sudo userdel sqream
sudo rm /var/spool/mail/sqream
```

Before adding a user with a specific UID and GID, verify that those IDs do not already exist. The steps below create a `sqream` user with an example UID and GID of 1111.

1. Verify that UID `1111` does not already exist:

   ```
   cat /etc/passwd | grep 1111
   ```

2. Verify that GID `1111` does not already exist:

   ```
   cat /etc/group | grep 1111
   ```

3. Add a user with an identical UID on all cluster nodes:

   ```
   useradd -u 1111 sqream
   ```

4. Add the `sqream` user to the `wheel` group:

   ```
   sudo usermod -aG wheel sqream
   ```

   You can remove the `sqream` user from the `wheel` group once the installation and configuration are complete.

5. Set a password for the `sqream` user:

   ```
   passwd sqream
   ```

6. Log out and log back in as `sqream`.

7. If you deleted the `sqream` user and recreated it with a new ID, change the ownership of `/home/sqream` to avoid permission errors:

   ```
   sudo chown -R sqream:sqream /home/sqream
   ```
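When rolling this out across many nodes, the UID/GID checks above can be scripted. Below is a minimal sketch (the helper name `uid_in_use` is ours, not part of SQreamDB tooling). It matches field 3 of a passwd-format file exactly, which is safer than the raw `grep 1111` above, since that can also match unrelated strings containing `1111`:

```shell
# Hypothetical helper: report whether a UID is already taken in a
# passwd-format file (field 3 is the numeric UID).
uid_in_use() {
  uid="$1"; passwd_file="$2"
  if cut -d: -f3 "$passwd_file" | grep -qx "$uid"; then
    echo yes
  else
    echo no
  fi
}

# Example against the real file: uid_in_use 1111 /etc/passwd
```

The same pattern works for GIDs by pointing the helper at `/etc/group` and cutting field 3 of that file instead.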
Setting Up A Locale
SQreamDB enables you to set up a locale based on your own location. To list the available time zones, run the `timedatectl list-timezones` command.
Set the language of the locale:

```
sudo localectl set-locale LANG=en_US.UTF-8
```
Installing Required Software
Installing EPEL Repository
```
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
```
Enabling Additional Red Hat Repositories
Enabling additional Red Hat repositories is essential to install the required packages in the subsequent procedures.
```
sudo subscription-manager release --set=8.9
sudo subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable rhel-8-for-x86_64-baseos-rpms
```
Installing Required Packages
```
sudo dnf install chrony pciutils monit zlib-devel openssl-devel kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc net-tools wget jq libffi-devel xz-devel ncurses-compat-libs libnsl gdbm-devel tk-devel sqlite-devel readline-devel texinfo
```
Installing Recommended Tools
```
sudo dnf install bash-completion.noarch vim-enhanced vim-common net-tools iotop htop psmisc screen xfsprogs wget yum-utils dos2unix
```
For SQreamDB version 4.4 or newer, install Python 3.9.13:

1. Download the Python 3.9.13 source tarball into the `/home/sqream` directory:

   ```
   wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tar.xz
   ```

2. Extract the Python 3.9.13 source code into your current directory:

   ```
   tar -xf Python-3.9.13.tar.xz
   ```

3. Navigate to the Python 3.9.13 directory:

   ```
   cd Python-3.9.13
   ```

4. Run the `configure` script:

   ```
   ./configure --enable-loadable-sqlite-extensions
   ```

5. Build the software:

   ```
   make -j30
   ```

6. Install the software:

   ```
   sudo make install
   ```

7. Verify that Python 3.9.13 has been installed:

   ```
   python3 --version
   ```
Installing NodeJS
NodeJS is required only when the UI runs on the same server as SQreamDB. Otherwise, you may skip this step.

1. Download the NodeJS tarball into the `/home/sqream` directory and extract it:

   ```
   wget https://nodejs.org/dist/v16.20.0/node-v16.20.0-linux-x64.tar.xz
   tar -xf node-v16.20.0-linux-x64.tar.xz
   ```

2. Move the `node-v16.20.0-linux-x64` directory to `/usr/local`:

   ```
   sudo mv node-v16.20.0-linux-x64 /usr/local
   ```

3. Navigate to the `/usr/bin` directory:

   ```
   cd /usr/bin
   ```

4. Create symbolic links to `node`, `npm`, and `npx`:

   ```
   sudo ln -s ../local/node-v16.20.0-linux-x64/bin/node node
   sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npm npm
   sudo ln -s ../local/node-v16.20.0-linux-x64/bin/npx npx
   ```

5. Install the `pm2` process manager:

   ```
   sudo npm install pm2 -g
   cd /usr/bin
   sudo ln -s ../local/node-v16.20.0-linux-x64/bin/pm2 pm2
   ```

If installing the `pm2` process manager fails, install it offline:

1. On a machine with internet access, install `nodejs`, `npm`, and `pm2`.

2. Archive the `pm2` module:

   ```
   cd /usr/local/node-v16.20.0-linux-x64/lib/node_modules
   tar -czvf pm2_x86.tar.gz pm2
   ```

3. Copy the `pm2_x86.tar.gz` file to the server without internet access and extract it.

4. Move the `pm2` folder to the `/usr/local/node-v16.20.0-linux-x64/lib/node_modules` directory:

   ```
   sudo mv pm2 /usr/local/node-v16.20.0-linux-x64/lib/node_modules
   ```

5. Navigate back to the `/usr/bin` directory:

   ```
   cd /usr/bin
   ```

6. Create a symbolic link to the `pm2` service:

   ```
   sudo ln -s /usr/local/node-v16.20.0-linux-x64/lib/node_modules/pm2/bin/pm2 pm2
   ```

7. Verify that the installation succeeded without using `sudo`:

   ```
   pm2 list
   ```

8. Verify that the node version is correct:

   ```
   node --version
   ```
Configuring Chrony for RHEL8 Only
1. Start the Chrony service:

   ```
   sudo systemctl start chronyd
   ```

2. Enable the Chrony service to start automatically at boot time:

   ```
   sudo systemctl enable chronyd
   ```

3. Check the status of the Chrony service:

   ```
   sudo systemctl status chronyd
   ```
Configuring the Server to Boot Without Linux GUI
We recommend that you configure your server to boot without a Linux GUI by running the following command:

```
sudo systemctl set-default multi-user.target
```
Running this command activates the NO-UI server mode.
Configuring the Security Limits
The security limits define the maximum number of open files, processes, and so on.

```
sudo bash
echo -e "sqream soft nproc 1000000\nsqream hard nproc 1000000\nsqream soft nofile 1000000\nsqream hard nofile 1000000\nroot soft nproc 1000000\nroot hard nproc 1000000\nroot soft nofile 1000000\nroot hard nofile 1000000\nsqream soft core unlimited\nsqream hard core unlimited" >> /etc/security/limits.conf
```
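Written out line by line, the entries that the command above appends to `/etc/security/limits.conf` are:

```
sqream soft nproc 1000000
sqream hard nproc 1000000
sqream soft nofile 1000000
sqream hard nofile 1000000
root soft nproc 1000000
root hard nproc 1000000
root soft nofile 1000000
root hard nofile 1000000
sqream soft core unlimited
sqream hard core unlimited
```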
Configuring the Kernel Parameters
1. Append the following kernel parameters, each on its own line, to the `/etc/sysctl.conf` file:

   ```
   echo -e "vm.dirty_background_ratio = 5 \n vm.dirty_ratio = 10 \n vm.swappiness = 10 \n vm.vfs_cache_pressure = 200 \n vm.zone_reclaim_mode = 0 \n" >> /etc/sysctl.conf
   ```

2. Check the maximum value of `fs.file-max`:

   ```
   sysctl -n fs.file-max
   ```
Configuring the Firewall
The example in this section opens the ports for four sqreamd Workers. If more than four are required, open additional ports as needed. Port 8080 in the example below is the new UI port.
The ports listed below are required; the same numbering logic applies to all additional SQreamDB Worker ports.
| Port | Use |
|---|---|
| 8080 | UI port |
| 443 | UI over HTTPS (requires nginx installation) |
| 3105 | SQreamDB metadataserver service |
| 3108 | SQreamDB serverpicker service |
| 3109 | SQreamDB serverpicker service over SSL |
| 5000 | SQreamDB first Worker default port |
| 5100 | SQreamDB first Worker over SSL default port |
| 5001 | SQreamDB second Worker default port |
| 5101 | SQreamDB second Worker over SSL default port |
1. Start the firewalld service:

   ```
   systemctl start firewalld
   ```

2. Add the required ports to the permanent firewall configuration:

   ```
   firewall-cmd --zone=public --permanent --add-port=8080/tcp
   firewall-cmd --zone=public --permanent --add-port=3105/tcp
   firewall-cmd --zone=public --permanent --add-port=3108/tcp
   firewall-cmd --zone=public --permanent --add-port=5000-5003/tcp
   firewall-cmd --zone=public --permanent --add-port=5100-5103/tcp
   firewall-cmd --permanent --list-all
   ```

3. Reload the firewall:

   ```
   firewall-cmd --reload
   ```

4. Enable firewalld on boot:

   ```
   systemctl enable firewalld
   ```

If you do not need the firewall, you can disable it instead:

```
sudo systemctl stop firewalld
sudo systemctl disable firewalld
```
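The Worker port numbering follows a simple pattern: Worker N (counting from zero) listens on 5000 + N, and its SSL counterpart on 5100 + N. The sketch below (the helper name `worker_ports` is ours, not a SQreamDB tool) prints the pair of ports to open for a given Worker index, which is handy when opening ports for more than four Workers:

```shell
# Hypothetical helper: print the firewall ports needed for Worker N
# (0-based): default port 5000+N and its SSL port 5100+N.
worker_ports() {
  n="$1"
  echo "$((5000 + n))/tcp $((5100 + n))/tcp"
}

# Example: open ports for the first four Workers.
# for i in 0 1 2 3; do
#   for p in $(worker_ports "$i"); do
#     sudo firewall-cmd --zone=public --permanent --add-port="$p"
#   done
# done
# sudo firewall-cmd --reload
```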
Disabling SELinux
We recommend disabling SELinux.

1. Show the SELinux status:

   ```
   sudo sestatus
   ```

2. If the output is not `disabled`, edit the `/etc/selinux/config` file:

   ```
   sudo vim /etc/selinux/config
   ```

3. Change `SELINUX=enforcing` to `SELINUX=disabled`.

The change above takes effect only after the server is rebooted. To disable SELinux immediately for the current session, without rebooting, run:

```
sudo setenforce 0
```
Configuring the /etc/hosts File
1. Edit the `/etc/hosts` file:

   ```
   sudo vim /etc/hosts
   ```

2. Add entries for the local host and for each server in the cluster:

   ```
   127.0.0.1    localhost
   <server1 ip> <server_name>
   <server2 ip> <server_name>
   ```
Installing the NVIDIA CUDA Driver
After configuring your operating system, you must install the NVIDIA CUDA driver.
Warning
If your Linux GUI runs on the server, it must be stopped before installing the CUDA drivers.
Before You Begin
1. Verify that the NVIDIA card has been installed and is detected by the system:

   ```
   lspci | grep -i nvidia
   ```

2. Verify that `gcc` has been installed:

   ```
   gcc --version
   ```

3. If `gcc` has not been installed, install it (RHEL):

   ```
   sudo yum install -y gcc
   ```
Updating the Kernel Headers
1. Update the kernel headers on RHEL:

   ```
   sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
   ```

2. Make sure that the `kernel-devel` and `kernel-headers` versions match the installed kernel:

   ```
   uname -r
   rpm -qa | grep kernel-devel-$(uname -r)
   rpm -qa | grep kernel-headers-$(uname -r)
   ```
Disabling Nouveau
Disable Nouveau, the default open-source NVIDIA driver shipped with the operating system.

1. Check whether the Nouveau driver is loaded:

   ```
   lsmod | grep nouveau
   ```

   If the Nouveau driver is loaded, the command above generates output. If it is not loaded, you may skip steps 2 and 3.

2. Blacklist the Nouveau driver to disable it:

   ```
   cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
   blacklist nouveau
   options nouveau modeset=0
   EOF
   ```

3. Regenerate the kernel initramfs:

   ```
   sudo dracut --force
   ```

4. Reboot the server:

   ```
   sudo reboot
   ```
Installing the CUDA Driver
The current recommendation is CUDA 12.3.2.
For questions about which driver to install, contact SQreamDB support.
Installing the CUDA Driver from the Repository
Installing the CUDA driver from the repository is the recommended installation method.

1. Install the EPEL repository:

   ```
   sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
   ```

2. (Optional) Install the CUDA dependencies from the `epel` repository:

   ```
   sudo yum install dkms libvdpau
   ```

   Installing the CUDA dependencies from the `epel` repository is only required when installing from a runfile.

3. Download and install the required local repository. For RHEL 8.8/8.9, the CUDA 12.3.2 repository (Intel; required for H/L series GPU models):

   ```
   wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
   sudo dnf localinstall cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
   ```

4. Install the driver:

   ```
   sudo dnf clean all
   sudo dnf -y module install nvidia-driver:latest-dkms
   ```
Tuning Up NVIDIA Performance
The following procedures apply to Intel-based servers only.
Tune Up NVIDIA Performance when Driver Installed from the Repository
1. Check the `nvidia-persistenced` service status:

   ```
   sudo systemctl status nvidia-persistenced
   ```

   If the service exists, it is stopped by default.

2. Start the service:

   ```
   sudo systemctl start nvidia-persistenced
   ```

3. Verify that no errors have occurred:

   ```
   sudo systemctl status nvidia-persistenced
   ```

4. Enable the service to start up on boot:

   ```
   sudo systemctl enable nvidia-persistenced
   ```

5. For H100/A100, add the following line:

   ```
   nvidia-persistenced
   ```

6. Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):

   ```
   nvidia-smi
   ```
Tune Up NVIDIA Performance when Driver Installed from the Runfile
1. Change the permissions on the `rc.local` file to executable:

   ```
   sudo chmod +x /etc/rc.local
   ```

2. Edit the `/etc/rc.local` file:

   ```
   sudo vim /etc/rc.local
   ```

3. For H100/A100, add the following line:

   ```
   nvidia-persistenced
   ```

4. Reboot the server and run the NVIDIA System Management Interface (NVIDIA SMI):

   ```
   nvidia-smi
   ```
Enabling Core Dumps
While this procedure is optional, SQreamDB recommends that core dumps be enabled. Note that the default abrt format is not gdb compatible, and that for SQreamDB support to be able to analyze your core dumps, they must be gdb compatible.
Checking the abrtd Status
1. Check whether `abrtd` is running:

   ```
   sudo ps -ef | grep abrt
   ```

2. If `abrtd` is running, stop it and disable its services:

   ```
   for i in abrt-ccpp.service abrtd.service abrt-oops.service abrt-pstoreoops.service abrt-vmcore.service abrt-xorg.service ; do sudo systemctl disable $i; sudo systemctl stop $i; done
   ```
Setting the Limits
1. Check the current core file size limit:

   ```
   ulimit -c
   ```

2. If the output is `0`, add the following lines to the `/etc/security/limits.conf` file:

   ```
   * soft core unlimited
   * hard core unlimited
   ```

3. To apply the limit changes, log out and log back in.
Creating the Core Dump Directory
Because a core dump file may be as large as the total RAM on the server, verify that you have sufficient disk space. In the example below, core dumps are written to the `/tmp/core_dumps` directory. If necessary, replace the path according to your own environment and disk space.

1. Make the `/tmp/core_dumps` directory:

   ```
   mkdir /tmp/core_dumps
   ```

2. Set the ownership of the `/tmp/core_dumps` directory:

   ```
   sudo chown sqream:sqream /tmp/core_dumps
   ```

3. Grant read, write, and execute permissions to all users:

   ```
   sudo chmod -R 777 /tmp/core_dumps
   ```
Setting the Output Directory on the /etc/sysctl.conf File
1. Open the `/etc/sysctl.conf` file in a text editor:

   ```
   sudo vim /etc/sysctl.conf
   ```

2. Add the following to the bottom of the file:

   ```
   kernel.core_uses_pid = 1
   kernel.core_pattern = /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
   fs.suid_dumpable = 2
   ```

3. Apply the changes without rebooting the server:

   ```
   sudo sysctl -p
   ```

4. Check that the core output pattern points to the configured directory:

   ```
   sudo cat /proc/sys/kernel/core_pattern
   ```

   The correct output is:

   ```
   /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t
   ```
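The `%` placeholders in `kernel.core_pattern` are expanded by the kernel at crash time: `%e` is the executable name, `%s` the signal number, `%u` the UID, `%g` the GID, `%p` the PID, and `%t` a Unix timestamp. The sketch below is illustrative only (the helper is ours; the kernel performs this expansion internally) and shows what file name a given crash would produce:

```shell
# Illustration only: mimic how the kernel expands the core_pattern
# template /tmp/core_dumps/core-%e-%s-%u-%g-%p-%t for one crash.
expand_core_pattern() {
  exe="$1"; sig="$2"; uid="$3"; gid="$4"; pid="$5"; ts="$6"
  echo "/tmp/core_dumps/core-${exe}-${sig}-${uid}-${gid}-${pid}-${ts}"
}
```

For example, a `sqreamd` process crashing with SIGSEGV (signal 11) under UID/GID 1111 would produce a file such as `/tmp/core_dumps/core-sqreamd-11-1111-1111-30842-1700000000`.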
Verifying that the Core Dumps Work
You can verify that core dumps work only after installing and running SQreamDB. The procedure below deliberately crashes the server, after which a new `core.xxx` file should appear in the directory configured in `/etc/sysctl.conf`.

1. Stop and restart all SQreamDB services.

2. Connect to SQreamDB with ClientCmd and run the following command:

   ```
   select abort_server();
   ```
Verify Your SQreamDB Installation
1. Verify that the `sqream` user exists and has the same ID on all cluster servers:

   ```
   id sqream
   ```

2. Verify that the storage is mounted on all cluster servers:

   ```
   mount
   ```

3. Make sure that the NVIDIA driver is properly installed:

   ```
   nvidia-smi
   ```

4. Verify that the kernel file-handle allocation is greater than or equal to `2097152`:

   ```
   sysctl -n fs.file-max
   ```

5. Verify the limits (run this command as the `sqream` user):

   ```
   ulimit -c -u -n
   ```

   Desired output:

   ```
   core file size (blocks, -c) unlimited
   max user processes (-u) 1000000
   open files (-n) 1000000
   ```
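When verifying limits across many cluster nodes, it can help to script the comparison. Below is a minimal sketch (the helper name `limit_ok` is ours, not part of SQreamDB tooling) that treats `unlimited` as always sufficient and otherwise compares numerically against the required minimum:

```shell
# Hypothetical helper: compare a ulimit value against a required minimum.
# "unlimited" always passes; numeric values must be >= the minimum.
limit_ok() {
  val="$1"; min="$2"
  if [ "$val" = "unlimited" ] || [ "$val" -ge "$min" ]; then
    echo ok
  else
    echo low
  fi
}

# Example: limit_ok "$(ulimit -n)" 1000000
```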
Troubleshooting Core Dumping
This section describes the troubleshooting procedure to follow if all parameters have been configured correctly but core files are still not created.

1. Reboot the server.

2. Verify that the folder permissions are correct:

   ```
   sudo chmod -R 777 /tmp/core_dumps
   ```

3. Verify that the limits have been set correctly:

   ```
   ulimit -c
   ```

   If all parameters have been configured correctly, the output is:

   ```
   core file size (blocks, -c) unlimited
   ```

4. If all parameters have been configured correctly but `ulimit -c` outputs `0`, edit the `/etc/profile` file:

   ```
   sudo vim /etc/profile
   ```

5. Search for the following line and disable it using the `#` symbol:

   ```
   ulimit -S -c 0 > /dev/null 2>&1
   ```

6. Log out and log back in.

7. Run the `ulimit -c` command again:

   ```
   ulimit -c
   ```

8. If the line is not found in `/etc/profile`, edit the `/etc/init.d/functions` file:

   ```
   sudo vim /etc/init.d/functions
   ```

9. Search for the following line, disable it using the `#` symbol, and reboot the server:

   ```
   ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0} >/dev/null 2>&1
   ```