Using SQream in an HDFS Environment

Configuring an HDFS Environment for the User sqream

This section describes how to configure an HDFS environment for the user sqream and is only relevant for users with an HDFS environment.

To configure an HDFS environment for the user sqream:

  1. Open your bash_profile configuration file for editing:

    $ vim /home/sqream/.bash_profile
    
  2. After making your edits, apply them by sourcing the file:

    $ source /home/sqream/.bash_profile
    
  3. Check if you can access Hadoop from your machine:

    $ hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
  4. Verify that an HDFS environment exists for SQream services:

    $ ls -l /etc/sqream/sqream_env.sh
    
  5. If an HDFS environment does not exist for SQream services, create one (sqream_env.sh):

    #!/bin/bash

    SQREAM_HOME=/usr/local/sqream
    export SQREAM_HOME

    export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
    export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
    export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR

    PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
    export PATH
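
    After creating the file, a quick sanity check (a minimal sketch; it assumes the file was saved as /etc/sqream/sqream_env.sh, as checked in the previous step) is to source it in a new shell and confirm that the Hadoop client resolves:

    $ source /etc/sqream/sqream_env.sh
    $ which hadoop      # should resolve to ${HADOOP_INSTALL}/bin/hadoop
    $ hadoop version    # should print the Hadoop version without errors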
    


Authenticating Hadoop Servers that Require Kerberos

If your Hadoop server requires Kerberos authentication, do the following:

  1. Create a principal for the user sqream:

    $ kadmin -p root/admin@SQ.COM
    $ addprinc sqream@SQ.COM
    
  2. If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user with ssh and run kadmin.local:

    $ kadmin.local
    

    Running kadmin.local does not require a password.

  3. If a password is not required, change the password for the principal sqream@SQ.COM:

    $ change_password sqream@SQ.COM
    
  4. Connect to the Hadoop name node using ssh and navigate to the Cloudera agent's process directory:

    $ cd /var/run/cloudera-scm-agent/process
    
  5. Check the most recently modified content of the directory above:

    $ ls -lrt
    
  6. Look for a recently updated folder containing the text hdfs.

    The following is an example of the correct folder name:

    cd <number>-hdfs-<something>
    

    This folder should contain a file named hdfs.keytab or another similar .keytab file.

  7. Copy the .keytab file to user sqream's Home directory on the remote machines on which you are planning to use Hadoop.
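
    A minimal sketch of this copy, assuming the keytab is named hdfs.keytab and using a placeholder host name that you replace with your own:

    $ scp hdfs.keytab sqream@<sqream server name or ip>:/home/sqream/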

  8. Copy the following files to the <sqream folder>/hdfs/hadoop/etc/hadoop directory on the sqream server:

    • core-site.xml

    • hdfs-site.xml
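
    A minimal sketch of this copy, assuming it is run from the directory on the Hadoop name node that contains these files and using the same placeholder destination:

    $ scp core-site.xml hdfs-site.xml sqream@<sqream server name or ip>:<sqream folder>/hdfs/hadoop/etc/hadoop/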

  9. Connect to the sqream server and verify that the .keytab file is owned by the user sqream and has the correct permissions:

    $ sudo chown sqream:sqream /home/sqream/hdfs.keytab
    $ sudo chmod 600 /home/sqream/hdfs.keytab
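
    To confirm, you can list the file; the output should show mode -rw------- and owner sqream sqream:

    $ ls -l /home/sqream/hdfs.keytab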
    
  10. Log into the sqream server.

  11. Log in as the user sqream.
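
    A minimal sketch of these two steps, using a placeholder host name, is to ssh in directly as the user sqream:

    $ ssh sqream@<sqream server name or ip>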

  12. Navigate to the Home directory and check the names of the Kerberos principals contained in the .keytab file:

    $ klist -kt hdfs.keytab

    The following is an example of the correct output:

    sqream@Host-121 ~ $ klist -kt hdfs.keytab
    Keytab name: FILE:hdfs.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
  13. Verify that the hdfs service named hdfs/nn1@SQ.COM is shown in the generated output above.
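
    One quick way to check this is to filter the klist output for that principal:

    $ klist -kt hdfs.keytab | grep "hdfs/nn1@SQ.COM"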

  14. Obtain a Kerberos ticket using the keytab:

    $ kinit -kt hdfs.keytab hdfs/nn1@SQ.COM
  15. Verify that the output is correct:

    $ klist

    The following is an example of the correct output:

    Ticket cache: FILE:/tmp/krb5cc_1000
    Default principal: hdfs/nn1@SQ.COM

    Valid starting       Expires              Service principal
    09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/SQ.COM@SQ.COM
  16. List the files located at the defined server name or IP address:

    $ hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
  17. Do one of the following:

    • If the file list is output, continue with Step 18.

    • If the list is not output, verify that your environment has been set up correctly.

      If any of the following are empty, verify that you followed Step 5 (creating sqream_env.sh) in the Configuring an HDFS Environment for the User sqream section above correctly:

      $ echo $JAVA_HOME
      $ echo $SQREAM_HOME
      $ echo $CLASSPATH
      $ echo $HADOOP_COMMON_LIB_NATIVE_DIR
      $ echo $LD_LIBRARY_PATH
      $ echo $PATH
  18. Verify that you copied the correct keytab file.
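
    One way to confirm this (a sketch; adjust paths to your environment) is to compare checksums of the keytab on the Hadoop name node and on the sqream server:

    $ md5sum hdfs.keytab    # run against the copy on each machine and compare the hashes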

  19. Review this procedure to verify that you have followed each step.
