HDFS Environment

Configuring an HDFS Environment for the User sqream

This section describes how to configure an HDFS environment for the user sqream and is relevant only if your deployment uses HDFS.

To configure an HDFS environment for the user sqream:

  1. Open your bash_profile configuration file for editing:

    vim /home/sqream/.bash_profile
    
    #PATH=$PATH:$HOME/.local/bin:$HOME/bin
    
    #export PATH
    
    # PS1
    #MYIP=$(curl -s -XGET "http://ip-api.com/json" | python -c 'import json,sys; jstr=json.load(sys.stdin); print jstr["query"]')
    #PS1="\[\e[01;32m\]\D{%F %T} \[\e[01;33m\]\u@\[\e[01;36m\]$MYIP \[\e[01;31m\]\w\[\e[37;36m\]\$ \[\e[1;37m\]"
    
    SQREAM_HOME=/usr/local/sqream
    export SQREAM_HOME
    
    export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
    export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
    export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR
    
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
    export PATH
    
  2. Source the file to apply your changes:

    source /home/sqream/.bash_profile
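
    You can then confirm that the hadoop binary resolves from the updated PATH, for example:

    # Should print ${HADOOP_INSTALL}/bin/hadoop and the Hadoop version
    which hadoop
    hadoop version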
    
  3. Check if you can access Hadoop from your machine:

    hadoop fs -ls hdfs://<hadoop server name or ip>:8020/

  4. Verify that an HDFS environment exists for SQream services:

    $ ls -l /etc/sqream/sqream_env.sh
    
  5. If an HDFS environment file does not exist for SQream services, create /etc/sqream/sqream_env.sh with the following content:

    #!/bin/bash
    
    SQREAM_HOME=/usr/local/sqream
    export SQREAM_HOME
    
    export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
    export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
    export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR
    
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
    export PATH
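
    If you create this file, you can source it manually to confirm that it sets the expected variables; for example:

    # Load the environment file and spot-check one variable
    source /etc/sqream/sqream_env.sh
    echo $HADOOP_INSTALL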
    

Authenticating Hadoop Servers that Require Kerberos

If your Hadoop server requires Kerberos authentication, do the following:

  1. Create a principal for the user sqream.

    kadmin -p root/admin@SQ.COM
    addprinc sqream@SQ.COM
    
  2. If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user with ssh and run:

    kadmin.local
    

    Running kadmin.local does not require a password.
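
    For example, the principal from step 1 can also be created from within the kadmin.local prompt (shown here only as a sketch, assuming the sqream@SQ.COM principal):

    kadmin.local
    kadmin.local:  addprinc sqream@SQ.COM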

  3. If a password was not required, set one for the sqream@SQ.COM principal:

    change_password sqream@SQ.COM
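
    You can confirm the principal afterwards, for example:

    kadmin.local:  getprinc sqream@SQ.COM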
    
  4. Connect to the Hadoop name node using ssh and navigate to the Cloudera agent process directory:

    cd /var/run/cloudera-scm-agent/process
    
  5. List the directory contents sorted by modification time (most recent last):

    ls -lrt
    
  6. Look for a recently updated folder whose name contains the text hdfs and navigate into it.

    The following is an example of the correct folder name:

    cd <number>-hdfs-<something>
    

    This folder should contain a file named hdfs.keytab or a similar .keytab file.
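
    If there are many process directories, you can also search for the keytab directly; for example:

    # Locate hdfs keytab files under the Cloudera agent process directory (run as root)
    sudo find /var/run/cloudera-scm-agent/process -name "*.keytab" | grep hdfs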

  7. Copy the .keytab file to the sqream user's home directory on each remote machine on which you plan to use Hadoop.
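
    For example, assuming the SQream host is reachable over ssh as sqream@<sqream server>, you can copy the file with scp:

    scp hdfs.keytab sqream@<sqream server>:/home/sqream/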

  8. Copy the following files to the sqream@server:<sqream folder>/hdfs/hadoop/etc/hadoop/ directory:

    • core-site.xml

    • hdfs-site.xml

  9. Connect to the sqream server and verify that the .keytab file is owned by the user sqream and has the correct permissions:

    sudo chown sqream:sqream /home/sqream/hdfs.keytab
    sudo chmod 600 /home/sqream/hdfs.keytab
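
    You can then confirm the ownership and mode, for example:

    # Expected output is similar to: -rw------- 1 sqream sqream ... /home/sqream/hdfs.keytab
    ls -l /home/sqream/hdfs.keytab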
    
  10. Log into the sqream server.

  11. Log in as the user sqream.
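
    For example, if you are connected as a different user, you can switch to the sqream user with:

    su - sqream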

  12. Navigate to the home directory and check the names of the Kerberos principals contained in the .keytab file:

    klist -kt hdfs.keytab

The following is an example of the correct output:

    sqream@Host-121 ~ $ klist -kt hdfs.keytab
    Keytab name: FILE:hdfs.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 HTTP/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM
       5 09/15/2020 18:03:05 hdfs/nn1@SQ.COM

  13. Verify that the hdfs service principal hdfs/nn1@SQ.COM is shown in the output above.

  14. Obtain a Kerberos ticket using the keytab:

    kinit -kt hdfs.keytab hdfs/nn1@SQ.COM

  15. Verify that a ticket was granted:

    klist

The following is an example of the correct output:

    Ticket cache: FILE:/tmp/krb5cc_1000
    Default principal: hdfs/nn1@SQ.COM

    Valid starting       Expires              Service principal
    09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/SQ.COM@SQ.COM

  16. List the files located at the defined server name or IP address:

    hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
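
    A successful listing looks similar to the following (the directory names shown are only an illustration):

    drwxrwxrwt   - hdfs supergroup          0 2020-09-16 10:01 /tmp
    drwxr-xr-x   - hdfs supergroup          0 2020-09-16 10:01 /user
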
  17. Do one of the following:

    • If a file list is output, continue with the next step.

    • If no list is output, verify that your environment has been set up correctly.

If any of the following variables are empty, verify that you set the environment variables as described in the Configuring an HDFS Environment for the User sqream section above:

    echo $JAVA_HOME
    echo $SQREAM_HOME
    echo $CLASSPATH
    echo $HADOOP_COMMON_LIB_NATIVE_DIR
    echo $LD_LIBRARY_PATH
    echo $PATH

  18. Verify that you copied the correct keytab file.

  19. Review this procedure to verify that you have followed each step.