HDFS Environment
Configuring an HDFS Environment for the User sqream
This section describes how to configure an HDFS environment for the user sqream and is only relevant for users with an HDFS environment.
To configure an HDFS environment for the user sqream:
Open your bash_profile configuration file for editing:
vim /home/sqream/.bash_profile
#PATH=$PATH:$HOME/.local/bin:$HOME/bin
#export PATH

# PS1
#MYIP=$(curl -s -XGET "http://ip-api.com/json" | python -c 'import json,sys; jstr=json.load(sys.stdin); print jstr["query"]')
#PS1="\[\e[01;32m\]\D{%F %T} \[\e[01;33m\]\u@\[\e[01;36m\]$MYIP \[\e[01;31m\]\w\[\e[37;36m\]\$ \[\e[1;37m\]"

SQREAM_HOME=/usr/local/sqream
export SQREAM_HOME

export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR

PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
export PATH
Apply the changes by sourcing the updated configuration file:
source /home/sqream/.bash_profile
Check if you can access Hadoop from your machine:
hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
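For example, with a hypothetical NameNode host named hadoop-nn1.example.com (substitute your own server name or IP address), the command would look like this:
hadoop fs -ls hdfs://hadoop-nn1.example.com:8020/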
Verify that an HDFS environment exists for SQream services:
ls -l /etc/sqream/sqream_env.sh
If an HDFS environment does not exist for SQream services, create one (sqream_env.sh):
#!/bin/bash

SQREAM_HOME=/usr/local/sqream
export SQREAM_HOME

export JAVA_HOME=${SQREAM_HOME}/hdfs/jdk
export HADOOP_INSTALL=${SQREAM_HOME}/hdfs/hadoop
export CLASSPATH=`${HADOOP_INSTALL}/bin/hadoop classpath --glob`
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SQREAM_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR

PATH=$PATH:$HOME/.local/bin:$HOME/bin:${SQREAM_HOME}/bin/:${JAVA_HOME}/bin:$HADOOP_INSTALL/bin
export PATH
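As an optional sanity check (not part of the original procedure), you can source the new file and confirm that the key variables resolve; the commands below assume the file is located at /etc/sqream/sqream_env.sh:
source /etc/sqream/sqream_env.sh
echo $JAVA_HOME
echo $HADOOP_INSTALL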
Authenticating Hadoop Servers that Require Kerberos
If your Hadoop server requires Kerberos authentication, do the following:
Create a principal for the user sqream.
kadmin -p root/[email protected] addprinc [email protected]
If you do not know your Kerberos root credentials, connect to the Kerberos server as a root user with ssh and run:
kadmin.local
Running kadmin.local does not require a password. From the kadmin.local prompt, change the password for the principal [email protected] if needed:
change_password [email protected]
Connect to the Hadoop NameNode using ssh and navigate to the following directory:
cd /var/run/cloudera-scm-agent/process
Check the most recently modified content of the directory above:
ls -lrt
Look for a recently updated folder containing the text hdfs.
The following is an example of the correct folder name:
cd <number>-hdfs-<something>
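If the directory contains many process folders, a quick filter (a convenience, not part of the original steps) can narrow the listing down to HDFS-related entries before you change into one of them:
ls -lrt | grep -i hdfs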
This folder should contain a file named hdfs.keytab or a similarly named .keytab file.
Copy the .keytab file to the user sqream's home directory on each remote machine on which you plan to use Hadoop.
Copy the following files to the sqream@server:<sqream folder>/hdfs/hadoop/etc/hadoop directory:
core-site.xml
hdfs-site.xml
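For example, using scp from the NameNode with a hypothetical SQream host named sqream-host and the default installation folder (adjust the host name and paths to match your environment):
scp hdfs.keytab sqream@sqream-host:/home/sqream/
scp core-site.xml hdfs-site.xml sqream@sqream-host:/usr/local/sqream/hdfs/hadoop/etc/hadoop/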
Connect to the sqream server and verify that the .keytab file is owned by the user sqream and has the correct permissions:
sudo chown sqream:sqream /home/sqream/hdfs.keytab
sudo chmod 600 /home/sqream/hdfs.keytab
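To confirm the result, list the file and check that the owner is sqream and the mode is 600 (-rw-------):
ls -l /home/sqream/hdfs.keytab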
Log in to the sqream server.
Switch to the user sqream.
Navigate to the home directory and check the name of the Kerberos principal represented by the hdfs.keytab file:
klist -kt hdfs.keytab
The following is an example of the correct output:
sqream@Host-121 ~ $ klist -kt hdfs.keytab
Keytab name: FILE:hdfs.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 HTTP/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
   5 09/15/2020 18:03:05 hdfs/[email protected]
Verify that the hdfs service named hdfs/nn1@SQ.COM is shown in the generated output above.
Run the following:
kinit -kt hdfs.keytab hdfs/[email protected]
Check the output:
klist
The following is an example of the correct output:
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]

Valid starting       Expires              Service principal
09/16/2020 13:44:18  09/17/2020 13:44:18  krbtgt/[email protected]
List the files located at the defined server name or IP address:
hadoop fs -ls hdfs://<hadoop server name or ip>:8020/
Do one of the following:
If a list of files is output, continue to the next step.
If no list is output, verify that your environment has been set up correctly.
If any of the following environment variables are empty, verify that you followed Step 6 in the Configuring an HDFS Environment for the User sqream section above correctly:
echo $JAVA_HOME
echo $SQREAM_HOME
echo $CLASSPATH
echo $HADOOP_COMMON_LIB_NATIVE_DIR
echo $LD_LIBRARY_PATH
echo $PATH
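As an optional shortcut (not part of the original procedure), a short bash loop can flag any of these variables that are empty:
for v in JAVA_HOME SQREAM_HOME CLASSPATH HADOOP_COMMON_LIB_NATIVE_DIR LD_LIBRARY_PATH PATH; do
    [ -z "${!v}" ] && echo "$v is empty"
done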
Verify that you copied the correct keytab file.
Review this procedure to verify that you have followed each step.