Kerberos is the authentication mechanism commonly used to control access to HDFS and Hive. This knowledge base post describes the configuration steps needed to create a connection from ETL Validator to Hive using Kerberos as the authentication mechanism.


ETL Validator connects to all data sources (including Hive) through the ETL Validator Server, so these configuration steps are needed only on the ETL Validator Server machine.


Basics of Kerberos Authentication


Ticket Granting Ticket

A Ticket Granting Ticket (TGT) is a small, encrypted identification file with a limited validity period. The TGT exists so that users do not have to enter their password every time they connect to a kerberized service, or keep a copy of their password around. When a user logs into Windows, they are authenticated with the Key Distribution Center (KDC) to get a TGT. The TGT is then used to obtain a service ticket from the Ticket Granting Service (TGS).


Service Principal and User Principal

The User Principal is the fully qualified username of a user from a particular domain (e.g. john@MYREALM). The Service Principal is essentially the same thing, but it identifies a service on a computer instead of a user and includes the protocol for which it is valid (e.g. hive/hostname@MYREALM).
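
To make the structure of the two examples concrete, here is a minimal Java sketch that splits a principal of the form primary[/instance]@REALM into its parts. The class name PrincipalParts is hypothetical and not part of ETL Validator or the JDK, and the parsing is deliberately simplified: it assumes the name contains no escaped '@' or '/' characters.

```java
/** Simplified holder for the parts of a Kerberos principal: primary[/instance]@REALM. */
final class PrincipalParts {
    final String primary;   // user name or service/protocol name, e.g. "john" or "hive"
    final String instance;  // host part of a service principal; empty when absent
    final String realm;     // Kerberos realm, e.g. "MYREALM"

    private PrincipalParts(String primary, String instance, String realm) {
        this.primary = primary;
        this.instance = instance;
        this.realm = realm;
    }

    /** Splits on the last '@' and the first '/'; escaped characters are not handled. */
    static PrincipalParts parse(String principal) {
        int at = principal.lastIndexOf('@');
        if (at <= 0 || at == principal.length() - 1) {
            throw new IllegalArgumentException("Expected primary[/instance]@REALM: " + principal);
        }
        String name = principal.substring(0, at);
        String realm = principal.substring(at + 1);
        int slash = name.indexOf('/');
        String primary = slash < 0 ? name : name.substring(0, slash);
        String instance = slash < 0 ? "" : name.substring(slash + 1);
        return new PrincipalParts(primary, instance, realm);
    }

    /** True when the principal carries an instance part, as service principals do. */
    boolean hasInstance() {
        return !instance.isEmpty();
    }

    public static void main(String[] args) {
        for (String p : new String[] {"john@MYREALM", "hive/hostname@MYREALM"}) {
            PrincipalParts parts = parse(p);
            System.out.println(p + " -> primary=" + parts.primary
                    + ", instance=" + parts.instance + ", realm=" + parts.realm);
        }
    }
}
```

For the service principal, the primary ("hive") is the protocol and the instance ("hostname") is the machine the service runs on.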


Tomcat and Kerberos 

Tomcat can be configured to use Kerberos authentication through a JAAS login configuration file called jaas.conf. The default location of the file is CATALINA_BASE/conf. The location can be changed by setting the property below at Tomcat startup.

-Djava.security.auth.login.config=PATH_TO_LOGIN_CONF


Keytab file and TGT Cache

There are two options for configuring Tomcat to get hold of the TGT: a keytab file or the ticket cache. A keytab file can be generated on the domain controller (KDC) for the principal and then copied over to the server where ETL Validator Server is installed. With a keytab file, ETL Validator Server can refresh the TGT automatically. In some cases it is not possible to get a keytab file, and the TGT in the local ticket cache can be used for authentication instead. When a user logs into Windows, the TGT cache is created and stored in Windows memory, but ETL Validator Server (and Tomcat) cannot access that cache. ETL Validator Server can only access the TGT cache file created by the MIT Kerberos Ticket Manager. The downside of using the ticket cache is that it must be renewed manually by logging in whenever it expires.


Prerequisites


Information needed for getting started

  1. Host name of the node in the Hadoop cluster where HiveServer2 is running
  2. The port that HiveServer2 is running on (typically "10000") and the Hive database name (for example, "default")
  3. The krb5.conf file from the Kerberos KDC, which contains details of the Kerberos realm, the address of the KDC server, and other key connection details
  4. Service Principal Name that will be used for the connection, e.g. hive/quickstart.cloudera@CLOUDERA. Test that the principal is working using beeline.
  5. Keytab file for the above principal. You can verify the principal in the keytab file using the command below:
    C:\app\Datagaps\ETLValidator\java\jre\bin\ktab -l -k FILE:<Keytab File Location>


This information can be obtained from a Hadoop administrator.
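
As an illustration of how these pieces fit together, the sketch below assembles the host, port, database, and service principal into a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database;principal=..., which is also the kind of URL you would pass to beeline when testing the principal. The class and method names are hypothetical, the values are the examples from this post, and the ETL Validator connection dialog may build the URL for you from separate fields.

```java
/** Builds a Kerberos-enabled HiveServer2 JDBC URL from the prerequisite values. */
final class HiveKerberosUrl {

    /**
     * @param host             node where HiveServer2 is running
     * @param port             HiveServer2 port, typically 10000
     * @param database         Hive database name, for example "default"
     * @param servicePrincipal HiveServer2 service principal, e.g. hive/host@REALM
     */
    static String build(String host, int port, String database, String servicePrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";principal=" + servicePrincipal;
    }

    public static void main(String[] args) {
        // Example values taken from this post; replace them with your cluster's details.
        System.out.println(build("quickstart.cloudera", 10000, "default",
                "hive/quickstart.cloudera@CLOUDERA"));
        // prints jdbc:hive2://quickstart.cloudera:10000/default;principal=hive/quickstart.cloudera@CLOUDERA
    }
}
```

Note that the principal in the URL is the service principal of HiveServer2, not the user principal of the person connecting.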


Configuration Steps


Follow these steps to configure the machine where ETL Validator Server is running once you have the above information: 

1. Download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files from the Oracle Java website and place them in the <JRE_HOME>/lib/security folder (or <JDK_HOME>/jre/lib/security) of the Java installation used by ETL Validator Server (Tomcat).

2. Copy the krb5.conf file into the <JRE_HOME>/lib/security folder (or <JDK_HOME>/jre/lib/security) of the Java installation used by ETL Validator Server (Tomcat). Alternatively, on a Windows machine, a krb5.ini file can be placed in the C:\Windows directory.


3. Add an entry to the hosts file if the hostname of the Hive server (KDC) cannot be resolved from this machine. This step is generally not needed unless you are connecting to a virtual machine running locally. See below for an example of the hosts entry.

10.0.0.13 quickstart.cloudera


4. By default, the ETL Validator Server (Tomcat) service runs as the 'Local System' user. Open the Services window and change the properties of the 'datagaps ETL Validator Server' service so that it runs as the logged-in user.



As mentioned earlier, you can use either the keytab file or the ticket cache to get access to the ticket.


Using Keytab file


1. Copy the tomcat.keytab file created on the domain controller to the $CATALINA_BASE/conf folder, so that it is available as $CATALINA_BASE/conf/tomcat.keytab. $CATALINA_BASE is the directory where Tomcat is installed.

2. Create the JAAS login configuration file $CATALINA_BASE/conf/jaas.conf. A sample file is shown below:

com.sun.security.jgss.krb5.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    doNotPrompt=true
    principal="HTTP/win-tc01.dev.local@DEV.LOCAL"
    useKeyTab=true
    keyTab="c:/apache-tomcat-7.0.x/conf/tomcat.keytab"
    storeKey=true;
};

com.sun.security.jgss.krb5.accept {
    com.sun.security.auth.module.Krb5LoginModule required
    doNotPrompt=true
    principal="HTTP/win-tc01.dev.local@DEV.LOCAL"
    useKeyTab=true
    keyTab="c:/apache-tomcat-7.0.x/conf/tomcat.keytab"
    storeKey=true;
};


The Tomcat documentation has more information on this: https://tomcat.apache.org/tomcat-7.0-doc/windows-auth-howto.html


Note: On Windows, use double backslashes or single forward slashes in the keytab file location. For example: C:\\app\\Datagaps\\ETLValidator\\Server\\apache-tomcat-7.0.57\\conf\\etlval.keytab


3. Restart the ETL Validator Server service.
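
If the connection fails after the restart, one thing worth ruling out is a jaas.conf that does not parse, for example because of single backslashes in the keytab path. The following is a minimal diagnostic sketch, not part of ETL Validator: the class name JaasConfCheck is hypothetical, and it uses the JDK's standard "JavaLoginConfig" configuration type to load the file and list the options of an entry. It only checks the file's syntax and contents; it does not contact the KDC or verify the keytab.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.URIParameter;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Loads a JAAS login configuration file and returns the options of one entry. */
final class JaasConfCheck {

    static Map<String, ?> krb5Options(Path jaasConf, String entryName) throws Exception {
        // "JavaLoginConfig" is the JDK's file-based JAAS configuration implementation.
        Configuration config =
                Configuration.getInstance("JavaLoginConfig", new URIParameter(jaasConf.toUri()));
        AppConfigurationEntry[] entries = config.getAppConfigurationEntry(entryName);
        if (entries == null || entries.length == 0) {
            throw new IllegalStateException("No entry named " + entryName + " in " + jaasConf);
        }
        // The sample files in this post define one login module per entry.
        return entries[0].getOptions();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java JaasConfCheck <path to jaas.conf>");
            return;
        }
        Path jaasConf = Paths.get(args[0]);
        String[] names = {
            "com.sun.security.jgss.krb5.initiate",
            "com.sun.security.jgss.krb5.accept"
        };
        for (String name : names) {
            Map<String, ?> options = krb5Options(jaasConf, name);
            System.out.println(name + ": principal=" + options.get("principal")
                    + ", keyTab=" + options.get("keyTab"));
        }
    }
}
```

Run it with the same Java installation that ETL Validator Server uses, passing the full path to jaas.conf. The printed keyTab value should be the path you expect, with the backslashes already unescaped.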

Using Ticket Cache in Windows


1. Download and install the MIT Kerberos Ticket Manager: http://web.mit.edu/kerberos/dist/#kfw-4.1


2. Copy the krb5.ini file to C:\ProgramData\MIT\Kerberos5


3. Set the environment variable KRB5CCNAME to point to the ticket cache file. Important: Restart the computer after the environment variable has been set.


4. Create the JAAS login configuration file $CATALINA_BASE/conf/jaas.conf. A sample file is shown below:


com.sun.security.jgss.krb5.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    doNotPrompt=true
    principal="HTTP/win-tc01.dev.local@DEV.LOCAL"
    useKeyTab=false
    useTicketCache=true
    ticketCache="C:\\KRB"
    storeKey=false;
};

com.sun.security.jgss.krb5.accept {
    com.sun.security.auth.module.Krb5LoginModule required
    doNotPrompt=true
    principal="HTTP/win-tc01.dev.local@DEV.LOCAL"
    useKeyTab=false
    useTicketCache=true
    ticketCache="C:\\KRB"
    storeKey=false;
};


5. Restart the ETL Validator Server service.
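
Because the ticket cache has to be renewed manually, a quick check that KRB5CCNAME is visible to Java and points to an existing file can save some guesswork. The sketch below is a hypothetical helper, not part of ETL Validator. It assumes a file-based cache, optionally written with a "FILE:" prefix; other cache types (such as API: or MSLSA:) are not handled.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

/** Checks that KRB5CCNAME names an existing ticket cache file. */
final class TicketCacheCheck {

    /** Returns the file path named by a KRB5CCNAME value, dropping an optional "FILE:" prefix. */
    static String cachePath(String krb5ccname) {
        if (krb5ccname == null || krb5ccname.trim().isEmpty()) {
            throw new IllegalArgumentException("KRB5CCNAME is not set");
        }
        String value = krb5ccname.trim();
        // Compare the first five characters with "FILE:" ignoring case.
        return value.regionMatches(true, 0, "FILE:", 0, 5) ? value.substring(5) : value;
    }

    public static void main(String[] args) {
        String env = System.getenv("KRB5CCNAME");
        if (env == null || env.trim().isEmpty()) {
            System.out.println("KRB5CCNAME is not set for this process");
            return;
        }
        String path = cachePath(env);
        boolean exists = Files.isRegularFile(Paths.get(path));
        System.out.println(path + (exists ? " exists" : " is missing; log in with the ticket manager"));
    }
}
```

Run it as the same user that the ETL Validator Server service runs as, since environment variables and file permissions can differ between accounts. The path it reports should match the ticketCache value in jaas.conf.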


 
Connect using ETL Validator Client
  

Create a new connection from the ETL Validator Client. The example connection shown below uses the keytab file. When using the ticket cache, leave the Keytab file field blank:


             ETL Validator Hive2 connection