ETL Validator v3.4.8 comes with an embedded Apache Spark engine that can be used to read files directly from HDFS. This article outlines the steps to create an HDFS Connection and use it in a Component test case.
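Conceptually, the embedded Spark engine reads HDFS files the same way a standalone Spark job would. The minimal PySpark sketch below is only an illustration of that idea, not ETL Validator's internal code; the namenode host, port and file path are placeholder values.

from pyspark.sql import SparkSession

# Build (or reuse) a Spark session
spark = SparkSession.builder.appName("hdfs-read-illustration").getOrCreate()

# Read a delimited file straight from HDFS into a DataFrame
df = spark.read.option("header", "true").csv("hdfs://namenode-host:8020/data/landing/test1.dat")
df.show(10)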


Step 1: Create an HDFS Connection

Select the HDFS connection type when adding a new Data Source. This will open a window as shown below. Enter the User Name, Password and the HDFS folder location. Test and Save the connection. 
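The exact format of the HDFS folder location depends on your cluster setup. As a purely hypothetical example, it might be a full HDFS URI such as hdfs://namenode-host:8020/data/landing or a plain folder path such as /data/landing; the host, port and path here are placeholders, not values taken from this article.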




Step 2: Use the HDFS Connection in a Component test case

The HDFS connection can be used to read data from all files in an HDFS folder or from a specific file in an HDFS location. Open a Component test case and add a Flat File Component. When an HDFS data source is selected, the popup window will look as shown below.


File Name: Select a specific file name or folder name within the HDFS location specified in the corresponding data source.

HDFS File Format: This is the file extension of the files. For example, if your files are named test1.dat and test2.dat, the value should be 'dat'.

Encoding Type: This is an optional field, useful if you want to specify the character encoding to be used when reading the files.
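To show how these three settings fit together, here is a minimal, illustrative PySpark sketch of the kind of read they describe. It is not ETL Validator's actual implementation; the folder path, file extension and encoding are placeholder values standing in for File Name, HDFS File Format and Encoding Type.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flat-file-component-illustration").getOrCreate()

# File Name        -> the HDFS folder to read from
# HDFS File Format -> restricts the read to files with the 'dat' extension
# Encoding Type    -> character encoding applied when parsing the files
df = (spark.read
      .option("encoding", "UTF-8")
      .option("header", "true")
      .csv("hdfs://namenode-host:8020/data/landing/*.dat"))

df.show(5)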