ETL Validator v3.4.8 ships with an embedded Apache Spark that can be used to read files directly from HDFS. This article outlines the steps to create an HDFS Connection and use it in a Component test case.
Step 1: Create an HDFS Connection
Select the HDFS connection type when adding a new Data Source. This will open a window as shown below. Enter the User Name, Password and the HDFS folder location. Test and Save the connection.
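For context, what the embedded Spark does with this connection is conceptually similar to pointing a Spark reader at an hdfs:// URI built from the folder location you enter. The following is a minimal, illustrative PySpark sketch only; the NameNode host, port, and folder path are hypothetical placeholders, not ETL Validator internals.

```python
from pyspark.sql import SparkSession

# Minimal sketch: reading files from an HDFS folder with Spark.
# "namenode:8020" and "/data/landing/" are hypothetical placeholders;
# substitute the HDFS folder location from your connection.
spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()

# Read every file under the folder; each line becomes a row
# in a single-column DataFrame.
df = spark.read.text("hdfs://namenode:8020/data/landing/")
df.show(5, truncate=False)

spark.stop()
```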
Step 2: Use the HDFS Connection in a Component test case
The HDFS connection can be used to read data from all files in an HDFS folder or from a specific file in an HDFS location. Open a Component test case and add a Flat File Component. When an HDFS data source is selected, the popup window looks like the one shown below.
File Name: You can select a specific file name or folder name within the HDFS location specified in the corresponding data source.
HDFS File Format: This is the file extension of the files. For example, if your files are named test1.dat and test2.dat, the value should be 'dat'.
Encoding Type: This is an optional field, useful if you want to specify the encoding to be used when reading the files.
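To illustrate how these three fields relate, the sketch below shows a hedged PySpark read that filters a hypothetical HDFS folder by the 'dat' extension and applies an explicit encoding. The URI, file pattern, delimiter, and encoding value are illustrative assumptions, not values ETL Validator requires.

```python
from pyspark.sql import SparkSession

# Illustrative sketch: how File Name, HDFS File Format, and Encoding Type
# might map onto a Spark flat-file read. All paths and options below are
# hypothetical placeholders.
spark = SparkSession.builder.appName("hdfs-flatfile-sketch").getOrCreate()

df = (
    spark.read
         .option("encoding", "ISO-8859-1")                 # Encoding Type (optional)
         .option("delimiter", "|")                         # assumed field delimiter
         .csv("hdfs://namenode:8020/data/landing/*.dat")   # File Name + 'dat' extension
)
df.show(5, truncate=False)

spark.stop()
```

If no encoding is specified, Spark's CSV reader defaults to UTF-8, which is why the Encoding Type field only needs to be set when your files use a different character set.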