Apache Drill

  • Apache Drill is an open-source SQL query engine.
  • With it, you can query a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files.
  • Apache Drill can also query a folder of files that share the same format, treating the folder as a single table.

  

Installation


Installing the Apache Drill ODBC driver for Windows


Generating queries using the Apache Drill ODBC driver

  • Open the Windows Run dialog (press Windows + R on the keyboard).
  • Type odbcad32.
  • Click OK.



  • In the System DSN tab, select 'MapR ODBC Driver for Drill DSN' and click OK.


  • Enter the host name (localhost for a local machine) and the port. The default port for Drill is 31010.
  • Click Drill Explorer.
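If Drill Explorer fails to connect, a quick first check is whether anything is listening on the configured host and port. The following is a minimal Python sketch using only the standard library. The function name drillbit_reachable is hypothetical, and the check only confirms that the TCP port accepts connections, not that the ODBC driver or DSN is configured correctly.

```python
import socket


def drillbit_reachable(host="localhost", port=31010, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    31010 is the default Drill port from the step above; this helper
    is an illustrative assumption, not part of Drill or its driver.
    """
    try:
        # create_connection resolves the host and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False
```

Calling drillbit_reachable() with no arguments checks localhost on port 31010. Pass a different host or port if Drill runs elsewhere.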



  • Suppose the test folder contains two files with the same format, File1.csv and File2.csv.
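To follow along, the two sample files can be created with a short Python script. This is only a sketch: the column names and rows are invented for illustration, and base_dir is an assumption that should point to wherever the dfs.root workspace is mapped on your machine.

```python
import csv
from pathlib import Path


def write_sample_files(base_dir):
    """Create File1.csv and File2.csv with identical columns under
    <base_dir>/Flatfiles/test and return the folder path."""
    folder = Path(base_dir) / "Flatfiles" / "test"
    folder.mkdir(parents=True, exist_ok=True)

    header = ["id", "name"]  # hypothetical columns
    rows_by_file = {
        "File1.csv": [[1, "alpha"], [2, "beta"]],
        "File2.csv": [[3, "gamma"], [4, "delta"]],
    }
    for file_name, rows in rows_by_file.items():
        # newline="" lets the csv module control line endings
        with open(folder / file_name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
    return folder
```

Whether Drill treats the first row as a header depends on the CSV format settings of the storage plugin, so the preview may show the header line as data.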


  • In dfs.root, both files are listed. Select a file to preview its data.


  • The SQL tab shows the corresponding SQL query.

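  • Modify the last line of that query: remove "/File2.csv" from 'Flatfiles/test/File2.csv' so that the path points to the folder rather than to a single file. The LIMIT clause is not required and can be removed as well. The resulting query can be used in the ETL Validator.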
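The edit described above can be sketched as a small Python helper. The input query below is an assumption about what Drill Explorer generates, since the original query is not reproduced on this page, and the function name to_folder_query is hypothetical. Adjust the pattern if the generated SQL looks different.

```python
import re


def to_folder_query(sql, file_name):
    """Point a single-file query at its parent folder and drop LIMIT.

    sql       -- query text copied from the SQL tab
    file_name -- file to strip from the path, e.g. "File2.csv"
    """
    # Remove "/<file_name>" so the path names the folder instead
    folder_sql = sql.replace("/" + file_name, "")
    # Drop a trailing "LIMIT <n>" clause (case-insensitive), if present
    folder_sql = re.sub(r"\s+LIMIT\s+\d+\s*;?\s*$", "", folder_sql,
                        flags=re.IGNORECASE)
    return folder_sql.strip()


# Assumed shape of a generated query; the real one may differ
generated = (
    "SELECT *\n"
    "FROM `dfs`.`root`.`Flatfiles/test/File2.csv`\n"
    "LIMIT 100"
)
print(to_folder_query(generated, "File2.csv"))
```

With the assumed input, the printed query selects from `Flatfiles/test`, so it covers both File1.csv and File2.csv.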


Creating an Apache Drill connection in the ETL Validator

  • Select 'Apache Drill' in the 'Big Data' group from the 'Add Data Source' menu.


  • Enter the connection details and select 'dfs.root' as the schema.


  • Click 'Test' to verify the connection details, then save the connection.