Apache Drill
- Apache Drill is an open-source SQL query engine.
- It can query a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files.
- Apache Drill also supports querying a folder that contains files with the same format.
Installation
- Click on the following link. It walks you through installing and starting Drill. Extract the downloaded file and run the start command from the command line.
Installing the Apache Drill ODBC driver for Windows
- Click on the following link to install the Apache Drill ODBC driver for Windows. This tool is useful for generating queries for the files.
Generating queries using Apache Drill ODBC
- Open the Windows Run dialog (press Windows + R on the keyboard).
- Type odbcad32.
- Click on OK.
- In the System DSN tab, select MapR ODBC Driver for Drill DSN and click on OK.
- Enter the host name (localhost for a local machine) and the port. The default port for Drill is 31010.
- Click on Drill Explorer.
- Suppose there are two files with the same format, File1.csv and File2.csv, in the test folder.
- In dfs.root, both files are listed. Select either file to preview its data.
- The SQL tab shows the query generated for the selected file.
- Modify the last line of that query: remove "/File2.csv" from 'Flatfiles/test/File2.csv' so that the path points to the folder instead of a single file. The LIMIT clause is not required either. The resulting query can be used in the ETL Validator.
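The edit described in the last step can be sketched in Python. This is only an illustration: the helper name to_folder_query is hypothetical, and the sample query text is an assumption about what Drill Explorer generates, since the exact query is not reproduced here. The sketch removes the file name from the path and drops a trailing LIMIT clause.

```python
import re


def to_folder_query(sql: str, file_name: str) -> str:
    """Turn a single-file Drill query into a query over the whole folder.

    Hypothetical helper for illustration; not part of Drill or ETL Validator.
    """
    # Remove the "/<file_name>" suffix so the path names the folder.
    folder_sql = sql.replace("/" + file_name, "")
    # Drop a trailing LIMIT clause (and optional semicolon), if present.
    folder_sql = re.sub(r"\s+LIMIT\s+\d+\s*;?\s*$", "", folder_sql,
                        flags=re.IGNORECASE)
    return folder_sql.strip()


# Assumed shape of the generated query; the real text may differ.
generated = "SELECT * FROM `dfs`.`root`.`Flatfiles/test/File2.csv` LIMIT 100"
print(to_folder_query(generated, "File2.csv"))
```

Because the resulting path names the folder rather than one file, Drill reads every file in it, which is why the files need to share the same format.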
Creating an Apache Drill connection in the ETL Validator
- Select 'Apache Drill' in the 'Big Data' group from the 'Add Data Source' menu.
- Enter the connection details and select "dfs.root" as the schema.
- Click on 'Test' to verify the connection details, then save the connection.
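If the connection test fails, it can help to first confirm that Drill is listening on the expected host and port. The following is a minimal sketch using only the Python standard library. The function name drill_port_open is hypothetical, and the defaults assume a local Drill instance on the default port 31010 mentioned earlier. It checks only that the TCP port accepts connections, not that the service behind it is Drill or that the schema exists.

```python
import socket


def drill_port_open(host: str = "localhost", port: int = 31010,
                    timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it does not speak the Drill
    protocol, it only checks that something accepts connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False


if __name__ == "__main__":
    print("Drill port reachable:", drill_port_open())
```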