With more and more companies moving to the cloud (eg. AWS) , there is a possibility that access to the Repository/Workschema database (Oracle or PostgreSQL) and the data sources may not be directly available from the client machines. The purpose of this article is to the provide guidance on deploying and using ETL Validator for such environments. 


What is SSH Tunneling? 

Tunneling is a technique that can be used to create a SSH connection between your PC and a server and make it appear as if the remote database you’re accessing through the tunnel is locally running from your machine. SSH connection (and the tunnel) can be created from command line in linux environments. For windows, a free tool called Putty can be used. For the purpose of this article, we will use Putty. 


ETL Validator Deployment

Assuming that all the data sources (or databases) are located in the Cloud and you are planning for a central repository and server for ETL Validator, the best deployment configuration would be : 

ETL Validator Server : Deploy on the Cloud linux or windows server 

Repository / Workschema database (Oracle or PostgreSQL) : Deploy on the cloud 

ETL Validator Client : Installed locally but connects to the ETL Validator repository/workschema and server using SSH tunnel


ETL Validator Client only connects to the ETL Validator Server and Repository/Workschema. All other connections to the data sources (source and target databases) is done using the ETL Validator Server. So there is no need for creating SSH tunnels for the source and target databases. 


Prerequisites

  • Setup ETL Validator Server on a cloud linux or windows server
  • Setup an Oracle or PostgreSQL database as repository/workschema for the cloud. If you already have an Oracle database, create a couple of schemas as outlined in the article here
  • A linux server to connect to using SSH that has access to the data sources, repository/workschema and the ETL Validator Server



Create a new SSH Session using Putty

Open PuTTY and fill out the "Host Name" box with your linux server name or IP address and the SSH port to connect to it. Don’t forget to give a name to the session and save it. Select Connection->SSH and provide the "Private key file for authentication" if you have a private key file :




Setup Port Forwarding

Select the Connection->SSH->Tunnels option and enter the following information: 

  • Source Port : The port used on the local system to simulate the remote port. In the example below, port number 1525 on the local windows machine is used to simulate the oracle database port of 1521. 
  • Destination : The remote database server name and port number.


Click on 'Add' button to save the port information. Here is how the screen will look like once you had clicked the Add button:


Open the Connection

Save the connection and click Open to connect to the remote linux server using SSH. When prompted enter the username/password. Upon successful login, the SSH connection has been created. 


Setup SSH connection for ETL Validator Server

Follow the same steps to create and SSH connection (and tunnel) for ETL Validator Server. In a multi-user environment, make sure that the same source port number is used while setting up the PuTTY connection by all users for the ETL Validator Server. 


Setup ETL Validator Repository/Workschema and ETL Validator Server URLs 

Open ETL Validator Client and click on the '+' sign to add a new repository connection. Use 'localhost' as the host name and the 'source port' as the port number. See sample screenshot below : 


Update the ETL Validator Server URL when prompted to use the 'localhost' as the host name and the SSH source port as the port number. Similarly, update the Workschema connection information to use localhost as the host name and the SSH source port number. 


Users can now connect to the remote repository/workschema using SSH tunneling !!