One of the customer recently asked us how to use ETL Validator for validating data counts with allowed variance. For the purpose of this post lets consider a customer table in the source called 'CUSTOMER' and a corresponding target table called 'CUSTOMER_D'. The requirement is to automate the validation of the number customers per country with an allowed variance of 10%. Here are the source and target queries :
Source Query:
select country_cd, count(*) src_count from customer group by country_cd
COUNTRY_CD | SRC_COUNT |
USA | 1414777 |
CAN | 735913 |
Target Query:
select country_cd, count(*) tgt_count from Customer_D group by country_cd
COUNTRY_CD | TGT_COUNT |
USA | 14156227 |
CAN | 835412 |
This scenario can be automated using ETL Validator using the Component Test Case. The design looks like the diagram below:
- Create two DB Components with the source and target queries as shown above.
- Note down the result table names in the properties step of the 'source' and 'target' DB components as shown below:
- Create a new DB component (I called it 'join') with the query to compute the variation in percentage. Notice that I am using the names pf the 'source' and 'target' component result tables. My result tables for source and target component were T_COMP_28_147 and T_COMP_28_148
select a.country_cd, a.src_count, b.tgt_count, (b.tgt_count-a.src_count)*100/a.src_count pct_variation from T_COMP_28_147 a, T_COMP_28_148 b where a.country_cd = b.country_cd
- Next step is to define the validation for checking the allowed variation range. There are two options :
- Option 1: Create a 'Value Validation' as shown below in the 'Join' DB component. Please note that the table name 'T_COMP_28_149' is the result table of the 'Join' DB Component
- Option 2: Add a Data Rules component and define a validation as below. This option shows you the data that is not conforming to the allowed variation range :