One of the customer recently asked us how to use ETL Validator for validating data counts with allowed variance. For the purpose of this post lets consider a customer table in the source called 'CUSTOMER' and a corresponding target table called 'CUSTOMER_D'. The requirement is to automate the validation of the number customers per country with an allowed variance of 10%. Here are the source and target queries : 


Source Query:    

select country_cd, count(*) src_count from customer group by country_cd

   

COUNTRY_CD SRC_COUNT
USA 1414777
CAN 735913



Target Query:  

select country_cd, count(*) tgt_count from Customer_D group by country_cd

  

COUNTRY_CD TGT_COUNT
USA 14156227
CAN 835412


This scenario can be automated using ETL Validator using the Component Test Case. The design looks like the diagram below: 


  1. Create two DB Components with the source and target queries as shown above. 

    4D5nYUzLoFuC1oIWGoEwaojdQAQoOS-jxA.png





  2. Note down the result table names in the properties step of the 'source' and 'target' DB components as shown below:


    j8kBMd0O7lJLEKlmlgQLx59lzHucVR1sqA.png




  3. Create a new DB component (I called it 'join') with the query to compute the variation in percentage. Notice that I am using the names pf the 'source' and 'target' component result tables. My result tables for source and target component were T_COMP_28_147 and T_COMP_28_148

     

    select a.country_cd, a.src_count, b.tgt_count, (b.tgt_count-a.src_count)*100/a.src_count pct_variation from T_COMP_28_147 a, T_COMP_28_148 b where a.country_cd = b.country_cd

     


  4. Next step is to define the validation for checking the allowed variation range. There are two options : 
    1. Option 1: Create a 'Value Validation' as shown below in the 'Join' DB component. Please note that the table name 'T_COMP_28_149' is the result table of the 'Join' DB Component

      w91ynBFzoH8Y-97yfghC17wO7aR0SXJuQQ.png




    2. Option 2: Add a Data Rules component and define a validation as below. This option shows you the data that is not conforming to the allowed variation range : 

      LoiwCxqPAheoOAAyOqUPMeR1EQQKUAf6Xg.png