Skip to main content
Announcements
SYSTEM MAINTENANCE: Thurs., Sept. 19, 1 AM ET, Platform will be unavailable for approx. 60 minutes.
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

How to Validate Data between Source and Target

I want to compare (Validate) data between Source and Target. 

 

Source could be any JDBC or API Source with millions of records. Example, MYSQL, PostGres, MSSQL, ODBC(NetSuite), Salesforce, Zuora, Jira

 

Target - MSSQL 2014 which already consists of Data from the above Source System.

 

Basically, I want to create a Talend Job which will do a comparison of Data between each of the above Source and Target systems and send a mail if there is data mismatch.

 

Scenario:

 

1. Data Issue:

 - For Column A of table1 value in Source is 'XYZ' and for the same column, the value in the Target table is 'MPL'.

 - For Column B of table2 values in Source is 2019-01-02 and for the same column, the value in Target is 2019-01-01

 

2. Row count difference:

- Total number of Rows in Source table1  is 500 and in Target for same table value is 455.

 

 

 

- Talend 6.3.1 / 7

- Java 8

 

 

Labels (3)
7 Replies
Anonymous
Not applicable
Author

Hi,

 

   Since you are having lot of records in both source and target, the most easy method is to store the source records to a stage table and then do an inner join within DB itself between source and target table using the key values.

 

    If the number of records coming after inner join is less than initial number of records, it means that there is some difference and there are some records in stage table which are not present in target table.

 

    You can do the same step in Talend also (instead of doing it in DB layer). But if you need to do it in Talend, you will have to read all the records from DB to Talend to do the comparison. So it will be better to load to stage table using Talend and do the inner join comparison in DB itself using tDBInput or tDBrow component.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

Anonymous
Not applicable
Author

Thanks for the prompt reply @nthampi. But doing inner join won't be of much help as it will only cover the 2nd scenario of my post (Row Count difference). But 1st scenario (Data Issue) won't be handled using just Inner join.
Anonymous
Not applicable
Author

Hi,

 

    Your first scenario is as shown below.

 

 - For Column A of table1 value in Source is 'XYZ' and for the same column, the value in the Target table is 'MPL'.

 - For Column B of table2 values in Source is 2019-01-02 and for the same column, the value in Target is 2019-01-01

Lets take first example. In your case, the source is having a data of XYZ but target is having MPL. So the inner join will not work for this source record. In this case, your inner join will return 0 record even though source is having 1 sample record.

 

Now, you will have to pick these rejected records and verify where the mismatch happened.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

Anonymous
Not applicable
Author

Exactly that's my point. using inner join I will just get count of rejected records. i.e. 1 Record which is not in sync between source and target. But in order to identify where exactly mismatch happened, for that I will have to manually compare all the columns data for that rejected row between source table and Target. which i want to avoid.

What I want is after comparison (Data Validation) is complete display that 1 row difference in a txt/csv/sql table as below:

Table1(Source) Table2 (Target)
Column A Column B
XYZ MPL
Anonymous
Not applicable
Author

Hi Vikas,

 

    The concept of inner join reject is that the source record does not match with any of the target records. So we will not be able to do a comparison in this case if the target records are more (lets say in 100s or 1000s).

 

   Any rejected source record can be a candidate to be matched with any of the target record and not any specific one. One another method is to pick the inner join reject records as input for a subsequent flow where you have an alternate match rule available between source and target.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

Anonymous
Not applicable
Author

@nthampi 

I created a dummy job with Inner Join reject, But still, it does not give me my expected output. Because Inner join rejects just gives me rows (ids) which do not match between 2 tables. In this case (id=4) 

Please check attached screenshots and let me know what changes/enhancement I need to do to get the expected output.

 

0683p000009M5ZF.jpgInput_Table_Structure0683p000009M6gY.jpgJob Design&Output0683p000009M6rL.jpgtMap Settings

 

 

 

since_1995
Contributor III
Contributor III

I have the same problem , is there any solution for this?