Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I want to compare (Validate) data between Source and Target.
Source could be any JDBC or API Source with millions of records. Example, MYSQL, PostGres, MSSQL, ODBC(NetSuite), Salesforce, Zuora, Jira
Target - MSSQL 2014 which already consists of Data from the above Source System.
Basically, I want to create a Talend Job which will do a comparison of Data between each of the above Source and Target systems and send a mail if there is data mismatch.
Scenario:
1. Data Issue:
- For Column A of table1 value in Source is 'XYZ' and for the same column, the value in the Target table is 'MPL'.
- For Column B of table2 values in Source is 2019-01-02 and for the same column, the value in Target is 2019-01-01
2. Row count difference:
- Total number of Rows in Source table1 is 500 and in Target for same table value is 455.
- Talend 6.3.1 / 7
- Java 8
Hi,
Since you are having lot of records in both source and target, the most easy method is to store the source records to a stage table and then do an inner join within DB itself between source and target table using the key values.
If the number of records coming after inner join is less than initial number of records, it means that there is some difference and there are some records in stage table which are not present in target table.
You can do the same step in Talend also (instead of doing it in DB layer). But if you need to do it in Talend, you will have to read all the records from DB to Talend to do the comparison. So it will be better to load to stage table using Talend and do the inner join comparison in DB itself using tDBInput or tDBrow component.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved
Hi,
Your first scenario is as shown below.
- For Column A of table1 value in Source is 'XYZ' and for the same column, the value in the Target table is 'MPL'. - For Column B of table2 values in Source is 2019-01-02 and for the same column, the value in Target is 2019-01-01
Lets take first example. In your case, the source is having a data of XYZ but target is having MPL. So the inner join will not work for this source record. In this case, your inner join will return 0 record even though source is having 1 sample record.
Now, you will have to pick these rejected records and verify where the mismatch happened.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved
Hi Vikas,
The concept of inner join reject is that the source record does not match with any of the target records. So we will not be able to do a comparison in this case if the target records are more (lets say in 100s or 1000s).
Any rejected source record can be a candidate to be matched with any of the target record and not any specific one. One another method is to pick the inner join reject records as input for a subsequent flow where you have an alternate match rule available between source and target.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved
I created a dummy job with Inner Join reject, But still, it does not give me my expected output. Because Inner join rejects just gives me rows (ids) which do not match between 2 tables. In this case (id=4)
Please check attached screenshots and let me know what changes/enhancement I need to do to get the expected output.
I have the same problem , is there any solution for this?