BenjaminLiu
Contributor II

How to handle incremental updates

We're syncing records from multiple data sources into one place. We want to detect duplicate records and let the end user select the master record from each duplicate group. Here is my approach:

1. Get all of the records from the data sources, use tUnite to merge them, and pass the result to tMatchGroup.

2. In tMatchGroup, group the duplicate records, then pass the grouped records to Data Stewardship so the user can select the master record.

This works for a one-time sync. But when a data source has records created or updated, we still need to transfer them to the target data source, and we need to run the duplicate check for those new records as well.

Repeating step 2 regenerates the duplicate groups for all records, including the old ones. Is there a way to detect duplicate groups for the new records only? Or is there a better approach?

4 Replies
Xiaodi_Shi
Employee

Hello,

Please have a look at the CDC (Change Data Capture) feature in Qlik Talend Studio. It quickly identifies and captures data that has been added to, updated in, or removed from database tables, and makes this change data available for future use by applications or individuals. The CDC feature is available for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, Teradata, and AS/400.

https://help.qlik.com/talend/en-US/studio-user-guide/8.0-R2024-09/studio-user-guide/change-data-capt...

Best regards

Sabrina

BenjaminLiu
Contributor II
Author

Thanks for the reply! It will help us catch the new changes.

But my next issue is how to detect duplicates for the new changes. Should we just run tMatchGroup again to group all of the duplicate records, including the old ones, or is there a way to get only the duplicate groups that involve the changed records?

In our case, we always need to check whether the records currently being synced have duplicates, both among themselves and among the persisted records, and then let the user manually select a single master record.

jlolling
Creator III

For the new records, you need to select the matching records in your target and provide both sets together as a new match group.
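
As a rough illustration of that idea outside of Talend, here is a minimal Python sketch. It assumes a hypothetical `Record` type and a hypothetical blocking key (the normalized email address) for picking candidate matches from the target; neither comes from tMatchGroup or any Talend API. The output is the reduced record set you would feed into the matching step instead of the full data set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout, for illustration only.
    record_id: str
    name: str
    email: str


def blocking_key(rec: Record) -> str:
    # Assumption: records sharing a normalized email are candidate duplicates.
    return rec.email.strip().lower()


def build_match_input(new_records, target_records):
    """Return the new records plus only those target records that share a
    blocking key with at least one new record."""
    # Index the target once so each new record is a dictionary lookup.
    index = defaultdict(list)
    for rec in target_records:
        index[blocking_key(rec)].append(rec)

    new_ids = {rec.record_id for rec in new_records}
    selected = {rec.record_id: rec for rec in new_records}
    for rec in new_records:
        for candidate in index.get(blocking_key(rec), []):
            # An updated record replaces its old version in the target,
            # so skip the stale copy with the same id.
            if candidate.record_id not in new_ids:
                selected[candidate.record_id] = candidate
    return list(selected.values())


if __name__ == "__main__":
    target = [
        Record("t1", "Ann Lee", "ann@example.com"),
        Record("t2", "Bob Ray", "bob@example.com"),
    ]
    new = [Record("n1", "Ann  Lee", "ANN@example.com ")]
    for rec in build_match_input(new, target):
        print(rec)
```

Target records that share no blocking key with any new record never reach the matching step, so their existing groups are left alone.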

BenjaminLiu
Contributor II
Author

OK, thanks! I'm just curious about the performance.

We always need to compare the new records with the matching records. From my understanding, this will also compare the matching records with each other again. Is there a way to check only the new records for duplicates against the whole record set? Performance should then be better.
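
To make the performance question concrete, here is a small Python sketch of the comparison pattern being asked about: only pairs that include at least one new record are scored, and old-versus-old pairs are never revisited. This is an illustration of the idea, not how tMatchGroup works internally. The function names, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    # Stand-in similarity measure; a real job would use its own matching rules.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def incremental_pairs(new_names, old_names):
    """Yield only the pairs that involve at least one new record."""
    # Every new record against every already-persisted record.
    for new in new_names:
        for old in old_names:
            yield new, old
    # New records against each other, since one batch can contain duplicates.
    yield from combinations(new_names, 2)


def matches_for_new(new_names, old_names, threshold=0.85):
    # Keep the pairs whose similarity reaches the (assumed) threshold.
    return [
        (a, b)
        for a, b in incremental_pairs(new_names, old_names)
        if similarity(a, b) >= threshold
    ]


if __name__ == "__main__":
    old = ["Ann Lee", "Bob Ray", "Cy Poe"]
    new = ["Ann Leee", "Dan Fox"]
    print(matches_for_new(new, old))
```

With n new and m existing records, this scores n*m + n(n-1)/2 pairs instead of the (n+m)(n+m-1)/2 pairs of a full re-match. For 2 new and 3 existing records that is 7 pairs instead of 10, and the gap widens as the persisted set grows relative to each batch.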