Anonymous

How to avoid complete rematch of customer DB

Hi there,

We are implementing Talend MDM as part of a “Single Consumer View” project, i.e. at a rather large scale.

New consumer registrations/profiles are being ingested in near real-time and need to be compared with the existing records that have already been processed.

Logically speaking, that implies that only the NEW records have to be compared with the old records; there is no need to re-compare two old consumer records with each other.

Put simply, if ONE new consumer profile comes in (with one million pre-existing cleansed profiles in the DB), it should only take one million comparisons rather than a comparison of every possible pair across the entire input set.
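To put numbers on the difference, here is a small Python sketch (plain arithmetic, not Talend code) that counts pairwise comparisons for a full rematch of one combined input set versus incremental matching, where each new record is compared with every existing record plus the other new records in the same batch. The function names are hypothetical and only used for illustration.

```python
from itertools import combinations

def full_rematch_comparisons(n_existing: int, n_new: int) -> int:
    """Pairs compared when old and new records form one input set (every record vs every other)."""
    total = n_existing + n_new
    return total * (total - 1) // 2

def incremental_comparisons(n_existing: int, n_new: int) -> int:
    """Pairs compared when only new records are matched: each new record vs every
    existing record, plus new records vs each other within the same batch."""
    return n_new * n_existing + n_new * (n_new - 1) // 2

# Sanity check of both formulas by brute-force enumeration on a tiny example.
existing = [f"old{i}" for i in range(5)]
new = [f"new{i}" for i in range(2)]
all_pairs = list(combinations(existing + new, 2))
assert len(all_pairs) == full_rematch_comparisons(5, 2)
# Incremental matching skips exactly the old-vs-old pairs.
assert sum(1 for a, b in all_pairs if a.startswith("new") or b.startswith("new")) \
    == incremental_comparisons(5, 2)

print(full_rematch_comparisons(1_000_000, 1))  # 500000500000
print(incremental_comparisons(1_000_000, 1))   # 1000000
```

With one new profile against one million existing ones, the full rematch needs roughly 500 billion comparisons, while the incremental approach needs exactly one million.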

Moreover, if a new incoming record matches an existing record that already has a GUID, we would like to reuse that GUID (by linking the new record to it).
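The desired behaviour can be sketched in plain Python (this is not how tMatchGroup works internally). The `Profile` class, the `similarity` scoring rule (exact e-mail match, else fuzzy name similarity via `difflib.SequenceMatcher`) and the 0.9 threshold are all hypothetical choices for illustration. Each incoming profile is compared once against every existing profile; if the best match clears the threshold and already has a GUID, that GUID is reused, otherwise a new one is minted.

```python
import uuid
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

@dataclass
class Profile:
    name: str
    email: str
    guid: Optional[str] = None

def similarity(a: Profile, b: Profile) -> float:
    """Hypothetical scoring rule: an identical e-mail is a certain match,
    otherwise fall back to fuzzy similarity of the lower-cased names."""
    if a.email.strip().lower() == b.email.strip().lower():
        return 1.0
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()

def assign_guid(new: Profile, existing: List[Profile], threshold: float = 0.9) -> str:
    """Compare one incoming profile with each existing profile exactly once.
    Reuse the GUID of the best match at or above the threshold, else mint a new GUID."""
    best: Optional[Profile] = None
    best_score = 0.0
    for candidate in existing:
        score = similarity(new, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best.guid is not None and best_score >= threshold:
        new.guid = best.guid            # link to the existing consumer
    else:
        new.guid = str(uuid.uuid4())    # genuinely new consumer
    existing.append(new)  # the new profile becomes an "old" record for later arrivals
    return new.guid

existing = [
    Profile("Anna Schmidt", "anna@example.com", "guid-1"),
    Profile("Bob Jones", "bob@example.com", "guid-2"),
]
print(assign_guid(Profile("Anna Schmidt", "ANNA@example.com "), existing))  # guid-1
```

Note that `assign_guid` mutates the `existing` list so that later arrivals can match against earlier ones; in a real pipeline that list would be the cleansed master store.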

Unfortunately, the tMatchGroup component does not seem able to distinguish between old and new records. It supports only a single input record set (containing both old and new records), which means every record is compared with every other record. Also, in case of a match, any pre-existing GUIDs/GIDs are ignored; instead, Talend creates new GUIDs/GIDs.

The only documented approach I could find to limit the number of comparisons is to use blocking keys. I am confused, though, as this implies that the record set is pre-partitioned based on the values of some fields. For example, using the first two letters of the last name as the blocking key means these letters must be identical for two records to even qualify for a comparison.
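A minimal Python sketch of blocking (generic record-linkage logic, not Talend's implementation) makes the concern concrete. The blocking key below is the example from the question, the first two letters of the last name; the function names are hypothetical. Records are grouped by key and only pairs inside the same block are compared.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

def blocking_key(last_name: str) -> str:
    # Example key from the question: first two letters of the last name.
    return last_name[:2].upper()

def candidate_pairs(last_names: List[str]) -> List[Tuple[str, str]]:
    """Partition records by blocking key; only records in the same block are compared."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for name in last_names:
        blocks[blocking_key(name)].append(name)
    pairs: List[Tuple[str, str]] = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs

names = ["Smith", "Smyth", "Schmidt", "Müller", "Mueller"]
print(candidate_pairs(names))  # [('Smith', 'Smyth')]
```

Only Smith/Smyth share a block. "Müller" and "Mueller" get different keys ("MÜ" vs "MU") and are never compared, which is exactly the trade-off the question points out: blocking cuts cost but can miss true matches whose key fields differ.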

Any suggestion of how to achieve the desired behaviour? It feels like we are missing certain fundamental conceptual MDM constraints.

Thanks,
Nick
1 Reply
Anonymous
Author

Can you give an example of your record sets?

 

In the tMatchGroup component we have the GID, which we can use as a unique ID. That ID can then be compared with the Master attribute column in tMatchGroup: if its value is 1, that set already exists. That is the way to handle old records.