Re: Duplicate Processing, Attribute level survivor... - Qlik Community

Anonymous · ‎2019-07-11

Hello Everyone,

I have an upcoming design question.

I will have a mapping that populates a Redshift table with approximately 80 fields. The aim will be to never insert a duplicate. However, there is a need to for FIELD level survivorship to be applied, where the rule will be take the latest data for a particular ID but do not replace if the latest data is null and the old data is populated. This check needs to happen for each field.

What would be the best method to adopt. Will be inserting approx. 100k rows every few hours. Updates to redshift are not allowed, inserts only.

The only way I can think of is to push this back to the database via a large cumbersome SQL case statements.

Any help would be much appreciated.

Anonymous · ‎2019-07-11

I suggest you first use tAggregateRow

It's offers the possibility to get fist/last value

Anonymous · ‎2019-07-15

Hi,

Thats not exactly what I am looking for.

Essentially want to compare two rows that share a unique ID, and take the best data forward. If already populated, populate with a the new data unless NULL. If NULL then replace with new data.

Any ideas?

Anonymous · ‎2019-07-15

Hi,

Please refer whether the below scenario matches your need. You may have to make customization based on your specific use case. But the essence is as specified in the below flow.

https://help.talend.com/reader/A3Qm3pq~qSkLMBOtSJbPhg/a4UPALg0m66CapZ9UbCQIw

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Talend Data Integration

Talend MDM

v6.x

v7.x