Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Hello Everyone,

 

I have an upcoming design question.

 

I will have a mapping that populates a Redshift table with approximately 80 fields. The aim will be to never insert a duplicate. However, there is a need to for FIELD level survivorship to be applied, where the rule will be take the latest data for a particular ID but do not replace if the latest data is null and the old data is populated. This check needs to happen for each field.

 

What would be the best method to adopt. Will be inserting approx. 100k rows every few hours. Updates to redshift are not allowed, inserts only.

 

The only way I can think of is to push this back to the database via a large cumbersome SQL case statements.

 

Any help would be much appreciated.

Labels (4)
3 Replies
Anonymous
Not applicable
Author

I suggest you first use tAggregateRow

It's offers the possibility to get fist/last value
Anonymous
Not applicable
Author

Hi,

Thats not exactly what I am looking for.

Essentially want to compare two rows that share a unique ID, and take the best data forward. If already populated, populate with a the new data unless NULL. If NULL then replace with new data.

Any ideas?
Anonymous
Not applicable
Author

Hi,

 

    Please refer whether the below scenario matches your need. You may have to make customization based on your specific use case. But the essence is as specified in the below flow.

 

https://help.talend.com/reader/A3Qm3pq~qSkLMBOtSJbPhg/a4UPALg0m66CapZ9UbCQIw

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂