topic Re: Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows in Talend Studio

Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Anonymous — Sat, 16 Nov 2024 05:19:09 GMT

Hello Everyone,

I have an upcoming design question.

I will have a mapping that populates a Redshift table with approximately 80 fields. The aim will be to never insert a duplicate. However, there is a need to for FIELD level survivorship to be applied, where the rule will be take the latest data for a particular ID but do not replace if the latest data is null and the old data is populated. This check needs to happen for each field.

What would be the best method to adopt. Will be inserting approx. 100k rows every few hours. Updates to redshift are not allowed, inserts only.

The only way I can think of is to push this back to the database via a large cumbersome SQL case statements.

Any help would be much appreciated.

Re: Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Anonymous — Thu, 11 Jul 2019 13:02:18 GMT

I suggest you first use tAggregateRow

It's offers the possibility to get fist/last value

Re: Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Anonymous — Mon, 15 Jul 2019 15:31:33 GMT

Hi,

Thats not exactly what I am looking for.

Essentially want to compare two rows that share a unique ID, and take the best data forward. If already populated, populate with a the new data unless NULL. If NULL then replace with new data.

Any ideas?

Re: Duplicate Processing, Attribute level survivorship, Redshift, Compare Rows

Anonymous — Mon, 15 Jul 2019 17:46:51 GMT

Hi,

Please refer whether the below scenario matches your need. You may have to make customization based on your specific use case. But the essence is as specified in the below flow.

https://help.talend.com/reader/A3Qm3pq~qSkLMBOtSJbPhg/a4UPALg0m66CapZ9UbCQIw

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂