Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I would like to clean up some duplicate rows so that I combine all available data for duplicate rows into 1 but also not replicate data that already exists.
I'm not sure which component (tAggregate,tDenormalise,tPivotToColumns) would suit best as they all seem to involve combining all values found or selecting the first or last value. Its not safe to assume that all new data only appears in the 1st occurrence
Is there a way to only execute the combine task when "new" information is detected without alot of custom logic testing string lengths and null values?
Sample data
ID|Name|Country|Town
1|Tom| Spain|
1|Tom|null|Madrid |
2| |Germany|Berlin
2|John| Germany|
3|Paul| |Glasgow
3|Paul|Scotland|
and end up with
ID|Name|Country|Town
1|Tom|Spain|Madrid |
2|John| Germany|Berlin
3|Paul|Scotland|Glasgow
not
ID|Name|Country|Town
1|Tom;Tom|Spain|Madrid |
2|;John| Germany;Germany|Berlin
3|Paul;Paul|;Scotland|Glasgow;Glasgow
Hi,
I hope you are looking for this solution.
tMap will remove any hardcoded null from the input data.
For example,
Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name
aggregation can be done like below.
Hope your issue is resolved
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Did you try tAggregate with MAX function insted of first or last?
Is it possible that 2nd occurrence will have different info then the 1st
2| |Germany|Berlin
2|John| France|
2|John| France|Paris
Hi,
I hope you are looking for this solution.
tMap will remove any hardcoded null from the input data.
For example,
Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name
aggregation can be done like below.
Hope your issue is resolved
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi @nthampi
Can you explain what the MAX function does with text data?
Max in text case means the last one in alphabetical order
For example when you have 3 values: 'P', 'B', 'L', null, 'H'
max => 'P'
min => 'B'
Null is omitted in functions max and min
Thanks @DataTeam for pitching in.
@Moe , Could you please mark the topic as solution provided since we have answered your query?
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
If it is a String, min and max functions work in alphabetical order. Other functions are not needed in your use case.
Are you facing any issue in current solution? I hope it is covering your use case.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂