Skip to main content
Announcements
July 15, NEW Customer Portal: Initial launch will improve how you submit Support Cases. IMPORTANT DETAILS
cancel
Showing results for 
Search instead for 
Did you mean: 
mobmsc2
Creator
Creator

(tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

I would like to clean up some duplicate rows so that I combine all available data for duplicate rows into 1 but also not replicate data that already exists.

I'm not sure which component (tAggregate,tDenormalise,tPivotToColumns) would suit best as they all seem to involve combining all values found or selecting the first or last value. Its not safe to assume that all new data only appears in the 1st occurrence  

 

Is there a way to only execute the combine task when "new" information is detected without alot of custom logic testing string lengths and null values?

 

Sample data

ID|Name|Country|Town

1|Tom| Spain|

1|Tom|null|Madrid |

2| |Germany|Berlin

2|John| Germany|

3|Paul| |Glasgow

3|Paul|Scotland| 

 

 

and end up with

ID|Name|Country|Town

1|Tom|Spain|Madrid |

2|John| Germany|Berlin

3|Paul|Scotland|Glasgow

 

not

ID|Name|Country|Town

1|Tom;Tom|Spain|Madrid |

2|;John| Germany;Germany|Berlin

3|Paul;Paul|;Scotland|Glasgow;Glasgow

Labels (2)
1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

    I hope you are looking for this solution.

0683p000009M2vM.png

 

tMap will remove any hardcoded null from the input data.

0683p000009M2vR.png

 

For example, 

Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name 

aggregation can be done like below.

0683p000009M2vb.png

 

Hope your issue is resolved 0683p000009MACn.png

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

View solution in original post

8 Replies
DataTeam1
Creator
Creator

Did you try tAggregate with MAX function insted of first or last?

akumar2301
Creator III
Creator III

Is it possible that 2nd occurrence will have different info then the 1st

 

2| |Germany|Berlin

2|John| France|

2|John| France|Paris

 

 

Anonymous
Not applicable

Hi,

 

    I hope you are looking for this solution.

0683p000009M2vM.png

 

tMap will remove any hardcoded null from the input data.

0683p000009M2vR.png

 

For example, 

Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name 

aggregation can be done like below.

0683p000009M2vb.png

 

Hope your issue is resolved 0683p000009MACn.png

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

mobmsc2
Creator
Creator
Author

Hi @nthampi 

Can you explain what the MAX function does with text data?

DataTeam1
Creator
Creator

Max in text case means the last one in alphabetical order

 

For example when you have 3 values: 'P', 'B', 'L', null, 'H'

max => 'P'

min => 'B'

 

Null is omitted in functions max and min

Anonymous
Not applicable

Thanks @DataTeam  for pitching in.

 

@Moe , Could you please mark the topic as solution provided since we have answered your query?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

mobmsc2
Creator
Creator
Author

So your process
Step 1 if row value is null OR equals the string null in whatever case possible RETURN null or the value
Step 2 Take the max value excluding nulls from what ever is found when joining by ID.

I had a look at the tAggregate docs and it doesn't explain what the functions count, min, max, avg, sum, first, last, list, list(objects), count(distinct), standard deviation do or what is acceptable input data types. I thought inputs had to be numeric
Anonymous
Not applicable

Hi,

 

   If it is a String, min and max functions work in alphabetical order. Other functions are not needed in your use case.

 

   Are you facing any issue in current solution? I hope it is covering your use case.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂