Morning Guys,
I have a file that looks like below, and I want to merge all rows that have the same ID (first column) into a single row.
ID | Test_1 | Date_test_1 | Test_2 | Date_test_2 | Test_3 | Date_test_3
1  | yes    | 06/09/2017  |        |             |        |
4  | no     |             | yes    | 20/04/2004  |        |
4  |        | 01/10/2000  |        |             | yes    | 17/05/1982
1  |        |             | no     | 02/02/2015  |        |
I tried to use the tAggregateRow component, but it does not help in my situation, because columns may or may not contain data and I don't have a function to extract it.
I want to get an output like below.
I also saw this link: https://help.talend.com/reader/ZndcSsDNtKg8FpNIRCdjag/jPZNgHRF5ad4KUp4Mpvzkg
which suggests using the tSurviveFields component, but it is not present in the Studio, nor on Talend Exchange.
ID | Test_1 | Date_test_1 | Test_2 | Date_test_2 | Test_3 | Date_test_3
1  | yes    | 06/09/2017  | no     | 02/02/2015  |        |
4  | no     | 01/10/2000  | yes    | 20/04/2004  | yes    | 17/05/1982
Thank you for help,
The reason is that fields coming from tFileInputDelimited are empty strings ("") instead of null.
Try adding a tJavaRow after tFileInputDelimited to replace empty strings with null, following this example:
output_row.Organization_Name = input_row.Organization_Name.equals("") ? null : input_row.Organization_Name;
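As a side note, the same normalization can be factored into a small helper so it reads the same for every column (a minimal standalone sketch; the class and method names are hypothetical, not a Talend API — in a tJavaRow you would just call the helper once per column):

```java
public class EmptyToNull {
    // tFileInputDelimited delivers missing fields as "" rather than null;
    // mapping "" to null lets null-aware components treat them as missing.
    static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    public static void main(String[] args) {
        System.out.println(emptyToNull(""));    // prints: null
        System.out.println(emptyToNull("yes")); // prints: yes
    }
}
```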
Hi,
Using a tMap with the same input file for the main flow and for the lookup, you can do what you expect.
tSurviveFields is a DQ component (not available in DI).
Hope this helps.
Hello TRF,
I am sorry, but I don't see how to do it.
The example I put here is limited to a few columns, but in reality I have more than 30 columns.
And concerning the same ID, it can be repeated across as many as 20 records.
Thanks for the reply
As a quick response you can start with the following design:
Both inputs (tFixedFlowInput) are identical and contain the following records (I know, there are only 2 fields):
100,1000
100,2000
200,
200,4000
200,5000
300,6000
400,
400,
400,7000
On the main flow, tUniqRow is used to keep only 1 record for each id value.
The same input is used as a lookup for the tMap, which is configured like this:
So, you have to repeat the operation shown for the 2nd output field for all the desired fields (this is the bad part, nothing is magic).
And finally, the result is the following:
Starting job test at 16:22 06/09/2017.
[statistics] connecting to socket on port 3779
[statistics] connected
100|1000
200|5000
300|6000
400|7000
[statistics] disconnected
Job test ended at 16:22 06/09/2017. [exit code=0]
As you can see, if you have more than 2 records for the same id, the final value for the output fields is taken from the last record (look at id = 200).
If this is not what you expect (you want the 1st not-null value as the final value), you need a more complex answer derived from a solution I proposed for another topic (https://stackoverflow.com/questions/46050056/talend-avoid-duplicate-external-id-with-salesforce-outp...). The idea is to iterate over all the duplicates for the same id one by one (instead of getting all the records at once) and do the same operation as proposed above. That way, as soon as a field is populated, we stop trying to fill it again.
Hello TRF,
Your solution on Stack Overflow is helpful, because I want the 1st not-null value as the final value.
In my case, I need to insert the records into a MySQL database, so writing records one by one would take a lot of time.
So I should merge all the records before pushing them to MySQL.
I am trying to adapt your job from Stack Overflow to solve my problem.
Any help will be useful
Thanks,
Simple solutions are often the best, and here it seems to be tAggregateRow.
tFixedFlowInput with the data sample and the schema:
In tAggregateRow, I use the "first" function with the "ignore null values" option:
Here is the result:
Starting job test at 22:40 06/09/2017.
[statistics] connecting to socket on port 3413
[statistics] connected
100|1000|a
200|5000|a
300|6000|c
400|7000|a
[statistics] disconnected
Job test ended at 22:40 06/09/2017. [exit code=0]
I think that's what you need.
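For anyone curious about what this configuration does under the hood, the per-ID "first value, ignoring nulls" merge can be sketched in plain Java (this is an illustrative sketch, not Talend-generated code; the class name, the sample data, and the String[]-row representation are all assumptions):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstNonNullMerge {
    // Merge rows sharing the same id (row[0]): for every other field, keep
    // the first non-null value seen — i.e. tAggregateRow's "first" function
    // with "ignore null values" checked.
    static Map<String, String[]> merge(List<String[]> rows) {
        Map<String, String[]> out = new LinkedHashMap<>();
        for (String[] row : rows) {
            String[] merged = out.computeIfAbsent(row[0], k -> new String[row.length]);
            merged[0] = row[0];
            for (int i = 1; i < row.length; i++) {
                if (merged[i] == null && row[i] != null) {
                    merged[i] = row[i]; // only fill fields that are still empty
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"100", "1000", null},
                new String[]{"100", "2000", "a"},
                new String[]{"200", null, "a"},
                new String[]{"200", "5000", null});
        for (String[] r : merge(rows).values()) {
            StringBuilder sb = new StringBuilder();
            for (String v : r) sb.append(v == null ? "" : v).append('|');
            System.out.println(sb.substring(0, sb.length() - 1));
        }
        // prints:
        // 100|1000|a
        // 200|5000|a
    }
}
```

Note that, unlike the tMap/lookup design shown earlier (where the last record wins), this keeps the first non-null value per field, which is what was asked for.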
Morning TRF,
It is exactly what I expect, thank you very much for your help.
I know your suggestion is the solution, but when I test it with a sample input file in my job, I don't get what I expect.
I attach my input file to this post, and below is a screenshot of my job.
I also show the configuration of the tAggregateRow component in the screenshot below.
I activated the "ignore null values" option for all fields except the first 6.
Thank you for help
Here is the result I get.
Seems OK from my point of view.
Job attached with the complete schema.
Edit: + the result
@mbocquet let me know if the result is OK for you.
If it is, don't forget to mark the topic as solved (Kudo also accepted).
Hello TRF,
Since my last post, I haven't had time to test your job. I'll do it tomorrow and let you know.
Thank you very much for help.