Morning Guys,
I have a file that looks like below, and I want to merge all rows that have the same ID (first column) into a single row.
ID | Test_1 | Date_test_1 | Test_2 | Date_test_2 | Test_3 | Date_test_3
1  | yes    | 06/09/2017  |        |             |        |
4  | no     |             | yes    | 20/04/2004  |        |
4  |        | 01/10/2000  |        |             | yes    | 17/05/1982
1  |        |             | no     | 02/02/2015  |        |
I tried to use the tAggregateRow component, but it does not help in my situation, because columns may or may not contain data and I don't have a function to extract it.
I want to get an output like below.
I also saw this link: https://help.talend.com/reader/ZndcSsDNtKg8FpNIRCdjag/jPZNgHRF5ad4KUp4Mpvzkg
which suggests using the tSurviveFields component, but it is not present in the Studio, nor on Talend Exchange.
ID | Test_1 | Date_test_1 | Test_2 | Date_test_2 | Test_3 | Date_test_3
1  | yes    | 06/09/2017  | no     | 02/02/2015  |        |
4  | no     | 01/10/2000  | yes    | 20/04/2004  | yes    | 17/05/1982
Thank you for help,
The reason is that fields coming from tFileInputDelimited are empty strings ("") instead of null.
Try adding a tJavaRow after tFileInputDelimited to replace empty strings with null, following this example:
output_row.Organization_Name = input_row.Organization_Name.equals("") ? null : input_row.Organization_Name;
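As a side note, the same normalization can be factored into a small helper so it reads the same for every column (a minimal standalone sketch; the class and method names are hypothetical, not a Talend API — in a tJavaRow you would just call the helper once per column):

```java
public class EmptyToNull {
    // tFileInputDelimited delivers missing fields as "" rather than null;
    // mapping "" to null lets null-aware components treat them as missing.
    static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    public static void main(String[] args) {
        System.out.println(emptyToNull(""));    // prints: null
        System.out.println(emptyToNull("yes")); // prints: yes
    }
}
```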
Hi,
Using a tMap with the same input file for the main flow and for the lookup, you can do what you expect.
tSurviveFields is a DQ component (not available in DI).
Hope this helps.
Hello TRF,
I am sorry, but I don't see how to do it.
The example I put here is limited to a few columns, but in reality I have more than 30 columns.
And concerning the same ID, it can be repeated across as many as 20 records.
Thanks for the reply
As a quick response you can start with the following design:
Both inputs (tFixedFlowInput) are identical and contain the following records (I know, there are only 2 fields):
100,1000
100,2000
200,
200,4000
200,5000
300,6000
400,
400,
400,7000
On the main flow, tUniqRow is used to keep only 1 record for each id value.
The same input is used as a lookup for the tMap, which is configured like this:
So, you have to repeat the operation shown for the 2nd output field for all the desired fields (this is the bad part, nothing is magic).
And finally, the result is the following:
Starting job test at 16:22 06/09/2017.
[statistics] connecting to socket on port 3779
[statistics] connected
100|1000
200|5000
300|6000
400|7000
[statistics] disconnected
Job test ended at 16:22 06/09/2017. [exit code=0]
As you can see, if you have more than 2 records for the same id, the final value for the output fields is taken from the last record (look at id = 200).
If this is not what you expect (you want the 1st not-null value as the final value), you need a more complex answer derived from a solution I proposed for another topic (https://stackoverflow.com/questions/46050056/talend-avoid-duplicate-external-id-with-salesforce-outp...). The idea is to iterate over all the duplicates for the same id one by one (instead of getting all the records at once) and do the same operation as proposed above. That way, as soon as a field is populated, we stop trying to fill it again.
Hello TRF,
Your solution on Stack Overflow is helpful, because I want the 1st not-null value as the final value.
In my case, I need to insert the records into a MySQL database, so writing records one by one would take a lot of time.
So I should merge all the records before pushing them to MySQL.
I am trying to adapt your job from Stack Overflow to solve my problem.
Any help will be useful
Thanks,
Simple solutions are often the best, and here it seems to be tAggregateRow.
tFixedFlowInput with the data sample and the schema:
In tAggregateRow, I use the "first" function with the "ignore null values" option:
Here is the result:
Starting job test at 22:40 06/09/2017.
[statistics] connecting to socket on port 3413
[statistics] connected
100|1000|a
200|5000|a
300|6000|c
400|7000|a
[statistics] disconnected
Job test ended at 22:40 06/09/2017. [exit code=0]
I think that's what you need.
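For anyone curious about what this configuration does under the hood, the per-ID "first value, ignoring nulls" merge can be sketched in plain Java (this is an illustrative sketch, not Talend-generated code; the class name, the sample data, and the String[]-row representation are all assumptions):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstNonNullMerge {
    // Merge rows sharing the same id (row[0]): for every other field, keep
    // the first non-null value seen — i.e. tAggregateRow's "first" function
    // with "ignore null values" checked.
    static Map<String, String[]> merge(List<String[]> rows) {
        Map<String, String[]> out = new LinkedHashMap<>();
        for (String[] row : rows) {
            String[] merged = out.computeIfAbsent(row[0], k -> new String[row.length]);
            merged[0] = row[0];
            for (int i = 1; i < row.length; i++) {
                if (merged[i] == null && row[i] != null) {
                    merged[i] = row[i]; // only fill fields that are still empty
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"100", "1000", null},
                new String[]{"100", "2000", "a"},
                new String[]{"200", null, "a"},
                new String[]{"200", "5000", null});
        for (String[] r : merge(rows).values()) {
            StringBuilder sb = new StringBuilder();
            for (String v : r) sb.append(v == null ? "" : v).append('|');
            System.out.println(sb.substring(0, sb.length() - 1));
        }
        // prints:
        // 100|1000|a
        // 200|5000|a
    }
}
```

Note that, unlike the tMap/lookup design shown earlier (where the last record wins), this keeps the first non-null value per field, which is what was asked for.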
Morning TRF,
It is exactly what I expect, thank you very much for your help.
I know your suggestion is the solution, but when I test it with a sample input file in my job, I don't get what I expect.
I attach my input file to this post, and below is a screenshot of my job.
I also show the configuration of the tAggregateRow component in the screenshot below.
I activated the "ignore null values" option for all fields except the first 6.
Thank you for help
Here is the result I get.
Seems OK from my point of view.
Job attached with the complete schema.
Edit: + the result
@mbocquet let me know if the result is OK for you.
If it is, don't forget to mark the topic as solved (Kudo also accepted).
Hello TRF,
Since my last post, I haven't had time to test your job. I'll do it tomorrow and let you know.
Thank you very much for help.