Solved: group records and create file for each record - Qlik Community

Anonymous · 2018-07-30

Hi,

I currently have an Excel macro-enabled workbook that reads the data in a file and then groups it based on certain criteria, for example Name and timesheet data. It then creates an individual workbook for each group and names that workbook after the name and timesheet value columns. The macro sorts the data first and inserts a blank line after each group. It then copies the group, pastes it into a new workbook, saves and closes that workbook, reads the next set of data in the original file, copies and pastes that group into another new workbook, and so on until it reaches the end of the file, at which point it closes the application.
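For comparison, the macro's logic (sort, then start a new workbook whenever the group changes) is a classic control-break loop. Below is a minimal Java sketch of it, since Talend jobs compile to Java. The class name and the key columns (name in column 1, timesheet value in column 2) are hypothetical, and the sketch only groups rows in memory rather than writing workbooks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ControlBreak {

    // Hypothetical group key: name in column 1 and timesheet value in column 2.
    static String keyOf(String[] row) {
        return row[1] + "|" + row[2];
    }

    // Sorts a copy of the rows by key (the macro's first step), then starts a new
    // group every time the key changes, which is where the macro inserted a blank
    // line and opened a new workbook.
    static List<List<String[]>> sortAndSplit(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(ControlBreak::keyOf)); // stable sort
        List<List<String[]>> groups = new ArrayList<>();
        List<String[]> current = null;
        String currentKey = null;
        for (String[] row : sorted) {
            String key = keyOf(row);
            if (current == null || !key.equals(currentKey)) {
                current = new ArrayList<>();
                groups.add(current);
                currentKey = key;
            }
            current.add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"I", "Smith", "2018-07-02", "8"});
        rows.add(new String[] {"I", "Jones", "2018-07-01", "7"});
        rows.add(new String[] {"I", "Smith", "2018-07-02", "4"});
        for (List<String[]> group : sortAndSplit(rows)) {
            System.out.println(keyOf(group.get(0)) + " -> " + group.size() + " row(s)");
        }
    }
}
```

Because it sorts a copy first, the same loop still works if the input arrives unsorted next time.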

I have been asked to rewrite this job in Talend, as we no longer receive the timesheet data in Excel. It now comes as a pipe-delimited text file. I have been able to create an Excel file from the pipe text file.

However, the problem I am having now is that I need to read the output file produced from the pipe text file, group this data, and create an individual file for each group. I have searched and cannot find anything that meets my requirements. Please see the screenshots. I will be ignoring Column 0 = "I". Luckily the data is already sorted for me this time, but it might not be next time.

Now I want to group the data by name and date and produce an individual output (Excel) file for each group, naming each file based on certain columns.

I am using Talend 6.5, so I do not have tMatchgroup, and I cannot use tXLMAP because it requires joins. I just want the job to extract the data from the pipe file, convert the file, group the data, and produce individual files using one single output component, as I do not know how many rows the text file will have. I thought I had found a way, but the final output file is blank.
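If the row count is unknown and the input may not be sorted, one option outside Talend is a single pass that collects rows into a map keyed by name and date and then writes one file per key. This is a hedged Java sketch, not a Talend component: the class name, the column positions and the `.txt` output are assumptions, and plain pipe-delimited text stands in for Excel (writing `.xlsx` would need a library such as Apache POI).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupWriter {

    // Hypothetical column positions for name and date; adjust to the real layout.
    static final int NAME_COL = 1;
    static final int DATE_COL = 2;

    // One pass over the rows: works whether or not the input is sorted and needs
    // no up-front knowledge of how many rows or groups there are.
    static Map<String, List<String[]>> groupByNameAndDate(List<String[]> rows) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            String key = row[NAME_COL] + "_" + row[DATE_COL];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Writes each group to its own pipe-delimited file named after the key.
    // Assumes the key contains no characters that are illegal in file names.
    static int writeGroups(Map<String, List<String[]>> groups, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        for (Map.Entry<String, List<String[]>> e : groups.entrySet()) {
            List<String> lines = new ArrayList<>();
            for (String[] row : e.getValue()) {
                lines.add(String.join("|", row));
            }
            Files.write(outDir.resolve(e.getKey() + ".txt"), lines, StandardCharsets.UTF_8);
        }
        return groups.size();
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"I", "Smith", "2018-07-02", "8"});
        rows.add(new String[] {"I", "Jones", "2018-07-01", "7"});
        rows.add(new String[] {"I", "Smith", "2018-07-02", "4"});
        Path out = Files.createTempDirectory("timesheets");
        int files = writeGroups(groupByNameAndDate(rows), out);
        System.out.println(files + " file(s) written to " + out);
    }
}
```

The trade-off is that every row is held in memory until it is written, which is fine for timesheet-sized files but worth checking for very large inputs.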

Any help would be appreciated.

Anonymous · 2018-07-30

Hi,

The attached job will help you parse the data. You will have to change the date formatting to suit your needs.

The job identifies all the distinct values in the file and stores them in a hash.

Then, for each control value read from the hash input, the file is read again to fetch the matching data.
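The two-pass pattern described here (collect the distinct control values, then re-read the file once per value) can be sketched in plain Java as below. This illustrates the idea only and is not the attached job; the class name and the key columns (Column1 and Column2) are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoPass {
    // Hypothetical key columns: name in Column1, date in Column2.
    static final int NAME_COL = 1;
    static final int DATE_COL = 2;

    static String keyOf(String line) {
        String[] f = line.split("\\|", -1); // pipe must be escaped: split() takes a regex
        return f[NAME_COL] + "_" + f[DATE_COL];
    }

    // Pass 1: read the file once and keep each distinct key (the "control values").
    static Set<String> distinctKeys(Path input) throws IOException {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : Files.readAllLines(input)) {
            if (line.isEmpty()) continue; // skip blank lines
            keys.add(keyOf(line));
        }
        return keys;
    }

    // Pass 2: for one control value, read the whole file again and keep matching lines.
    // Called once per key, so the file is read (number of keys + 1) times in total.
    static List<String> linesFor(Path input, String key) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(input)) {
            if (!line.isEmpty() && keyOf(line).equals(key)) {
                out.add(line);
            }
        }
        return out;
    }
}
```

The comment on `linesFor` is the reason multiple reads of the same file can become a performance concern as the number of distinct keys grows.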

Once the data is separated, the files are written to different directories based on column 3, and each file name is the concatenation of column 2 and column 15 (screenshots below). You can change the file and directory structure to suit your needs.
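A small Java sketch of this naming rule, assuming "column 2/3/15" means Column2, Column3 and Column15 of the sample layout and that the output is an `.xlsx` file; the class and method names are hypothetical. It also replaces characters that are illegal in file names, since a value such as a date written 30/07/2018 would otherwise be treated as a path separator.

```java
import java.nio.file.Path;

// Builds <baseDir>/<Column3>/<Column2><Column15>.xlsx for one parsed row.
class OutputPath {

    // Replaces characters that Windows or Unix reject in file names
    // (\ / : * ? " < > |) with '-', and trims surrounding spaces.
    static String safe(String value) {
        return value.trim().replaceAll("[\\\\/:*?\"<>|]", "-");
    }

    // Assumed mapping: directory from Column3, file name = Column2 + Column15.
    static Path forRow(Path baseDir, String[] row) {
        String directory = safe(row[3]);
        String fileName = safe(row[2]) + safe(row[15]) + ".xlsx";
        return baseDir.resolve(directory).resolve(fileName);
    }
}
```

If the concatenated values run together awkwardly, adding a separator such as `_` between Column2 and Column15 is an easy variation.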

The output of file B is shown below.

Notes:
1) Because the same input file is read multiple times, test the performance if you plan to process large data volumes, and adjust the job as needed.

2) If you want to process multiple input files, add a tFileList at the beginning of the first subjob.
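As a rough plain-Java analogue of note 2, listing the input files first and then running the same processing once per file could look like this. The class name and the glob pattern are assumptions; in the job itself, tFileList plays this role.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InputFiles {

    // Returns the regular files in dir whose names match the glob (e.g. "*.txt"),
    // sorted by path so the processing order is repeatable.
    static List<Path> list(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }
}
```

Each returned path would then go through the same read, group and write steps, roughly as tFileList iterates the subjob that follows it.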

If the idea has helped you, could you please mark the topic as solved? It will help enrich the Talend community.

Warm Regards,

Nikhil Thampi

invoice_input.xlsx
invoice_reader.zip


Anonymous · 2018-07-30

Hi,

Could you please share some sample input data for analysis and the expected output file format?

There are lots of possible solutions, but let's try to build one based on the current data structure.

Warm Regards,

Nikhil Thampi

Anonymous · 2018-07-30

@nthampi Thank you for your quick response, much appreciated. I did add a screenshot, but it seems it did not attach.

Once the pipe text file is converted to Excel, the data should look like this. It is currently already grouped. When this data is read, it needs to ignore Column0 = "I".

Column0 | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 | Column10 | Column11 | Column12 | Column13 | Column14 | Column15 | Column16
I | 14001141 | …

The full post is available in the attachment:
OriginalPost.pdf
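When reading the pipe text file in plain Java, note that `String.split` takes a regular expression, so the pipe must be escaped, and a negative limit keeps trailing empty columns. The sketch below also assumes that ignoring Column0 = "I" means dropping that leading field from each row; the class and method names are hypothetical.

```java
import java.util.Arrays;

class PipeRow {

    // "\\|" escapes the pipe (an unescaped "|" is regex alternation), and the -1
    // limit keeps trailing empty fields, so rows keep all of Column0..Column16.
    static String[] parse(String line) {
        return line.split("\\|", -1);
    }

    // Assumption: "ignore Column0" means dropping the leading "I" field.
    static String[] withoutColumn0(String[] fields) {
        if (fields.length > 0 && fields[0].equals("I")) {
            return Arrays.copyOfRange(fields, 1, fields.length);
        }
        return fields;
    }
}
```

Without the `-1` limit, a row whose last columns are empty would come back shorter than 17 fields and shift any index-based column lookups.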

Anonymous · 2018-07-30

@nthampi Much appreciated. I will have a look at the attachment and let you know how I get on. Once again, thank you so much.

Anonymous · 2018-07-30

I am unable to import the project because of version compatibility. I am working on Talend 6.5 and am generally able to import jobs, but 6.5 will not let me import this one because it was built with a later version.

Anonymous · 2018-07-30

Hi,

I created the job in Talend version 7. Could you please download TOS version 7 to import the job?

Warm Regards,

Nikhil Thampi

Anonymous · 2018-08-01

@nthampi

I have used Talend Open Studio for Big Data 7. The job runs with a few changes, but nothing is written to the output file for each person, and a file is created for each line rather than grouping the data in one sheet. So instead of having, say, 10 files, I have 314 files. The last tMap is showing 0 rows. Also, the file name does not use the correct columns: it shows the first name, the date and the rate rather than the first name, surname and date.

FinalTMap.PNG
EmptyFile.PNG
FilesCreated.PNG
OverallJob.PNG
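One possible explanation for getting one file per line (314 instead of about 10) is that the grouping key or the file-name expression includes a column that changes on every row, such as the rate. The hypothetical Java sketch below counts how many files a given key would produce; the column order (first name, surname, date, rate) is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class KeyCheck {

    // Assumed column order: 0 = first name, 1 = surname, 2 = date, 3 = rate.
    static String intendedKey(String[] r) {
        return r[0] + "_" + r[1] + "_" + r[2];
    }

    // A key that, like the reported file names, uses first name, date and rate.
    static String keyWithRate(String[] r) {
        return r[0] + "_" + r[2] + "_" + r[3];
    }

    // Number of distinct keys = number of output files the grouping would produce.
    static int groupCount(List<String[]> rows, Function<String[], String> key) {
        Set<String> keys = new HashSet<>();
        for (String[] r : rows) {
            keys.add(key.apply(r));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Anna", "Smith", "2018-07-30", "10.5"});
        rows.add(new String[] {"Anna", "Smith", "2018-07-30", "12.0"});
        rows.add(new String[] {"Anna", "Smith", "2018-07-30", "8.0"});
        System.out.println("first name + surname + date: " + groupCount(rows, KeyCheck::intendedKey));
        System.out.println("first name + date + rate: " + groupCount(rows, KeyCheck::keyWithRate));
    }
}
```

Running `main` prints 1 for the intended key and 3 for the key that includes the rate, which is the same "one group per row" pattern described above.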

Anonymous · 2018-08-01

Hi,

Between the first and second subjobs, you are using an On Component Ok trigger instead of the recommended On Subjob Ok.

Also, it seems your aggregation component has not been configured correctly. Please refer to the columns I used for aggregation (I used User name, Type and Date for the grouping).

In my job, the input data was grouped into 2 rows after the aggregation step, but your grouping seems to have produced almost as many rows as the input.

A good way to debug the job is to check the output with a tLogRow after adding each component, to make sure you are getting the expected values.
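In the same spirit as checking with a tLogRow, a quick plain-Java summary of row counts before and after grouping makes the "2 groups versus almost as many groups as rows" comparison obvious. The class name and output format are hypothetical.

```java
import java.util.List;
import java.util.Map;

class StageCheck {

    // Returns a tLogRow-style summary: one "key|rowCount" line per group,
    // followed by the totals to compare (rows in versus groups out).
    static String summarize(int inputRows, Map<String, List<String[]>> groups) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String[]>> e : groups.entrySet()) {
            sb.append(e.getKey()).append('|').append(e.getValue().size()).append('\n');
        }
        sb.append("input rows=").append(inputRows).append(", groups=").append(groups.size());
        return sb.toString();
    }
}
```

If `groups` comes out close to `input rows`, the grouping key is almost certainly too specific.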

Please use my job as a reference point, but always adapt it to your exact project requirements.

Warm Regards,

Nikhil Thampi

Other

Talend Data Integration

v6.x