MDVV
Contributor II

Iterate over multiple files and create output by file name

Hello Team,

I hope you can help me with the following problem in Talend Studio:

I have several files in a directory, and I need to merge them according to the identifier in the file name, producing one output file per identifier.

Example:

1-1000739.out
2-1000739.out
3-1000739.out
4-1000739.out
I need a single file to be created with the union of the 4 files and with the name: Output_1000739.json
and so on with the rest of the files found in the directory:

1-1000687.out
2-1000687.out
3-1000687.out
4-1000687.out

Output_1000687.json

I have a Subjob:

tFileList --> Iterate --> tFileInputRaw --> tFileOutputDelimited (with the Append option checked)

But it merges all the files into one.
Any ideas on how to solve this?

Thanks in advance!


Accepted Solutions
quentin-vigne
Partner - Creator II

Hi @MDVV 

I managed to do it with one change: in my first post I forgot to use the tIterateToFlow.

quentinvigne_0-1751530124235.png

First of all:

  1. tFileList_1: add a file mask with "*.out". This way, the component lists only the desired files in your directory.
  2. tIterateToFlow_1: this turns the list of files in your directory into a simple row flow. Don't forget to add a column, named "CURRENT_FILE" for example, with a value of
((String)globalMap.get("tFileList_1_CURRENT_FILE"))

  • Then add the tJavaRow I was talking about, with this code:
String fileName = input_row.CURRENT_FILE; // e.g., 1-1000739.out
String[] parts = fileName.split("-"); // ["1", "1000739.out"]
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.identifier = idPart;

Edit the schema and add a column named "identifier" to the output. You can rename it later if you want, but you will then have to change output_row.identifier in the code accordingly.

  • Then add the tUniqRow and check "Key attribute" for the identifier column.
  • Add a tFlowToIterate to start an iteration over the unique identifier values of your files.
  • Add a second tFileList with a file mask having this value:
"*" + ((String)globalMap.get("row3.identifier")) + ".out"

With this mask, we list every file matching "*1000739.out", for example.

  • Add your tFileInputRaw and set its filename to the global variable:
((String)globalMap.get("tFileList_2_CURRENT_FILEPATH"))
  • Now you can finally add the tFileOutputDelimited, with the output filename being:
"C:/TalendWorkspace/test/Output_" + ((String)globalMap.get("row3.identifier")) + ".json"
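
For readers who want to check the logic outside the Studio, the steps above can be sketched in plain Java. This is only an illustration under assumptions, not code generated by Talend: the class name MergeByIdentifier and its methods are hypothetical, file contents are appended byte for byte, and the files of one group are appended in lexicographic path order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the job logic; hypothetical names throughout.
class MergeByIdentifier {

    // "1-1000739.out" -> "1000739" (same split/replace as the tJavaRow step).
    static String identifierOf(String fileName) {
        String[] parts = fileName.split("-");
        return parts[1].replace(".out", "");
    }

    // Groups every *.out file in inputDir by identifier and appends the
    // contents of each group to outputDir/Output_<identifier>.json.
    static List<Path> merge(Path inputDir, Path outputDir) throws IOException {
        // First pass: collect the files of each identifier
        // (roughly tFileList_1 + tJavaRow + tUniqRow).
        Map<String, List<Path>> groups = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "*.out")) {
            for (Path file : files) {
                String id = identifierOf(file.getFileName().toString());
                groups.computeIfAbsent(id, k -> new ArrayList<>()).add(file);
            }
        }
        // Second pass: one output file per identifier
        // (roughly tFlowToIterate + second tFileList + append).
        List<Path> outputs = new ArrayList<>();
        for (Map.Entry<String, List<Path>> group : groups.entrySet()) {
            Path out = outputDir.resolve("Output_" + group.getKey() + ".json");
            List<Path> sorted = new ArrayList<>(group.getValue());
            Collections.sort(sorted); // directory listing order is unspecified
            for (Path file : sorted) {
                Files.write(out, Files.readAllBytes(file),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempDirectory("in");
        Path out = Files.createTempDirectory("out");
        for (String name : new String[] {"1-1000739.out", "2-1000739.out", "1-1000687.out"}) {
            Files.write(in.resolve(name), name.getBytes(StandardCharsets.UTF_8));
        }
        for (Path p : merge(in, out)) {
            System.out.println(p.getFileName());
        }
    }
}
```

Note that lexicographic order puts "10-1000739.out" before "2-1000739.out"; if the numeric prefix matters for the append order, the sort would need a comparator on that prefix.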

 

I tested it with some sample files and it works on my end.

Have a good day

- Quentin


Replies
quentin-vigne
Partner - Creator II

Hello @MDVV 

You have the right logic, but what is missing is a way of iterating only over the files of a given "identifier", like "1000739" or "1000687".

What I would do first is set up a tFileList to list all files matching "*.out".

Then you can use a tJavaRow to apply this transformation and keep all the "identifier" values in memory:

String fileName = input_row.currentFileName; // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.filePath = input_row.currentFilePath;
output_row.identifier = idPart;

 

After this, set up a tUniqRow to keep only one occurrence of each identifier, then add a tFlowToIterate (this way you can iterate over the unique rows). Then you can build your current flow: tFileList (add a file mask that uses the current identifier from the tFlowToIterate) --> Iterate --> tFileInputRaw --> tFileOutputDelimited (check the Append option).
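
As a rough illustration of the extraction and de-duplication described here, the following plain-Java sketch shows a more defensive variant. The names IdentifierSketch, identifierOf, uniqueIdentifiers and fileMaskFor are hypothetical and not part of Talend. Unlike the split("-") code above, it takes everything between the first "-" and the ".out" suffix, and it skips names that do not follow the "<n>-<identifier>.out" pattern instead of failing on parts[1].

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper class; it only mirrors the tJavaRow + tUniqRow logic.
class IdentifierSketch {

    // Returns the text between the first "-" and the ".out" suffix, or empty
    // when the name does not follow the "<n>-<identifier>.out" pattern.
    static Optional<String> identifierOf(String fileName) {
        int dash = fileName.indexOf('-');
        if (dash < 0 || !fileName.endsWith(".out")) {
            return Optional.empty();
        }
        String id = fileName.substring(dash + 1, fileName.length() - ".out".length());
        return id.isEmpty() ? Optional.empty() : Optional.of(id);
    }

    // Keeps the first occurrence of each identifier, in encounter order,
    // similar to a tUniqRow keyed on the "identifier" column.
    static Set<String> uniqueIdentifiers(List<String> fileNames) {
        Set<String> ids = new LinkedHashSet<>();
        for (String name : fileNames) {
            identifierOf(name).ifPresent(ids::add);
        }
        return ids;
    }

    // One possible file mask for the second tFileList. Including the "-"
    // keeps "*1000739.out" from also matching e.g. "1-91000739.out".
    static String fileMaskFor(String identifier) {
        return "*-" + identifier + ".out";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "1-1000739.out", "2-1000739.out", "1-1000687.out", "readme.out");
        for (String id : uniqueIdentifiers(names)) {
            System.out.println(id + " -> " + fileMaskFor(id));
        }
    }
}
```

Whether the stricter mask is needed depends on the file names in the directory; with identifiers of equal length, as in the example, the plain "*" + identifier + ".out" mask gives the same result.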

 

I hope this is clear enough 

 

- Quentin

MDVV
Contributor II
Author

Hello, thanks for your response @quentin-vigne  

I tried two options following your recommendation.

Option 1. With tJavaRow, which cannot be connected directly to a tFileList, so I connected it to tFileInputRaw (my files are unformatted). The tJavaRow code is as follows:

String fileName = ((String)globalMap.get("tFileList_6_CURRENT_FILE"));
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.content = ((String)globalMap.get("tFileList_6_CURRENT_FILEPATH"));
output_row.content = idPart;

 

and the subjob is as follows:

MDVV_0-1751487702513.png

The output file contains all the source files (.out) appended together.

Option 2. With tJava, which can be used with or without a connection to a previous component.

The tJava code is as follows:

String fileName = ((String)globalMap.get("tFileList_9_CURRENT_FILE")); // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
String filePath = ((String)globalMap.get("tFileList_9_CURRENT_FILEPATH"));
String identifier = idPart;

 

and the subjob is as follows:

MDVV_1-1751488277704.png

 

This option also produces a single output file with all the source files appended.
I've been trying to resolve this issue for weeks. I'm new to Talend, and it's been very complicated.
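
One possible explanation, assuming the tFileOutputDelimited file name was a fixed string in both options: the identifier is computed but never reaches the output file name, so every iteration appends to the same file. The usual way to pass a value between components is globalMap.put and globalMap.get. The sketch below imitates this in plain Java; the HashMap is only a stand-in for Talend's globalMap, and the class name, method names and the "identifier" key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration; in a real job the Studio provides globalMap.
class GlobalMapSketch {

    // Stand-in for Talend's globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // What a tJava step could do: derive the identifier and publish it,
    // instead of keeping it in a local variable.
    static void publishIdentifier(String currentFile) {
        String[] parts = currentFile.split("-");
        String idPart = parts[1].replace(".out", "");
        globalMap.put("identifier", idPart); // "identifier" is a hypothetical key
    }

    // What the File Name expression of the output component could evaluate to.
    static String outputFileName(String outputDir) {
        return outputDir + "/Output_" + ((String) globalMap.get("identifier")) + ".json";
    }

    public static void main(String[] args) {
        publishIdentifier("1-1000739.out");
        System.out.println(outputFileName("C:/TalendWorkspace/test"));
        publishIdentifier("1-1000687.out");
        System.out.println(outputFileName("C:/TalendWorkspace/test"));
    }
}
```

Because the file name is rebuilt from the current identifier on each iteration, files with different identifiers end up in different outputs even with Append checked.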

 

 

MDVV
Contributor II
Author

Your recommendation worked!

Thank you so much for sharing your knowledge! Thanks in advance!