MDVV
Contributor II

Iterate over multiple files and create output by file name

Hello Team,

I hope you can help me with the following problem in Talend Studio:

I have several files in a directory, and I need to merge them according to their file names: all files that share the same identifier must end up in one output file named after that identifier.

Example:

1-1000739.out
2-1000739.out
3-1000739.out
4-1000739.out
I need a single file containing the union of these 4 files, named Output_1000739.json,
and likewise for the rest of the files in the directory:

1-1000687.out
2-1000687.out
3-1000687.out
4-1000687.out

Output_1000687.json

My current subjob is:

tFileList -->Iterate-->tFileInputRaw-->tFileOutputDelimited (Append option checked)

But it appends all the files into a single output file.
Any ideas on how to solve this?

Thanks in advance!

quentin-vigne
Partner - Creator II

Hello @MDVV 

You have the right logic; what is missing is a way to iterate over one "identifier" at a time, such as "1000739" or "1000687".

First, set up a tFileList that lists all files matching "*.out".

Then use a tJavaRow to apply this transformation and keep every "identifier":

String fileName = input_row.currentFileName; // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.filePath = input_row.currentFilePath;
output_row.identifier = idPart;
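As a standalone sketch of the same parsing step, here is a plain-Java version that can be tested outside Talend. The class and method names (IdentifierExtractor, extractIdentifier) are hypothetical. It assumes every name follows the pattern "<n>-<identifier>.out" and uses indexOf/substring instead of split("-"), so a name without "-" yields null instead of an ArrayIndexOutOfBoundsException on parts[1].

```java
public class IdentifierExtractor {

    // Hypothetical helper: "1-1000739.out" -> "1000739".
    // Returns null instead of throwing when the name has no '-' or no ".out" suffix.
    static String extractIdentifier(String fileName) {
        if (fileName == null || !fileName.endsWith(".out")) {
            return null;
        }
        int dash = fileName.indexOf('-');
        if (dash < 0) {
            return null;
        }
        // Keep everything between the first '-' and the ".out" suffix.
        return fileName.substring(dash + 1, fileName.length() - ".out".length());
    }

    public static void main(String[] args) {
        System.out.println(extractIdentifier("1-1000739.out")); // 1000739
        System.out.println(extractIdentifier("readme.txt"));    // null
    }
}
```

Unlike replace(".out", ""), which removes every occurrence of ".out", the substring call strips only the trailing suffix.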

 

After this, set up a tUniqRow to keep only one occurrence of each identifier, then add a tFlowToIterate so that you iterate over the unique rows. Inside that iteration you can reuse your current flow: tFileList (with a file mask built from the current identifier of the tFlowToIterate) -->Iterate-->tFileInputRaw-->tFileOutputDelimited (Append option checked)
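The two-level structure described above (deduplicate the identifiers, then run one inner file listing per identifier) can be sketched in plain Java. This only illustrates the control flow and is not Talend-generated code; the class and method names are hypothetical, and the inner selection uses endsWith on a "-<identifier>.out" suffix in place of a tFileList mask.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UniqueThenIterate {

    // Like tUniqRow: keep the first occurrence of each identifier, preserving input order.
    static List<String> uniqueIdentifiers(List<String> identifiers) {
        return new ArrayList<>(new LinkedHashSet<>(identifiers));
    }

    // Like the inner tFileList: select the files that belong to one identifier.
    // Assumes names follow "<n>-<identifier>.out".
    static List<String> filesFor(String identifier, List<String> fileNames) {
        List<String> matches = new ArrayList<>();
        String suffix = "-" + identifier + ".out";
        for (String name : fileNames) {
            if (name.endsWith(suffix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> files = List.of("1-1000739.out", "2-1000739.out", "1-1000687.out", "2-1000687.out");
        // One identifier per file, as the tJavaRow emits them (so with duplicates).
        List<String> ids = List.of("1000739", "1000739", "1000687", "1000687");
        // The outer loop plays the role of tFlowToIterate: one pass per unique identifier.
        for (String id : uniqueIdentifiers(ids)) {
            System.out.println("Output_" + id + ".json <- " + filesFor(id, files));
        }
    }
}
```

The key point is that the output file name changes once per outer iteration, which is why appending no longer merges everything into one file.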

 

I hope this is clear enough.

- Quentin

MDVV
Contributor II
Author

Hello, thanks for your response @quentin-vigne  

I tried two options based on your recommendation.

Option 1. With tJavaRow, which cannot be connected to a tFileList, so I connected it to a tFileInputRaw (my files are not formatted). The tJavaRow code is as follows:

String fileName = ((String)globalMap.get("tFileList_6_CURRENT_FILE"));
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.content = ((String)globalMap.get("tFileList_6_CURRENT_FILEPATH"));
// This second assignment overwrites the file path set on the previous line
output_row.content = idPart;

and the subjob is as follows:

[Screenshot: Option 1 subjob]

The output file contains all the source files (.out) appended together.

Option 2. With tJava, which can be used with or without a connection to a previous component.

The tjava code is as follows:

String fileName = ((String)globalMap.get("tFileList_9_CURRENT_FILE")); // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
String filePath = ((String)globalMap.get("tFileList_9_CURRENT_FILEPATH"));
String identifier = idPart;

and the subjob is as follows:

[Screenshot: Option 2 subjob]

With this option I also get a single output file containing all the source files appended together.
I've been trying to resolve this issue for weeks. I'm new to Talend, and it's been very complicated.

quentin-vigne
Partner - Creator II

Hi @MDVV 

I managed to get it working with one change: in my first post I forgot to include the tIterateToFlow.

[Screenshot: final job design]

First of all:

  1. tFileList_1: add the file mask "*.out" so that this component lists only the desired files in your directory.
  2. tIterateToFlow_1: this turns each iterated file into a regular row. Don't forget to add a column, for example "CURRENT_FILE", with the value
((String)globalMap.get("tFileList_1_CURRENT_FILE"))

  • Then add the tJavaRow I mentioned earlier, with this code:
String fileName = input_row.CURRENT_FILE; // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.identifier = idPart;

Edit the schema and add an output column named "identifier". You can rename it later, but you will then also have to change output_row.identifier in the code.

  • Then add the tUniqRow and check "Key attribute" for the identifier column.
  • Add a tFlowToIterate to iterate over the unique identifier values.
  • Add a tFileList with a file mask having this value:
"*" + ((String)globalMap.get("row3.identifier")) + ".out"

This lists every file matching, for example, "*1000739.out". Here row3 is the name of the connection feeding the tFlowToIterate; use the connection name from your own job.
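A mask built as "*" + identifier + ".out" accepts any name that ends with the identifier, so if identifiers can differ in length it may also pick up files of a longer identifier with the same trailing digits. The sketch below checks this with the JDK's glob PathMatcher, assuming the tFileList mask behaves like a glob expression. The class name and the stricter "*-" + identifier + ".out" mask are illustrative assumptions, not part of the original solution.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class FileMaskCheck {

    // Returns true if fileName matches the glob mask
    // (JDK glob syntax: '*' = any characters within one path component).
    static boolean matches(String mask, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Path.of(fileName));
    }

    public static void main(String[] args) {
        String id = "1000739";
        String looseMask = "*" + id + ".out";   // mask used in the steps above
        String strictMask = "*-" + id + ".out"; // also requires the '-' separator
        for (String name : new String[] {"1-1000739.out", "1-21000739.out"}) {
            System.out.println(name + "  loose=" + matches(looseMask, name)
                    + "  strict=" + matches(strictMask, name));
        }
    }
}
```

With identifier 1000739, both masks accept "1-1000739.out", but only the loose mask also accepts "1-21000739.out". If all identifiers have the same length, as in the examples here, the two masks behave the same.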

  • Add your tFileInputRaw and set its file name to the global variable:
((String)globalMap.get("tFileList_2_CURRENT_FILEPATH"))
  • Finally, add the tFileOutputDelimited (with the Append option checked, as in your original subjob) with the output file name:
"C:/TalendWorkspace/test/Output_" + ((String)globalMap.get("row3.identifier")) + ".json"

I tested it with some sample files and it works on my end.
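Outside Talend, the last two steps (read each file raw, append it to a per-identifier output) amount to roughly the following plain-Java sketch: each input file's content is appended to Output_<identifier>.json, which is created on the first write. The class name, method name, and temporary sample files are hypothetical. Unlike tFileOutputDelimited, this sketch writes the bytes exactly as read and adds no row separator after each file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendPerIdentifier {

    // Appends the raw bytes of each input file to <outputDir>/Output_<identifier>.json,
    // creating the output file on the first write.
    static Path appendAll(Path outputDir, String identifier, List<Path> inputs) throws IOException {
        Path output = outputDir.resolve("Output_" + identifier + ".json");
        for (Path input : inputs) {
            byte[] content = Files.readAllBytes(input);
            Files.write(output, content, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample files in a temporary directory.
        Path dir = Files.createTempDirectory("talend-sketch");
        Path first = Files.writeString(dir.resolve("1-1000739.out"), "first\n");
        Path second = Files.writeString(dir.resolve("2-1000739.out"), "second\n");
        Path out = appendAll(dir, "1000739", List.of(first, second));
        System.out.print(Files.readString(out)); // prints the lines "first" and "second"
    }
}
```

Because both this sketch and the Append option open the output in append mode, running the job twice adds the same content again, so old Output_*.json files may need to be deleted before a rerun.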

Have a good day

- Quentin

MDVV
Contributor II
Author

Your recommendation worked!

Thank you so much for sharing your knowledge! Thanks in advance!