Hello Team,
I hope you can help me with the following problem in Talend Studio:
I have several files in a directory, and I need to merge them according to the identifier in their file names, producing one output file per identifier.
Example:
1-1000739.out
2-1000739.out
3-1000739.out
4-1000739.out
I need a single file to be created with the union of the 4 files and with the name: Output_1000739.json
and so on with the rest of the files found in the directory:
1-1000687.out
2-1000687.out
3-1000687.out
4-1000687.out
Output_1000687.json
I have a Subjob:
tFileList-->Iterate-->tFileInputRow-->tFileOutputDelimited (check Append option)
But it joins all the files into one.
Any ideas to solve this?
Thanks in advance!
Hello @MDVV
You have the logic, but what is missing is a way of iterating over one "identifier" at a time, like "1000739" or "1000687".
What I would do first is set up a tFileList to list all files matching "*.out".
Then you can use a tJavaRow to apply this transformation and keep all the "identifier" values in memory:
String fileName = input_row.currentFileName; // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.filePath = input_row.currentFilePath;
output_row.identifier = idPart;
After this, set up a tUniqRow to keep only one occurrence of each identifier, then add a tFlowToIterate so that you can iterate over the unique rows. From there you can run your current flow: tFileList (add a filemask that uses the current identifier from the tFlowToIterate) -->Iterate-->tFileInputRow-->tFileOutputDelimited (check Append option)
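Outside of Talend, the identifier extraction and the deduplication step can be sketched in plain Java as follows. This is only an illustration of the logic, not generated Talend code: the class `IdentifierSketch` and its two methods are hypothetical names, and it assumes every relevant file is named `<number>-<identifier>.out`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Talend.
class IdentifierSketch {

    // "1-1000739.out" -> "1000739"; returns null when the name does not
    // contain a dash or does not end with ".out".
    static String extractIdentifier(String fileName) {
        int dash = fileName.indexOf('-');
        if (dash < 0 || !fileName.endsWith(".out")) {
            return null;
        }
        return fileName.substring(dash + 1, fileName.length() - ".out".length());
    }

    // Keeps the first occurrence of each identifier, which is the role
    // tUniqRow plays in the job described above.
    static Set<String> uniqueIdentifiers(List<String> fileNames) {
        Set<String> ids = new LinkedHashSet<>();
        for (String name : fileNames) {
            String id = extractIdentifier(name);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "1-1000739.out", "2-1000739.out",
                "1-1000687.out", "2-1000687.out");
        System.out.println(uniqueIdentifiers(names)); // prints [1000739, 1000687]
    }
}
```

Unlike the `split("-")` version, this variant returns null instead of throwing when a file name has no dash, which may matter if the directory contains unrelated files.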
I hope this is clear enough
- Quentin
Hello, thanks for your response @quentin-vigne
I tried two options following your recommendation.
Option 1. With tJavaRow, which cannot be connected to a tFileList, so I connected it to tFileInputRow (my files are not formatted). The tJavaRow code is as follows:
String fileName = ((String)globalMap.get("tFileList_6_CURRENT_FILE"));
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.content = ((String)globalMap.get("tFileList_6_CURRENT_FILEPATH"));
output_row.content = idPart;
and the subjob is as follows:
The file returns the append of all the source files (.out).
Option 2. With tJava, which can be connected or not to a previous transformation.
The tJava code is as follows:
String fileName = ((String)globalMap.get("tFileList_9_CURRENT_FILE")); // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
String filePath = ((String)globalMap.get("tFileList_9_CURRENT_FILEPATH"));
String identifier = idPart;
and the subjob is as follows:
With this option, it also returns a single output file with the append of all the source files.
I've been trying to resolve this issue for weeks. I'm new to Talend, and it's been very complicated.
Hi @MDVV
I managed to do it with one change: in my first post I forgot to use the tIterateToFlow.
First of all:
((String)globalMap.get("tFileList_1_CURRENT_FILE"))
String fileName = input_row.CURRENT_FILE; // e.g., 1-1000739.out
String[] parts = fileName.split("-");
String idPart = parts[1].replace(".out", ""); // 1000739
output_row.identifier = idPart;
Edit the schema and add a column named "identifier" to the output. You can rename it later if you want, but then you will also have to change output_row.identifier in the code.
"*" + ((String)globalMap.get("row3.identifier")) + ".out"
With this, we list every file that matches "*1000739.out", for example.
((String)globalMap.get("tFileList_2_CURRENT_FILEPATH"))
"C:/TalendWorkspace/test/Output_" + ((String)globalMap.get("row3.identifier")) + ".json"
I used some test files and it works on my end.
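For comparison, the per-identifier merge can be sketched in plain Java. This is a minimal sketch under assumptions, not what Talend generates: `MergeSketch` and `mergeFor` are hypothetical names, identifiers are assumed to contain only digits, and the glob is written as `*-<identifier>.out` rather than `*<identifier>.out` so that "1000739" does not also pick up a file such as "1-11000739.out".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class for illustration; not part of Talend.
class MergeSketch {

    // Appends every "<n>-<identifier>.out" file found in dir to
    // Output_<identifier>.json and returns the path of that output file.
    static Path mergeFor(Path dir, String identifier) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The dash in the glob keeps identifiers that end the same way apart.
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "*-" + identifier + ".out")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listing order is unspecified, so sort by path
        // (lexicographic: "10-..." would sort before "2-...").
        Collections.sort(parts);

        Path out = dir.resolve("Output_" + identifier + ".json");
        Files.deleteIfExists(out); // start from a clean file on every run
        for (Path p : parts) {
            Files.write(out, Files.readAllBytes(p),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge-sketch");
        Files.write(dir.resolve("1-1000739.out"), "first\n".getBytes());
        Files.write(dir.resolve("2-1000739.out"), "second\n".getBytes());
        Files.write(dir.resolve("1-1000687.out"), "other\n".getBytes());
        Path out = mergeFor(dir, "1000739");
        System.out.print(new String(Files.readAllBytes(out)));
    }
}
```

If one identifier can be the tail of another in your data, the same idea should carry over to the tFileList filemask by including the dash, for example "*-" + identifier + ".out"; this is a suggestion I have not tested in the job itself.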
Have a good day
- Quentin
Your recommendation worked!
Thank you so much for sharing your knowledge! Thanks in advance!