Hi, I have 100 txt-files. Within these files, there are some rows I have to extract and copy to a csv. So the process should look like this pseudo-code:

FOR all files in a given folder (regardless of their name)
    OPEN file_i
    COPY rows 1, 4, 8
    INSERT values into csv columns 1, 2, 3
    CLOSE file_i
NEXT

Is this possible with Talend Open Studio? Thanks!
As it is always interesting to see how a problem is solved, here is my complete project by components:
tFileList (exact name: tFileList_1): Choose the directory; all other settings stay at their defaults. Right-click the component to add an Iterate connection to the next component.
tFileInputFullRow: Set ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as the file name. The schema has only one column.
tSampleRow: Extracts the rows needed for the rest of the process.
tDenormalize: Add the column to denormalize.
tReplace (optional): I had to remove some strings to get only the data I needed.
tFileOutputDelimited: Save as csv and enable Append so that the rows from all files end up in one output file.
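For readers who want to see the logic of the job above outside Talend, here is a minimal plain-Java sketch of the same steps. It is not the code Talend generates. The class name RowsToCsv, the method names, the ";" delimiter and the UTF-8 encoding are all assumptions made for illustration, and the sketch does no CSV quoting, so values that contain the delimiter are written as-is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RowsToCsv {
    // 1-based row numbers taken from the question (rows 1, 4, 8).
    static final int[] WANTED_ROWS = {1, 4, 8};

    // Roughly the tSampleRow + tDenormalize steps: pick the wanted rows
    // and join them into one delimited record. Rows that do not exist
    // in a short file become empty fields.
    static String extractRecord(List<String> lines, int[] rows, String delimiter) {
        List<String> picked = new ArrayList<>();
        for (int row : rows) {
            picked.add(row <= lines.size() ? lines.get(row - 1) : "");
        }
        return String.join(delimiter, picked);
    }

    // Roughly the tFileList iterate + tFileOutputDelimited (append) steps.
    // Keep csvOut outside inputDir, otherwise it is picked up as an input file.
    static void run(Path inputDir, Path csvOut) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(inputDir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            String record = extractRecord(lines, WANTED_ROWS, ";");
            // Append one record per input file, creating the csv on first use.
            Files.write(csvOut, Collections.singletonList(record), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java RowsToCsv <inputDir> <output.csv>
        run(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Each input file contributes exactly one line to the csv, which matches the effect of enabling Append on the output component.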
That's it. If there is an easier or more flexible way to do this, just let me know.
Cheers!
Yes, it is.

tFileList --iterate--> tFileInputDelimited --> tMap --> tFileOutputDelimited

In the tMap, you choose your columns 1, 4, 8 to insert into your csv file.
OK, in the meantime I ran some tests. Using your suggestion, I found the following:
***solved*** tFileList: I'm not able to connect this component to any input component, e.g. tFileInputDelimited. How does this work?
tFileInputDelimited: I used tFileInputFullRow instead. Does that make any difference?
tMap: No matter which input component I use, I only have one column and can't access rows directly (or am I missing the point?).
Thanks!
After some more testing I have the process set up, and the only thing not working is the tMap component. The problem is that I have only one column imported and do not know how to access a given row within that column in order to map it to a column in my csv. Maybe it helps to know that my input file is pure text (html) and not comma separated.
EDIT: using the tDenormalize component, I was able to solve this.
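To illustrate why denormalizing helps here: the sampled lines arrive as separate rows of a single column, and the denormalize step collapses them into one delimited row, which can then be written out as several csv columns. The following is a minimal Java sketch of that idea only, not Talend's implementation. The class name DenormalizeSketch, the ";" delimiter and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class DenormalizeSketch {
    // Collapse the values of a one-column flow into a single delimited row.
    static String denormalize(List<String> singleColumnRows, String delimiter) {
        StringJoiner joined = new StringJoiner(delimiter);
        for (String value : singleColumnRows) {
            joined.add(value);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sampled rows (e.g. rows 1, 4, 8 of one html file).
        List<String> sampled = Arrays.asList("<title>Report</title>", "<td>42</td>", "<td>ok</td>");
        System.out.println(denormalize(sampled, ";"));
    }
}
```

After this step each file is represented by one row, so a following component such as tReplace or tFileOutputDelimited works on a complete record instead of on isolated lines.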