Hi Team,
I have a folder with 150 files containing a mix of different schemas: 2 files share one schema, another 8 files share another schema, and another file has a schema of its own. To make it clear, I have attached a screenshot.
I need to merge TalendSHPR10WarningsHitosvsSuelosSemanal_0 and
TalendSHPR10WarningsHitosvsSuelosSemanal_1 and generate a file TalendSHPR10WarningsHitosvsSuelosSemanal.csv.
Similarly, I need to merge:
TalendOREPRO25BalanceSPCarteras_0
TalendOREPRO25BalanceSPCarteras_1
TalendOREPRO25BalanceSPCarteras_2
TalendOREPRO25BalanceSPCarteras_3
TalendOREPRO25BalanceSPCarteras_4
TalendOREPRO25BalanceSPCarteras_5
TalendOREPRO25BalanceSPCarteras_6
TalendOREPRO25BalanceSPCarteras_7
TalendOREPRO25BalanceSPCarteras_8
and generate a file TalendOREPRO25BalanceSPCarteras.csv.
Files that only exist as a single _0 part are individual files, not to be merged; each of those just has to be generated as "filename".csv.
I'm planning to use tFileList. How can I achieve this?
Thanks in advance 🙂
Regards,
Vinoth Kumar K.
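For reference, the renaming rule described above (strip the trailing `_N`, concatenate parts that share a prefix, and emit lone `_0` files under the bare name) can be sketched in plain Java, e.g. inside a tJava component. The class name, folder layout, and the assumption that the part files carry no header rows are all mine; adjust them to your actual files:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;

public class MergeParts {
    // Matches names like TalendFoo_3.csv, capturing prefix and part number.
    private static final Pattern PART = Pattern.compile("^(.*)_(\\d+)\\.csv$");

    public static void mergeFolder(Path folder) throws IOException {
        // Group part files by their prefix (file name with the _N suffix removed).
        Map<String, List<Path>> groups = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*_*.csv")) {
            for (Path file : files) {
                Matcher m = PART.matcher(file.getFileName().toString());
                if (m.matches()) {
                    groups.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(file);
                }
            }
        }
        for (Map.Entry<String, List<Path>> e : groups.entrySet()) {
            List<Path> parts = e.getValue();
            // Sort numerically so _2 comes before _10.
            parts.sort(Comparator.comparingInt(p -> {
                Matcher m = PART.matcher(p.getFileName().toString());
                m.matches();
                return Integer.parseInt(m.group(2));
            }));
            // A lone _0 file is copied as-is; larger groups are concatenated.
            Path target = folder.resolve(e.getKey() + ".csv");
            try (java.io.OutputStream out = Files.newOutputStream(target)) {
                for (Path part : parts) {
                    Files.copy(part, out);
                }
            }
        }
    }
}
```

If each part repeats a header row, you would additionally skip the first line of every part after the first.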
Hi @Vinoth Kumar K
Assuming you have a key column in all those files that can be used for joining them, you can map their schemas to a set of tFileInput* components, connect them to a tMap, and configure the join there.
Then you can use a set of tFileList components, configure each one's file name pattern according to its schema, and connect it to the corresponding tFileInput* using the Iterate connection.
By doing this, each lookup input will load all files matching the name pattern into memory before the main input runs, so when the main input starts it will look up against all of the lookup files, not just one matching file.
However, please keep in mind that this approach can consume a lot of memory, depending on the size and number of files listed for each lookup. One way to optimize it a little is to keep the file with the largest schema on the main input. Another is to break the process into smaller subjobs.
Another thing to consider is that your job design might get polluted if there are many lookups. To avoid this, you might consider listing the lookup files and loading them into memory first using tHash components.
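The tHash idea above amounts to preloading every lookup row into one in-memory map keyed by the join column, then streaming the main rows against it. A minimal Java sketch of that pattern (the class and method names are hypothetical, and it assumes the join key is the first CSV field):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class LookupJoin {
    /**
     * Loads every lookup file into one in-memory map keyed by the join
     * column (assumed here to be the first CSV field), mimicking what a
     * tFileList -> tFileInput -> tHashOutput chain does before tMap runs.
     */
    public static Map<String, String> loadLookups(List<Path> lookupFiles) throws IOException {
        Map<String, String> byKey = new HashMap<>();
        for (Path file : lookupFiles) {
            for (String line : Files.readAllLines(file)) {
                int comma = line.indexOf(',');
                if (comma > 0) {
                    byKey.put(line.substring(0, comma), line.substring(comma + 1));
                }
            }
        }
        return byKey;
    }

    /** Inner-joins each main row against the preloaded lookup map. */
    public static List<String> join(List<String> mainRows, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String row : mainRows) {
            int comma = row.indexOf(',');
            String key = comma > 0 ? row.substring(0, comma) : row;
            String match = lookup.get(key);
            if (match != null) {
                out.add(row + "," + match); // append lookup fields to the main row
            }
        }
        return out;
    }
}
```

The point of loading the lookups first is that the (usually larger) main input is never held in memory, only streamed, which is exactly the trade-off described above.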