kvinoth19991
Contributor II

Merging files from tFileList with different schemas (generating a new set of files)

Hi Team,

I have a folder with 150 files that mix several schemas: for example, 2 files share one schema, another 8 files share a second schema, and 1 file has a schema of its own. To make this clear, I have attached a screenshot.

0695b00000huycOAAQ.png

I need to merge TalendSHPR10WarningsHitosvsSuelosSemanal_0 and TalendSHPR10WarningsHitosvsSuelosSemanal_1 and generate a single file, TalendSHPR10WarningsHitosvsSuelosSemanal.csv.

Similarly, I need to merge

TalendOREPRO25BalanceSPCarteras_0

TalendOREPRO25BalanceSPCarteras_1

TalendOREPRO25BalanceSPCarteras_2

TalendOREPRO25BalanceSPCarteras_3

TalendOREPRO25BalanceSPCarteras_4

TalendOREPRO25BalanceSPCarteras_5

TalendOREPRO25BalanceSPCarteras_6

TalendOREPRO25BalanceSPCarteras_7

TalendOREPRO25BalanceSPCarteras_8

and generate the file TalendOREPRO25BalanceSPCarteras.csv.

Files that exist only as _0 are individual files and do not need to be merged; each one just has to be generated as "filename".csv.

I'm planning to use tFileList. How can I achieve this?

Thanks in advance 🙂

Regards,

Vinoth Kumar K.

1 Reply
anselmopeixoto
Partner - Creator III

Hi @Vinoth Kumar K​ 

 

Assuming you have a key column in all those files that could be used for joining them, you can map their schemas to a set of tFileInput* components, connect them to a tMap and configure the join there.

 

Then you can use a set of tFileList components, configure each one's file name pattern to match one schema, and connect each to the corresponding tFileInput* using an Iterate connection.
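
If the goal is only to append parts that share a schema, as the question describes, the file name patterns follow from the naming convention: every part of one output has the same base name plus a numeric suffix. The sketch below shows that grouping in plain Java, outside Talend. It is not a Talend job; the class name `MergeBySuffix`, the optional `.csv` extension on the input files, and the assumption that every part starts with a header row are all illustrative guesses, not something the thread confirms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeBySuffix {
    // Assumed naming convention: "<base>_<number>" with an optional ".csv" extension.
    private static final Pattern PART = Pattern.compile("^(.+)_(\\d+)(\\.csv)?$");

    // Returns the base name ("X" for "X_3"), or null if the name does not match.
    public static String baseName(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    // Returns the numeric suffix (3 for "X_3"), or -1 if the name does not match.
    public static int partIndex(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    // Groups file names by base name; parts are ordered numerically (_2 before _10).
    public static Map<String, List<String>> groupNames(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String base = baseName(name);
            if (base != null) {
                groups.computeIfAbsent(base, k -> new ArrayList<>()).add(name);
            }
        }
        for (List<String> parts : groups.values()) {
            parts.sort(Comparator.comparingInt(MergeBySuffix::partIndex));
        }
        return groups;
    }

    // Writes one "<base>.csv" per group into outDir. A group with a single _0
    // part is simply rewritten under its base name.
    public static void mergeAll(Path inDir, Path outDir) throws IOException {
        List<String> names;
        try (Stream<Path> files = Files.list(inDir)) {
            names = files.filter(Files::isRegularFile)
                         .map(p -> p.getFileName().toString())
                         .collect(Collectors.toList());
        }
        Files.createDirectories(outDir);
        for (Map.Entry<String, List<String>> group : groupNames(names).entrySet()) {
            List<String> merged = new ArrayList<>();
            boolean headerWritten = false;
            for (String part : group.getValue()) {
                List<String> lines = Files.readAllLines(inDir.resolve(part), StandardCharsets.UTF_8);
                if (lines.isEmpty()) {
                    continue;
                }
                // Assumption: each part has a header row; keep only the first one.
                merged.addAll(headerWritten ? lines.subList(1, lines.size()) : lines);
                headerWritten = true;
            }
            Files.write(outDir.resolve(group.getKey() + ".csv"), merged, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MergeBySuffix <inputDir> <outputDir>");
            return;
        }
        mergeAll(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Grouping by base name means the 150 files do not need one hand-written pattern per schema, and a file that exists only as `_0` falls out as a group of one.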

 

By doing this, each lookup input loads all the files matching its name pattern into memory before the main input runs, so when the main input starts it looks up against all of the lookup files, not just one of the matching files.

 

However, please keep in mind that this approach might consume a large amount of memory, depending on the size and number of files listed on each lookup. One way to optimize this a little is to keep the file with the larger schema on the main input. Another is to break this process into smaller ones.

 

Another thing to consider is that your job design might get cluttered if there are many lookups. To avoid this, you might consider listing the lookup files and loading them into memory first using tHash components.
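
To make the lookup behaviour concrete, here is a minimal plain-Java sketch of the same idea: all lookup files are loaded into one in-memory map first, and only then are the main rows joined against it. This only illustrates the concept and is not how Talend implements tMap or the tHash components. The class `LookupJoinSketch`, the sample rows, and the assumptions that the key is the first column and that rows are comma-separated without quoting or header rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupJoinSketch {
    // Loads the rows of every lookup file into a single map keyed by the first column.
    // If a key appears more than once, the last row wins.
    public static Map<String, String> buildLookup(List<List<String>> lookupFiles) {
        Map<String, String> lookup = new HashMap<>();
        for (List<String> rows : lookupFiles) {
            for (String row : rows) {
                int comma = row.indexOf(',');
                if (comma < 0) {
                    continue; // no value columns to attach
                }
                lookup.put(row.substring(0, comma), row.substring(comma + 1));
            }
        }
        return lookup;
    }

    // Appends the lookup columns to each main row whose key is found;
    // rows without a match are kept unchanged.
    public static List<String> join(List<String> mainRows, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String row : mainRows) {
            int comma = row.indexOf(',');
            String key = comma < 0 ? row : row.substring(0, comma);
            String extra = lookup.get(key);
            out.add(extra == null ? row : row + "," + extra);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical lookup files with the same schema (key, city).
        List<List<String>> lookupFiles = Arrays.asList(
                Arrays.asList("1,Madrid", "2,Lima"),
                Arrays.asList("3,Quito"));
        Map<String, String> lookup = buildLookup(lookupFiles);

        // Hypothetical main input (key, amount).
        List<String> mainRows = Arrays.asList("1,100", "3,250", "9,75");
        join(mainRows, lookup).forEach(System.out::println);
        // prints:
        // 1,100,Madrid
        // 3,250,Quito
        // 9,75
    }
}
```

The whole lookup sits in memory before the first main row is read, which is why the size and number of lookup files drive memory use.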