Reading a directory of CSV files with different schemas
I am trying to read a list of files from FTP. Each file is a CSV, but with a different schema. How do I achieve this? Currently I have: tFTPFileList --> OnSubjobOk --> tFileInputDelimited. I don't have a fixed schema for tFileInputDelimited.
Hi, I had the same scenario and solved it another way: I converted all the CSV files to XLSX, used tFileFetch to read the XLSX files from the directory, iterated each file into the tFileExcelWorkbookOpen component, and then defined the schema I was looking for with the tFileExcelSheetInput component. This lets you map the columns dynamically, and then you can do whatever transformations you like.
@jlolling I have no predefined schema. I need to load the CSV file as is. There is a header line whose values will be the column labels. But tFileInputTextFlat has a setting called Field Extraction, and I don't know the column names.
@rathinasamyy I can't manually convert the CSV files to Excel because they are system generated (the system supports only CSV).
Jan - thanks for pointing this out... Great component!
Sugandha - you might want to try it out. Jan's documentation states "It is not necessary to map all fields in the file to a schema column" and "For delimited fields, the position of the field can be automatically configured with the content of a header line".
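To make the two quoted points concrete, here is a minimal sketch in plain Python (not Talend code, and not the component's actual implementation). It looks up field positions from the header line and keeps only the fields you ask for. The function name `read_selected_fields` and the sample data are made up for illustration.

```python
import csv
import io

def read_selected_fields(text, wanted, delimiter=","):
    """Yield dicts holding only the wanted columns, located via the header line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    # The position of each wanted column is looked up in the header,
    # so the column order in the file does not matter.
    positions = {name: header.index(name) for name in wanted if name in header}
    for row in reader:
        yield {name: row[pos] for name, pos in positions.items()}

# Hypothetical sample file: three columns, but only two are mapped.
sample = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n"
rows = list(read_selected_fields(sample, ["city", "id"]))
```

The `name` column is simply ignored, which mirrors the "not necessary to map all fields" remark.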
@Sugandha: The only way to work without any predefined schema is in the Enterprise release, and it's called Dynamic Schema. This feature does let you avoid knowing anything about the schema of the file, but if you want to write the data to a table, at least for now you have to know a schema.
You could also decide to write the whole line of the file into a CLOB column of the database, but what would be the use case for handling the data that way?
The approach we have described so far depends on a predefined schema in your job. You have to know which values from which columns you need.
The component mentioned, tFileInputTextFlat, lets you match the columns of your schema (the schema of your flow in the job) to the columns in the header line by regular expressions.
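As a rough illustration of header matching by regular expressions, here is a plain-Python sketch. It is only an analogy for the idea, not how tFileInputTextFlat is implemented or configured. The schema columns, patterns, and sample files are all hypothetical.

```python
import csv
import io
import re

# Hypothetical flow schema: target column -> regex that its header label must match.
SCHEMA_PATTERNS = {
    "customer_id": r"^cust(omer)?[ _]?(id|no)$",
    "amount": r"^(amount|total|sum)$",
}

def map_header(header, patterns):
    """Return {schema column: position in file} for the first matching header label."""
    mapping = {}
    for column, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        for index, label in enumerate(header):
            if regex.search(label.strip()):
                mapping[column] = index
                break
    return mapping

def read_with_patterns(text, patterns, delimiter=";"):
    """Read delimited text and return rows reshaped to the schema columns."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    mapping = map_header(header, patterns)
    return [{col: row[idx] for col, idx in mapping.items()} for row in reader]

# Two files with different labels and column order, read into the same schema.
file_a = "Customer ID;Total;Note\n7;12.50;x\n"
file_b = "note;amount;cust_no\ny;3.00;9\n"
rows_a = read_with_patterns(file_a, SCHEMA_PATTERNS)
rows_b = read_with_patterns(file_b, SCHEMA_PATTERNS)
```

Both files end up in the same flow schema even though their header labels and column order differ. You still need to know which columns you want, which is the point made above.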
Since I don't have the Enterprise version, I have made the schema unique. What I have done is:
tftplist->tftpget->tfilelist->tcopy
but I would like to do:
tftplist->tftpget->tinputdeliminated->tpostgresinput
i.e. write the input to a PostgreSQL database.
How do I achieve this?
Hi Sugandha, if you have multiple files and a fixed schema, you can control the flow by analysing the files and, based on that analysis, using control flow to redirect each incoming file to the right output. Thanks, Vaibhav
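One way to picture this "analyse, then redirect" idea is the plain-Python sketch below. It assumes (my assumption, not stated in the thread) that the files fall into a small set of known header layouts. The layout table, target names, and file contents are hypothetical.

```python
import csv
import io

# Hypothetical known layouts: normalised header line -> name of the target output.
KNOWN_LAYOUTS = {
    ("id", "name", "city"): "customers",
    ("id", "amount"): "payments",
}

def header_signature(text, delimiter=","):
    """Return the header labels as a lower-cased tuple (empty tuple for empty input)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    return tuple(label.strip().lower() for label in header)

def route(files):
    """Group file names by the output their header layout maps to."""
    routed = {}
    for filename, text in files.items():
        target = KNOWN_LAYOUTS.get(header_signature(text), "unrecognized")
        routed.setdefault(target, []).append(filename)
    return routed

files = {
    "a.csv": "ID,Name,City\n1,Ann,Oslo\n",
    "b.csv": "id,amount\n1,9.99\n",
    "c.csv": "foo,bar\n1,2\n",
}
routed = route(files)
```

In a Talend job the same decision would typically be made with conditional links between subjobs. Files that match no known layout are collected separately instead of being loaded with the wrong schema.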