Solved: [resolved] tfilelist, check header row and match up columns - Qlik Community

Anonymous · ‎2011-05-06

Hello,
I have a directory (Unix) which contains a number of files. I can pick out all files of one type, for example "filetype1.csv", using a regex.
However, some files of the same type have different headers and different numbers of columns, which don't always match up or come in the same order.
I'm currently just going from tFileList to a delimited file input. I would like to either
a) split the files coming out of the directory based on the header (i.e. if the header matches regex1, put the file into inputfile1; if it matches regex2, put it into inputfile2), so that the columns all match up in the merged file. Or...
b) just extract certain columns from the files before putting them all together in the delimited input file, so that again the combined inputs match up in terms of column headings.
Is there any way to do this without writing custom Java code to do it all?
Thanks
P

Anonymous · ‎2011-05-06

For (a) there is a non-custom solution:
1) Read the header from each file (in the schema for your input, have a single column to hold the whole line).
2) Use a tMap to run your regex in the tMap output filter; you will have one output table per target file. (This part would be cleaner with a tJavaRow.)
3) Using an "if" link, read the input file with the correct input component.
If you need more details, please ask... I can work up an example.
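
To see the idea of steps 1 and 2 outside of Talend first, here is a minimal plain-Java sketch: take the header line as one string and decide which target it belongs to. The class name HeaderClassifier, the target names and the two header regexes are made up for illustration; none of them come from the actual job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class HeaderClassifier {
    // Hypothetical header layouts; replace these with the real regexes.
    private static final Map<String, Pattern> TARGETS = new LinkedHashMap<>();
    static {
        TARGETS.put("inputfile1", Pattern.compile("id,name,amount"));
        TARGETS.put("inputfile2", Pattern.compile("id,amount,date(,.*)?"));
    }

    /** Returns the first target whose pattern matches the whole header line, or null if none does. */
    public static String classify(String headerLine) {
        if (headerLine == null) {
            return null;
        }
        String line = headerLine.trim();
        for (Map.Entry<String, Pattern> entry : TARGETS.entrySet()) {
            // matches() requires the pattern to cover the entire line
            if (entry.getValue().matcher(line).matches()) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("id,name,amount"));
        System.out.println(classify("id,amount,date,notes"));
        System.out.println(classify("something else"));
    }
}
```

A LinkedHashMap keeps the patterns in insertion order, so if two regexes could both match a header, the one listed first wins.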

Anonymous · ‎2011-05-09

Hi,
Thanks very much for your message, it sounds like a sensible solution. I think I have an idea of how to do this... but if you could give me an example, that would be really great.
I've done a fair bit of Java, so I'm quite happy to write some custom code (using tJavaRow instead of tMap if that makes more sense, as you suggest)... it's just that I've never written Java in Talend and am not quite sure how to start without some example code.
Thank you very much
P

Anonymous · ‎2011-05-09

Just a note before you implement: I forgot that the tFileInputMSDelimited may make this much simpler. It is designed to work with single multischema files, but it may work for this problem. If it does, it would be as simple as:
tFileList
   |
tFileInputMSDelimited--file1-->(rest of job for file 1)
                    |--file2-->(rest of job for file 2)

Here's the original solution I envisioned.
tFileList
   |
iterate
   |
tFileInputDelimited-row->tJavaRow--if-->tFileInputDelimited --> (rest of job for file 1)
                                |--if-->tFileInputDelimited --> (rest of job for file 2)

In the first tFileInputDelimited, set it up to read one row into a single column (by setting the limit to 1 and the field separator to "").
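
For comparison, this is roughly what that first component does, written as a plain-Java sketch: open the file, read only the first line and hand it on as a single string. The class name HeaderPeek and the sample data in main are hypothetical, and UTF-8 is an assumption about the files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class HeaderPeek {
    /** Reads only the first line of the file; returns "" if the file is empty. */
    public static String firstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small throwaway CSV so the example is self-contained.
        Path tmp = Files.createTempFile("filetype1", ".csv");
        Files.write(tmp, Arrays.asList("id,name,amount", "1,foo,10"), StandardCharsets.UTF_8);
        System.out.println(firstLine(tmp));
        Files.delete(tmp);
    }
}
```

Inside the job, the file name of each tFileInputDelimited would normally be the path of the current iteration, which tFileList exposes as a global variable, e.g. ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")), assuming the component is named tFileList_1.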
In the tJavaRow, set a context variable to the name of the file flow you want to run, based on your regex logic.
i.e.

// String.matches() must match the entire header line, not just part of it
if (input_row.header_line.matches("some crazy regex")) {
    context.file_to_run = "file_1";
}

In the "if" links, check this variable to execute the correct file-processing flow (written constant-first here so it doesn't throw if the variable is still null), i.e.:

"file_1".equals(context.file_to_run)
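
One thing to watch with this pattern: the context variable keeps its value from the previous file unless the tJavaRow assigns it on every pass. The sketch below simulates the iterate loop in plain Java to show that. The class RunIfSimulation, the two regexes and the "unknown" fallback value are assumptions for illustration, not part of the job above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunIfSimulation {
    // Stands in for context.file_to_run.
    static String fileToRun = null;

    // Stands in for the tJavaRow: always assigns a value, so nothing
    // left over from the previous file can trigger the wrong flow.
    static void tJavaRow(String headerLine) {
        if (headerLine.matches("id,name,.*")) {          // hypothetical layout 1
            fileToRun = "file_1";
        } else if (headerLine.matches("id,date,.*")) {   // hypothetical layout 2
            fileToRun = "file_2";
        } else {
            fileToRun = "unknown";
        }
    }

    /** Runs the simulated job over a list of header lines and records which flow fired. */
    static List<String> run(List<String> headers) {
        List<String> log = new ArrayList<>();
        for (String header : headers) {                  // the tFileList iterate link
            tJavaRow(header);
            if ("file_1".equals(fileToRun)) log.add("flow 1");   // first "if" link
            if ("file_2".equals(fileToRun)) log.add("flow 2");   // second "if" link
            if ("unknown".equals(fileToRun)) log.add("skipped");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("id,name,amount", "weird header", "id,date,amount")));
    }
}
```

Without the final else branch, the second file in this example would reuse "file_1" from the first one and be sent down the wrong flow.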

Anonymous · ‎2011-05-13

Hi John,
Thanks for your help,
P

Anonymous · ‎2019-07-08

I'd like to ask a follow-up question on this.

Which file name do we select in the tFileInputDelimited component after the "if" link?
