Anonymous
Not applicable

[resolved] tFileList: check header row and match up columns

Hello,
I have a (Unix) directory that contains a number of files. I can pick out all files of one type, for example "filetype1.csv", using a regex.
However, some files of the same type have different headers and different numbers of columns, which don't always match up or come in the same order.
I'm currently just going from tFileList to a delimited file input. I would like to either:
a) split the files coming out of the directory based on the header (i.e. if the header matches regex1, send the file to inputfile1; if it matches regex2, send it to inputfile2), so that the columns all match up in the merged file; or
b) extract only certain columns from each file before putting them all together in the delimited input file, so that again the combined inputs match up in terms of column headings.
Is there any way to do this without writing custom Java code to do it all?
Thanks
P

Accepted Solutions
Anonymous
Not applicable
Author

For (a) there is a non-custom solution:
1) Read the header from each file (in the schema for your input, use a single column to hold the whole line).
2) Use a tMap to run your regex in the tMap output filter. You will have one output table per target file. (This part would be cleaner with a tJavaRow.)
3) Using an "if" link, read the input file with the correct input component.
If you need more details, please ask... I can work up an example.
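
To make steps 1 and 2 concrete outside of Talend, here is a minimal plain-Java sketch of the same idea: read only the first line of a file and decide which layout it belongs to. The class name HeaderClassifier, the two header patterns and the labels "file_1" / "file_2" are all made up for illustration; the real patterns depend on what your headers look like. In a job, the equivalent check would live in the tMap filter or the tJavaRow.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class HeaderClassifier {
    // Hypothetical header layouts; replace with patterns that fit your files.
    private static final Map<String, Pattern> LAYOUTS = new LinkedHashMap<>();
    static {
        LAYOUTS.put("file_1", Pattern.compile("id,name,amount"));
        LAYOUTS.put("file_2", Pattern.compile("name,id,amount,currency"));
    }

    // Returns the label of the first layout whose pattern matches the
    // whole (trimmed) header line, or "unknown" if none does.
    static String classify(String headerLine) {
        if (headerLine == null) {
            return "unknown"; // empty file: there was no first line to read
        }
        String trimmed = headerLine.trim();
        for (Map.Entry<String, Pattern> layout : LAYOUTS.entrySet()) {
            if (layout.getValue().matcher(trimmed).matches()) {
                return layout.getKey();
            }
        }
        return "unknown";
    }

    // Reads only the first line of the file, like an input limited to 1 row.
    static String readHeader(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HeaderClassifier fileA.csv fileB.csv ...
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(file + " -> " + classify(readHeader(file)));
        }
    }
}
```

Keeping an explicit "unknown" result is a design choice: files whose header matches none of the patterns can then be logged or skipped instead of silently going down the wrong branch.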

Anonymous
Not applicable
Author

Hi,
Thanks very much for your message, that sounds like a sensible solution. I think I have an idea of how to do it, but if you could give me an example that would be really great.
I've done a fair bit of Java, so I'm quite happy to write some custom code (using tJavaRow instead of tMap if it makes more sense, as you suggest). It's just that I've never done Java in Talend and am not quite sure how to start without some example code.
Thank you very much
P
Anonymous
Not applicable
Author

Just a note before you implement: I forgot that the tFileInputMSDelimited may make this much simpler. It is designed to work with single multischema files, but it may work for this problem. If it does, it would be as simple as:
tFileList
   |
tFileInputMSDelimited --file1--> (rest of job for file 1)
                      |--file2--> (rest of job for file 2)

Here's the original solution I envisioned.
tFileList
   |
iterate
   |
tFileInputDelimited --row--> tJavaRow --if--> tFileInputDelimited --> (rest of job for file 1)
                                      |--if--> tFileInputDelimited --> (rest of job for file 2)

In the first tFileInputDelimited, set it up to read one row into a single column (by setting the limit to 1 and the field separator to "").
In the tJavaRow, set a context variable to the name of the flow you want to run, based on your regex logic, e.g.:
// String.matches() only succeeds if the regex matches the entire header line
if (input_row.header_line.matches("some crazy regex")) {
    context.file_to_run = "file_1";
}

In the "if" links, check this variable to execute the correct file processing flow, e.g. (written constant-first so it does not throw if the variable was never set):
"file_1".equals(context.file_to_run)
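
Since the job iterates over many files, the decision made for one file should not leak into the next. The following is a rough stand-alone sketch of that loop, not Talend-generated code: the Context class merely stands in for the job's context object, and the two regexes and the "file_2" label are invented for the example.

```java
class RunIfSketch {
    // Stand-in for the job's context (assumption: a String variable named file_to_run).
    static class Context {
        String file_to_run;
    }

    // Roughly what the tJavaRow body would do for one header line.
    static void chooseFlow(String headerLine, Context context) {
        context.file_to_run = null; // forget the decision made for the previous file
        if (headerLine.matches("id,name,.*")) {          // hypothetical layout 1
            context.file_to_run = "file_1";
        } else if (headerLine.matches("name,id,.*")) {   // hypothetical layout 2
            context.file_to_run = "file_2";
        }
    }

    // Roughly what an "if" link condition would evaluate.
    // Calling equals() on the constant avoids a NullPointerException when nothing matched.
    static boolean shouldRun(String flow, Context context) {
        return flow.equals(context.file_to_run);
    }

    public static void main(String[] args) {
        Context context = new Context();
        String[] headers = {"id,name,amount", "name,id,amount,currency", "something,else"};
        for (String header : headers) { // one pass per file from the iterate link
            chooseFlow(header, context);
            System.out.println(header + " -> file_1? " + shouldRun("file_1", context)
                    + ", file_2? " + shouldRun("file_2", context));
        }
    }
}
```

Without the reset at the top of chooseFlow, a file with an unrecognised header would be processed by whichever branch the previous file selected.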
Anonymous
Not applicable
Author

Hi John,
Thanks for your help,
P
Anonymous
Not applicable
Author

I'd like to ask a follow-up question.

Which filename do we select in the tFileInputDelimited component after the "if" link?