McJingles
Contributor III

Finding field separator before processing the file

My folder contains numerous files (.txt format). Some of the files use a comma (,) as the field separator, some use a pipe (|), some use a semicolon (;), and so on.

 

Is there any way to extract the columns from all of these files in the same job?

Thanks in advance

Anonymous
Not applicable

Yes. Use a tFileInputDelimited component and set its "Field Separator" field to a context variable. You can then set the value of that context variable at the beginning of the job or between processing files. This can be done dynamically, but it may require a little code.

McJingles
Contributor III
Author

Thanks for the reply @rhall 

 

Can you please elaborate on this further?

I am new to Talend. Can you please share the code, or a link explaining what you described?

 

Anonymous
Not applicable

OK. In this example I created a context variable called "sep", shown here:

[Screenshot: the "sep" context variable]

 

For this example I have given it the value ";", but context variable values can also be set dynamically, in several different ways. Those approaches are covered in other questions on the Community.

 

I then configured my tFileInputDelimited component as follows:

[Screenshot: tFileInputDelimited settings with the "Field Separator" field set to context.sep]

Notice that the "Field Separator" field is populated with:

 

context.sep

This tells the component to use the value held by context.sep as the separator.
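If you also use the separator in your own Java code, for example in a tJava or tJavaRow component, keep in mind that Java's String.split() treats its argument as a regular expression, so a pipe must be quoted before it can be used as a literal separator. The standalone sketch below is plain Java, not Talend-generated code: the local variable `sep` is an assumed stand-in for context.sep, and the class and helper names (`SeparatorSplitDemo`, `splitLiteral`) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorSplitDemo {

    // Splits one line using a literal separator such as "|", "," or ";".
    // String.split() interprets its argument as a regular expression, so the
    // separator is wrapped in Pattern.quote() to make "|" match literally.
    public static String[] splitLiteral(String line, String sep) {
        // Limit -1 keeps trailing empty fields, e.g. "a;b;" gives 3 fields.
        return line.split(Pattern.quote(sep), -1);
    }

    public static void main(String[] args) {
        String sep = "|"; // stand-in for context.sep in this sketch
        String line = "101|Smith|London";

        // Unquoted: "|" is a regex alternation of two empty patterns, so the
        // line is split between every single character.
        System.out.println(Arrays.toString(line.split(sep)));

        // Quoted: the pipe is treated as a plain character.
        System.out.println(Arrays.toString(splitLiteral(line, sep))); // [101, Smith, London]
    }
}
```

Commas and semicolons happen to be safe unquoted, but quoting every candidate keeps the same code path working for all three separators.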

McJingles
Contributor III
Author

Good one @rhall 

 


 

I have nearly 20 files in the folder defined in the tFileList component, and I want to take all of them as input.

 

The input files look like this:

 

In this scenario, how can I extract the data using the pipe (|) and comma (,) delimiters in the same job?

Moreover, I don't know in advance which delimiter each input file uses, only that it will be a pipe, comma, or semicolon.

Anonymous
Not applicable

I can't tell you exactly how to do this without building it myself, and doing that wouldn't really serve you, because you would not learn. However, I am happy to share how I would approach a problem like this.

 

1) You know in advance which separators it could be.

2) For every file from the tFileList, you have the opportunity to pre-check the first row of the file.

3) You know how many columns there are in each file (otherwise this model for processing files will not work).
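The three clues can be combined into a small check: try each known separator on the first row and keep the one that yields the expected number of columns. This is a minimal sketch in plain Java, assuming the first row of each file is a header with a known column count; `SeparatorDetector` and `detect` are hypothetical names, not part of Talend.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorDetector {

    // Clue 1: the candidate separators are known in advance.
    static final List<String> CANDIDATES = Arrays.asList("|", ",", ";");

    // Clues 2 and 3: given the first row of a file and the known column count,
    // return the first candidate that splits the row into exactly that many
    // fields, or null if none of them does.
    public static String detect(String firstLine, int expectedColumns) {
        for (String sep : CANDIDATES) {
            // Pattern.quote() so that "|" is matched literally, not as regex.
            int fields = firstLine.split(Pattern.quote(sep), -1).length;
            if (fields == expectedColumns) {
                return sep;
            }
        }
        return null; // no candidate produced the expected column count
    }

    public static void main(String[] args) {
        System.out.println(detect("id|name|city", 3)); // |
        System.out.println(detect("id,name,city", 3)); // ,
        System.out.println(detect("id;name;city", 3)); // ;
    }
}
```

Candidates are tried in a fixed order, so if a header happens to give the expected count with more than one separator, the first one in the list wins. In a Talend job, logic like this could live in a tJava component or a user routine; that wiring is not shown here.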

 

As such, you have all of the clues you need to identify which of the three separators a file uses. You do not need to process the data for each tFileList iteration in one subjob. You could load the tFileList data, check the first row of each file, identify the separator (look at the split() function), and save the details in a tHashOutput. Then use a tHashInput to read those details back in, iterate over them, and read each file with the correct separator.
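To make the data flow of this two-pass approach concrete, here is a rough plain-Java sketch that runs outside Talend. The folder scan stands in for tFileList, a LinkedHashMap stands in for the tHashOutput/tHashInput pair, and the second pass stands in for tFileInputDelimited reading each file with its stored separator. All class and method names are hypothetical, and the sketch assumes UTF-8 .txt files whose first row has a known column count (3 in the example).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class TwoPassLoader {

    static final List<String> CANDIDATES = Arrays.asList("|", ",", ";");

    // Pass 1: read only the first line of each .txt file and record its separator.
    // The map plays the role of tHashOutput in the suggested Talend design.
    public static Map<Path, String> detectSeparators(Path folder, int expectedColumns) throws IOException {
        Map<Path, String> result = new LinkedHashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path file : files) {
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String firstLine = reader.readLine();
                    if (firstLine == null) {
                        continue; // skip empty files
                    }
                    for (String sep : CANDIDATES) {
                        if (firstLine.split(Pattern.quote(sep), -1).length == expectedColumns) {
                            result.put(file, sep);
                            break;
                        }
                    }
                }
            }
        }
        return result;
    }

    // Pass 2: read every detected file with its own separator, the equivalent
    // of tHashInput feeding a reader whose separator is set per file.
    public static List<String[]> readAll(Map<Path, String> separators) throws IOException {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<Path, String> entry : separators.entrySet()) {
            String quoted = Pattern.quote(entry.getValue());
            for (String line : Files.readAllLines(entry.getKey(), StandardCharsets.UTF_8)) {
                rows.add(line.split(quoted, -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, String> seps = detectSeparators(folder, 3);
        seps.forEach((file, sep) -> System.out.println(file.getFileName() + " -> " + sep));
        System.out.println(readAll(seps).size() + " rows read");
    }
}
```

Files whose first row matches no candidate are silently skipped here; in practice you would probably want to log them. The second pass also returns each file's header row as data, which you would skip or handle separately (tFileInputDelimited has a Header setting for the number of rows to skip).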