Anonymous
Not applicable

Variable Number of Delimited Fields

Hi,
I need help with a task in Talend.
I have a tab-delimited file, but the number of columns varies from row to row.
Here is a sample:
john 23 productx productx add-info
jack 25 productx add-info
july 33 productx productx productx productx add-info
There's no fixed number of product columns, and I need the add-info to come after them.
Example (the output that I need to generate):
john 23 productx productx (blank) (blank) add-info
jack 25 productx (blank) (blank) (blank) add-info
july 33 productx productx productx productx add-info
I'm not sure I made myself clear enough, but thanks for any help.
9 Replies
Anonymous
Not applicable
Author

I hope someone from Team Talend takes a look at your case 🙂 It's pretty hard :-)
These are the things that make the input file complicated:
1. the number of productx values is unknown
2. there is no identifier or key for each column (which would help us determine the header value)
3. the "add-info" has no permanent position
4. it is not a typical flat input file :-)
Still, your case is very interesting 🙂
alevy
Specialist

If you read across each row, how do you tell which value is a "productx" and which is "add-info"? Is it just that there is always exactly one "add-info" as the last value in a row?
Anonymous
Not applicable
Author

Hi alevy,
It seems the add-info is always at the end. Assuming so, how can this be done?
alevy
Specialist

Well, assuming:
-- we know only that there is always exactly one "add-info" as the last value in a row
-- we do not know the maximum number of "productx" there can be on any row
-- the output is also to a delimited file
-- the "add-info" must remain the last value in the row
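Then we first need to read the file to find the maximum number of "productx" across all rows. Use tFileInputFullRow and send its output to tMap. There, define a new field ProductCount = StringHandling.COUNT(row1.line,"\t")-2. (A row with n fields contains n-1 tabs, and three of the fields are not products: the two leading values and the "add-info".) The output of tMap goes to tAggregateRow, which calculates the max of all ProductCounts. The output of tAggregateRow goes to tSetGlobalVar, which stores the max under the key "MaxProductCount".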
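For anyone who wants to check the counting logic outside Talend, here is a minimal plain-Java sketch of this first pass. The class and method names (ProductCounter, productCount, maxProductCount) are made up for illustration and are not Talend routines; the sketch assumes every row has two leading values, then the products, then exactly one "add-info".

```java
import java.util.List;

// Hypothetical helper, not part of Talend: mirrors the tMap + tAggregateRow step.
class ProductCounter {

    // Assumes each row is: name, age, zero or more products, add-info.
    // A row with n fields has n - 1 tabs, and 3 fields are not products,
    // so the product count is (number of tabs) - 2.
    static int productCount(String line) {
        int tabs = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                tabs++;
            }
        }
        return tabs - 2;
    }

    // Largest product count over all rows (0 for an empty list).
    static int maxProductCount(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            max = Math.max(max, productCount(line));
        }
        return max;
    }
}
```

With the three sample rows from the question, maxProductCount should come out as 4, which is the value the job would keep in the global variable.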
Then link the first tFileInputFullRow to another identical tFileInputFullRow using OnSubjobOK. The flow from the second tFileInputFullRow goes to tJavaRow, which contains the following code:
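// Position of the tab that precedes the trailing "add-info" value
int lastDelimiter = input_row.line.lastIndexOf('\t');
// Number of empty product fields this row is missing
int padding = (Integer) globalMap.get("MaxProductCount") - (StringHandling.COUNT(input_row.line, "\t") - 2);
output_row.line = input_row.line.substring(0, lastDelimiter)
    + StringHandling.STR('\t', padding)
    + input_row.line.substring(lastDelimiter);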

The flow from tJavaRow should be what you need to write to tFileOutputDelimited.
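The padding step can also be tried outside Talend. The following is only a sketch in plain Java under the same assumptions (two leading values, exactly one trailing "add-info", at least one tab per row); RowPadder and pad are hypothetical names, and the maximum is passed in as a parameter instead of being read from globalMap.

```java
// Hypothetical helper, not part of Talend: mirrors the tJavaRow step.
class RowPadder {

    // Inserts empty fields before the last field so that every row ends up
    // with maxProductCount product columns. Assumes the row contains at
    // least one tab (name, age and add-info are always present).
    static String pad(String line, int maxProductCount) {
        int lastDelimiter = line.lastIndexOf('\t');

        // Count the tabs; products in this row = tabs - 2.
        int tabs = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                tabs++;
            }
        }
        int missing = maxProductCount - (tabs - 2);

        StringBuilder out = new StringBuilder(line.substring(0, lastDelimiter));
        for (int i = 0; i < missing; i++) {
            out.append('\t'); // one extra tab per missing product field
        }
        out.append(line.substring(lastDelimiter)); // "\tadd-info"
        return out.toString();
    }
}
```

Rows that already contain the maximum number of products pass through unchanged, because the number of missing fields is zero.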
Anonymous
Not applicable
Author

Hi alevy,
Confirmed, it works! (assuming it is delimited) You are great!
Anonymous
Not applicable
Author

Thanks alevy and lovely, will try this solution asap.
Anonymous
Not applicable
Author

Hi, sorry for taking so long to run the test, but I was really busy.
I tried your solution, alevy, but it just output the same file that was input. Did I miss something?
Thanks.
alevy
Specialist

I'd have to say: probably. Did you set MaxProductCount correctly in tSetGlobalVar? Add a tLogRow after tAggregateRow, and a tJava with the following code after tSetGlobalVar, to test that the max has been stored correctly.
System.out.println((Integer)globalMap.get("MaxProductCount"));

If both print the same result then put up screenprints of your job.
Anonymous
Not applicable
Author

Geeez, it's so nice to be back in Talend (I got hooked on a Silverlight project recently and it's driving me nuts!!!)
MaxProductCount -> This did the trick, right alevy? You must know the maximum count of productx so that the add-info can be placed in the nth (MaxProductCount) position.