_AnonymousUser
Specialist III

tNormalize and enclosed field

Hello,
I use a tNormalize component to split lines, with a space as the field separator. Some of my columns contain text fields delimited with double quotes.

Jan  1 00:01:44 qid=tBVN1fmH035294 subject="Delivery Status Notification (Delay)" virusname= duration=0.037 elapsed=0.266



I would like this text field not to be split, so in the Advanced tab of the tNormalize I enabled the CSV parameters option and set the text enclosure to """, but it does not work... :-/
Could anyone help me get it working properly?
Thanks
Regards

3 Replies
Anonymous
Not applicable

Hi,

Have you tried checking the "CSV Options" checkbox and entering """ in the "Escape char" and "Text enclosure" fields of the tFileInputDelimited component, to see whether your text field is read correctly, and then using tNormalize to split the lines?
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Hello,
Yes, I tried checking the "CSV Options" checkbox and entering """ in the "Escape char" and "Text enclosure" fields of the tFileInputDelimited component, and it is not working.
Clearly these components do not do what they should do; they are full of bugs.
Regards
Anonymous
Not applicable

There is a difference between a "bug" and software not doing what you "think" it should do. Your format does not suit the CSV options because it does not conform to that standard. 
First of all, let's look at this logically. Why would you choose a space to normalize the data when you have one or more columns that legitimately contain spaces?
Your example text is below:
Jan  1 00:01:44 qid=tBVN1fmH035294 subject="Delivery Status Notification (Delay)" virusname= duration=0.037 elapsed=0.266


Ignoring the date at the beginning, your data appears to arrive in the format "{field name}={field value}". So the first thing you need to do is remove the date section from the problem. This can be done with Java String manipulation methods (substring with indexOf, for example), searching either for "qid", if it will always be the first non-date value, or for an unbroken String with a "=" at the end. This is entirely possible in Java.
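As a rough illustration of the date-removal step, here is a minimal sketch in plain Java. The class and method names (LogPrefixStripper, stripLeadingDate) are made up for this example and are not part of Talend; the sketch assumes the first "=" in the line belongs to the first field and that the field name contains no spaces.

```java
class LogPrefixStripper {

    // Hypothetical helper: returns the line starting at the first "name=" token.
    // Assumption: the date prefix never contains '=' and field names contain no spaces.
    static String stripLeadingDate(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            return line; // no "name=value" pair found, leave the line untouched
        }
        // Walk back from '=' to the preceding space: the field name starts right after it.
        int start = line.lastIndexOf(' ', eq) + 1;
        return line.substring(start);
    }

    public static void main(String[] args) {
        String line = "Jan  1 00:01:44 qid=tBVN1fmH035294 "
                + "subject=\"Delivery Status Notification (Delay)\" "
                + "virusname= duration=0.037 elapsed=0.266";
        System.out.println(stripLeadingDate(line));
    }
}
```

In a Talend job, logic like this could live in a tJavaRow or a custom routine; that placement is a suggestion, not something tested against your job.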
Once you are left with the rest of the String, you can use a variation on what you did to retrieve the Date, to split the rest of the data.
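One possible way to split the remaining text is a regular expression that treats a double-quoted value as a single unit. This is only a sketch under assumptions: the names KeyValueLineParser and parse are invented for the example, field names are assumed to be word characters, and quoted values are assumed not to contain escaped double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueLineParser {

    // name=  followed by either "quoted text" or a run of non-space characters (possibly empty)
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\"([^\"]*)\"|\\S*)");

    // Hypothetical helper: returns the fields in the order they appear in the line.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            // Group 3 is set only for quoted values and holds the text without the quotes.
            String value = m.group(3) != null ? m.group(3) : m.group(2);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String rest = "qid=tBVN1fmH035294 "
                + "subject=\"Delivery Status Notification (Delay)\" "
                + "virusname= duration=0.037 elapsed=0.266";
        for (Map.Entry<String, String> e : parse(rest).entrySet()) {
            System.out.println(e.getKey() + " -> [" + e.getValue() + "]");
        }
    }
}
```

With this approach the spaces inside the subject value are never used as separators, and an empty value such as virusname= simply yields an empty string.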
This is not a standard problem and as such there isn't necessarily a Talend component for precisely solving this. However, with a bit of Java knowledge and a logical approach to the problem, it is not that difficult to solve.