Anonymous
Not applicable

[resolved] tFileInputDelimited not reading international characters (UTF-8)

I am reading UTF-8 encoded CSV text files, but I get errors when reading them with tFileInputDelimited. Once this is working, the rows will be saved (via a tMap -> tOracleOutput) to Oracle 11g. I am not sure whether I then need to set advanced options on the tOracleOutput. The Oracle database has been configured to store multi-byte characters.
Probably something simple I am missing.
I have attached screenshots.
Dave

Accepted Solutions
Anonymous
Not applicable
Author

So, here is what I did to resolve:
I opened the source file in Firefox (File > Open), then right-clicked and chose View Page Info, which shows the character encoding. I saw that the file was actually encoded as UTF-16, not UTF-8 as I had thought. I then changed the encoding on the tFileInputDelimited to Custom "UTF-16" in Talend, and it worked fine.
Dave
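For readers who would rather check the encoding without a browser, here is a minimal sketch in Python that looks at the byte order mark (BOM) at the start of the file. The function names `sniff_bom` and `sniff_file` are made up for this illustration, and the approach only works when the file actually begins with a BOM; a file without one returns `None` and needs a different check.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes):
    """Return a Python codec name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of the file at `path` and check them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Calling `sniff_file` on the CSV before configuring the input component would, under these assumptions, report `"utf-16"` for a file like the one described above. A result of `None` does not prove the file is UTF-8; it only means no BOM was found.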


8 Replies
Anonymous
Not applicable
Author

Hi
Try this. Set Encoding "Custom"->"GBK".
Regards,
Pedro
Anonymous
Not applicable
Author

thanks - I am investigating another issue with this. If that does not work, I will definitely try this. In any case, I will keep this post updated.
Thanks!
Dave
Anonymous
Not applicable
Author

"GBK" did not work. I have escalated this to Talend support.
thanks,
Dave
Anonymous
Not applicable
Author

Hi Dave
From the error message, we can see that a number format exception is thrown on tFileInputDelimited_2, because one of the columns is read using the Integer/int data type. Try changing it to the String data type.
Best regards
Shong
Anonymous
Not applicable
Author

Shong,
Well, the first column is a short (Integer).
I changed the first column to a string in the tFileInputDelimited and added a tConvertType after it. In the tConvertType, I convert the first column from string to short.
I now get a new error (new "Convert" screenshots attached).
Dave
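One possible explanation for why the numeric column fails both on read and on conversion is a charset mismatch, which is what the accepted solution eventually points to. The sketch below is a Python illustration, not Talend code: it encodes a made-up value as UTF-16, decodes it with the wrong charset, and shows that the result no longer parses as a number. The sample value `"1042"` and the helper `parses_as_int` are assumptions for the example.

```python
# A numeric field as it would be stored in a UTF-16 file:
# a byte order mark followed by two bytes per character.
raw = "1042".encode("utf-16")

# Decoding with the wrong charset leaves replacement characters
# and NUL characters mixed in with the digits.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the matching charset recovers the original text.
right = raw.decode("utf-16")


def parses_as_int(text):
    """Return True if `text` can be converted to an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


print(repr(wrong), parses_as_int(wrong))
print(repr(right), parses_as_int(right))
```

Changing the column type to string only moves the failure downstream, since the stray characters are still in the value when it is converted later.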
janhess
Creator II

Have you tried giving your ACD_No a size?
Anonymous
Not applicable
Author

I can. However, I think the real issue is that Talend is not resolving the UTF-8 encoded data. In the screenshots, there are characters that Talend cannot resolve. I am struggling with this, as I cannot find any other posts describing the same problem.