[resolved] tFileInputDelimited not reading international characters (UTF-8)

I am reading UTF-8 encoded CSV text files, but I am getting errors when reading the file with tFileInputDelimited. Once this is working, the data will be saved (via a tMap -> tOracleOutput) to Oracle 11g. I am not sure whether I then need to set advanced options on the tOracleOutput. The Oracle database has been configured to store multi-byte characters.
Probably something simple I am missing.
I have attached screenshots.
Dave

Accepted Solutions
Anonymous
Not applicable
Author

So, here is what I did to resolve:
I opened the source file in Firefox (File > Open), then right-clicked and chose View Page Info, which shows the character encoding. I saw that the file was actually encoded as UTF-16, not UTF-8 as I had thought. I then changed the tFileInputDelimited encoding to custom "UTF-16" in Talend, and it worked fine.
Dave
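
For anyone who wants to check a file's encoding without opening it in a browser, here is a minimal sketch in Python that looks for a byte-order mark (BOM) at the start of the file. It is an illustration only, not part of Talend: the names `sniff_bom` and `sniff_file` are made up for this example, and it assumes the file actually starts with a BOM. A file without one returns `None`, which tells you nothing either way.

```python
import codecs

# Known byte-order marks, checked longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32"),
    (codecs.BOM_UTF32_BE, "UTF-32"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
]


def sniff_bom(head: bytes):
    """Return the encoding family implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Calling `sniff_file` on the CSV path and getting back "UTF-16" would point to the same fix as above: set the component's encoding to custom "UTF-16".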


8 Replies
Anonymous
Not applicable
Author

Hi
Try this. Set Encoding "Custom"->"GBK".
Regards,
Pedro
Anonymous
Not applicable
Author

Thanks. I am investigating another issue with this first. If that does not resolve it, I will definitely try this. In any case, I will keep this post updated.
Thanks!
Dave
Anonymous
Not applicable
Author

"GBK" did not work. I have escalated this to Talend support.
thanks,
Dave
Anonymous
Not applicable
Author

Hi Dave
From the error message, we can see that a number format exception is thrown on tFileInputDelimited_2: one of the columns is read using the Integer/int data type. Try changing it to the String data type.
Best regards
Shong
Anonymous
Not applicable
Author

Shong,
Well, the first column is a short (Integer).
I changed the first column to a string in the FileInput, and added a tConvertType, after the FileInput. In the tConvertType, I convert the first column from string to short.
I now get a new error (new "Convert" screenshots attached).
Dave
janhess
Creator II

Have you tried giving your ACD_No a size?
Anonymous
Not applicable
Author

I can. However, I think the real issue is that Talend is not resolving the UTF-8 encoded data. The screenshots show characters that Talend cannot resolve. I am struggling with this, as I cannot find any other posts that describe the same problem.
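
To show how a wrong encoding setting can produce both unresolvable characters and a number format error on the first column, here is a small Python sketch. It is only an illustration under assumptions: the sample row "123,Zoë" and the helper `first_field_as_int` are invented for the example, and Python's decoder stands in for whatever Talend does internally.

```python
import codecs

# A sample row stored as UTF-16 (little-endian, with a BOM).
# The row content is made up for this illustration.
raw = codecs.BOM_UTF16_LE + "123,Zoë".encode("utf-16-le")


def first_field_as_int(data: bytes, encoding: str) -> int:
    """Decode the bytes, split on commas, and parse the first field as an int."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising, which is roughly how garbage characters end up in the data.
    text = data.decode(encoding, errors="replace")
    return int(text.split(",")[0])


# Decoded with the right charset, the first column parses cleanly.
print(first_field_as_int(raw, "utf-16"))

# Decoded as UTF-8, the BOM bytes become replacement characters and every
# digit is followed by a NUL, so integer parsing fails.
try:
    first_field_as_int(raw, "utf-8")
except ValueError as exc:
    print("parse failed:", exc)
```

The point of the sketch is that the numeric column itself can be fine: the parse error comes from the bytes being decoded with the wrong charset before the value ever reaches the type conversion.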