[resolved] tFileInputDelimited not reading international characters (UTF-8)
I am reading UTF-8 encoded CSV text files, but I get errors when reading the file with tFileInputDelimited. Once this is working, the data will be saved (via a tMap -> tOracleOutput) to Oracle 11g. I am not sure whether I then need to set advanced options on the tOracleOutput. The Oracle DB has been configured to store multi-byte characters. I am probably missing something simple. I have attached screenshots. Dave
So, here is what I did to resolve this: I opened the source file in Firefox (File > Open), then right-clicked and chose View Page Info, which shows the character encoding. I saw that the file was actually encoded as UTF-16, not UTF-8 as I had thought. I then changed the encoding on the tFileInputDelimited to custom "UTF-16" in Talend, and it worked fine. Dave
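For anyone who wants to check the encoding without a browser, here is a minimal sketch in plain Java (not Talend-generated code) that looks at the byte-order mark at the start of a file. The class name BomSniffer and the method detect are made up for illustration. It only works when the file actually starts with a BOM; a file without one is reported as "unknown", which does not mean it is not UTF-8 or UTF-16.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses a file's encoding from its byte-order mark (BOM).
final class BomSniffer {

    // Returns a charset name based on the leading bytes, or "unknown" if no BOM is found.
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // note: a UTF-32LE BOM also starts with FF FE
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n = in.read(head);
            System.out.println(detect(Arrays.copyOf(head, Math.max(n, 0))));
        }
    }
}
```

If this reports UTF-16LE or UTF-16BE, setting the custom encoding to "UTF-16" as described above should be enough, since Java's UTF-16 decoder uses the BOM to pick the byte order.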
thanks - I am investigating another issue with this. If that does not work, I will definitely try this. In any case, I will keep this post updated.
Thanks!
Dave
Hi Dave
From the error message, we can see that a NumberFormatException is thrown in tFileInputDelimited_2: one of the columns is read using the Integer/int data type. Try changing it to the String data type.
Best regards
Shong
Shong,
Well, the first column is a short (Integer).
I changed the first column to a string in the FileInput and added a tConvertType after the FileInput. In the tConvertType, I convert the first column from string to short.
I now get a new error (new "Convert" screenshots attached).
Dave
I can. However, I think the issue is that Talend is not decoding the UTF-8 encoded data. In the screenshots, there are characters that Talend cannot resolve. I am struggling with this, as I cannot find any other posts that describe this problem.
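To illustrate how an encoding mismatch can surface as a number-format error, here is a small sketch in plain Java. It assumes the file is UTF-16 with a BOM while the reader is set to UTF-8; the class EncodingMismatchDemo and its method are hypothetical and are not what Talend generates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with the wrong and the right charset.
final class EncodingMismatchDemo {

    // Returns true if the text can be parsed as a short, false on NumberFormatException.
    static boolean parsesAsShort(String text) {
        try {
            Short.parseShort(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The text "42" as it would sit at the start of a UTF-16LE file with a BOM.
        byte[] raw = "\uFEFF42".getBytes(StandardCharsets.UTF_16LE);

        // Wrong charset: the BOM bytes are invalid UTF-8 and each digit is followed by a NUL.
        String wrong = new String(raw, StandardCharsets.UTF_8);
        // Right charset: the UTF-16 decoder consumes the BOM and yields the digits only.
        String right = new String(raw, StandardCharsets.UTF_16);

        System.out.println("decoded as UTF-8 parses:  " + parsesAsShort(wrong));
        System.out.println("decoded as UTF-16 parses: " + parsesAsShort(right));
    }
}
```

This would also explain why moving the conversion into a tConvertType only moves the error: the string itself is already garbled when it is read.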