
Anonymous
2011-06-15
06:03 AM
[resolved] Encoding issue with tFileOutputDelimited
I use a tFileOutputDelimited with the encoding set to UTF-8 (the default) in the advanced parameters.
Nevertheless, the produced file has a UTF-16LE BOM (FF FE) and UTF-16LE character encoding.
I tried to pipe a tChangeFileEncoding (UTF-16 -> UTF-8), with and without a custom input encoding.
Both tests failed; I'm stuck with UTF-16.
Any idea?
Frankie.
BTW: I use TOS Version: 4.1.2
Build id: r53616-20110106-0635
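A minimal sketch, in plain Java, of one way to check which BOM the produced file actually starts with; the file name is a placeholder, not the one from the job:
import java.io.FileInputStream;
import java.io.IOException;

public class BomCheck {
    public static void main(String[] args) throws IOException {
        // read the first bytes of the file produced by tFileOutputDelimited
        FileInputStream fis = new FileInputStream("out.csv");
        byte[] head = new byte[3];
        int n = fis.read(head);
        fis.close();

        if (n >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            System.out.println("UTF-8 BOM (EF BB BF)");
        } else if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            System.out.println("UTF-16LE BOM (FF FE)");
        } else if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            System.out.println("UTF-16BE BOM (FE FF)");
        } else {
            System.out.println("no BOM detected");
        }
    }
}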
Accepted Solutions

Anonymous
2011-06-16
06:43 AM
Finally... The produced file is UTF-8, as expected, but without a BOM.
My bad, the default configuration of my tool (UltraEdit32) was to convert files to UTF-16 when they are recognized as Unicode, which showed a wrong BOM in my case (and added it if the file was saved).
I'll set this post as resolved.
4 Replies

Anonymous
2011-06-15
11:24 AM
Hi
Can you send me an example file for testing?
Best regards
Shong

Anonymous
2011-06-16
05:52 AM
It seems to be related to a "Will Not Fix" Java bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
A quick and dirty workaround (in a tJava):
// needs java.io.* imports (e.g. in the tJava advanced settings)
// read the Talend output as UTF-16LE
FileInputStream fis = new FileInputStream("inputfile.txt");
BufferedReader r = new BufferedReader(new InputStreamReader(fis, "UTF-16LE"));
// write it back as UTF-8
FileOutputStream fos = new FileOutputStream("outputfile.txt");
Writer w = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
// copy line by line
String s;
while ((s = r.readLine()) != null) {
    w.write(s + System.getProperty("line.separator"));
}
w.flush();
// close the streams
w.close();
r.close();
The BOM is still wrong, but the encoding is right.
I did not find a convenient way to put binary files online, so here is a small example of what I mean:
- Actual input data (readable): Ä° (LATIN CAPITAL LETTER A WITH DIAERESIS + DEGREE SIGN)
- Correct UTF-16LE (hex): FF FE 00 C4 00 B0, as written by Talend in my case (supposed to be UTF-8)
- Actual output file (hex, mixed): FF FE C3 84 C2 B0, after the above quick'n'dirty conversion
- Expected output (hex, UTF-8): EF BB BF C3 84 C2 B0
Edit: Oops, it seems that UltraEdit automatically converts to UTF-16 when opening a file. Trying with a decent binary viewer/editor now.
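If a UTF-8 BOM is really needed downstream, it has to be written by hand: Java's UTF-8 OutputStreamWriter never emits one, and the "UTF-16LE" charset decodes a leading FF FE as the character U+FEFF instead of stripping it. A minimal sketch of the same workaround with both points handled (file names are placeholders, same assumptions as above):
// needs java.io.* imports
FileInputStream fis = new FileInputStream("inputfile.txt");
BufferedReader r = new BufferedReader(new InputStreamReader(fis, "UTF-16LE"));

FileOutputStream fos = new FileOutputStream("outputfile.txt");
fos.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // explicit UTF-8 BOM
Writer w = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));

boolean firstLine = true;
String s;
while ((s = r.readLine()) != null) {
    if (firstLine && s.length() > 0 && s.charAt(0) == '\uFEFF') {
        s = s.substring(1); // drop the BOM decoded from the UTF-16LE input
    }
    firstLine = false;
    w.write(s + System.getProperty("line.separator"));
}
w.flush();
w.close();
r.close();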

Anonymous
2011-06-16
07:18 AM
Hi
Glad to see that you found the cause! Maybe you can try the tWriteHeaderLineToFileWithBOM component to output the records with a BOM.
Best regards
Shong
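In case that component is not available, a rough sketch of the same idea in a tJava, assuming plain java.io and placeholder file/header names (this is not the component's own code): copy the file produced by tFileOutputDelimited into a new file that starts with the UTF-8 BOM and a header line.
// needs java.io.* imports
FileInputStream in = new FileInputStream("outputfile.txt");        // UTF-8 file without BOM
FileOutputStream out = new FileOutputStream("outputfile_bom.txt");
out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });   // UTF-8 BOM first
out.write("col1;col2\r\n".getBytes("UTF-8"));                      // then the header line
byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
    out.write(buf, 0, n);                                          // then the records, byte for byte
}
out.close();
in.close();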
