[resolved] Encoding issue with tFileOutputDelimited

I use a tFileOutputDelimited with the encoding set to UTF-8 (the default) in the advanced parameters.
Nevertheless, the produced file has a UTF-16LE BOM (FF FE) and UTF-16LE character encoding.
I tried chaining a tChangeFileEncoding (UTF-16 -> UTF-8) after it, with and without a custom input encoding.
Both tests failed; I'm stuck with UTF-16.
Any ideas?
Frankie.
BTW : I use TOS Version: 4.1.2
Build id: r53616-20110106-0635
4 Replies
Anonymous
Not applicable
Author

Hi
Can you send me an example file for testing?
Best regards
Shong
Anonymous
Not applicable
Author

It seems to be related to a "Will Not Fix" Java bug.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
A quick and dirty workaround (in a tJava):
// Needs java.io.* (e.g. "import java.io.*;" in the tJava advanced settings)
// Read as UTF-16LE
FileInputStream fis = new FileInputStream("inputfile.txt");
BufferedReader r = new BufferedReader(new InputStreamReader(fis, "UTF-16LE"));
// Write as UTF-8
FileOutputStream fos = new FileOutputStream("outputfile.txt");
Writer w = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
// Copy the data line by line, re-adding the platform line separator
String s;
while ((s = r.readLine()) != null) {
    w.write(s + System.getProperty("line.separator"));
}
// Close the streams (closing the writer also flushes it)
w.close();
r.close();

The BOM is still wrong, but the encoding is right.
I did not find a convenient way to put binary files online, so here is a small example of what I mean:
- Actual input data (readable): Ä°
(LATIN CAPITAL LETTER A WITH DIAERESIS + DEGREE SIGN)
- Correct UTF-16LE (hex): FF FE C4 00 B0 00
as written by Talend in my case (supposed to be UTF-8)
- Actual output file (hex, mixed): FF FE C3 84 C2 B0
after the above quick-and-dirty conversion
- Expected output (hex, UTF-8): EF BB BF C3 84 C2 B0
Edit: Oops, it seems that UltraEdit converts automatically to UTF-16 when opening a file. Trying with a decent binary viewer/editor now.
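
For anyone who wants to double-check byte sequences like the ones above without relying on an editor, here is a minimal Java sketch that prints the bytes of the two test characters in both encodings. The class name EncodingBytes and the helper hex are made up for this illustration. Note that neither encoder adds a BOM by itself; the BOM bytes (FF FE or EF BB BF) are separate from the character data.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): prints the raw bytes of a string
// in UTF-16LE and UTF-8 so they can be compared with a hex dump of the file.
class EncodingBytes {

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00C4 (A with diaeresis) followed by U+00B0 (degree sign),
        // written as escapes so the source file's own encoding does not matter.
        String text = "\u00C4\u00B0";
        System.out.println("UTF-16LE: " + hex(text.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-8:    " + hex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

This should print C4 00 B0 00 for UTF-16LE and C3 84 C2 B0 for UTF-8, both without any BOM.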
Anonymous
Not applicable
Author

Finally... The produced file is UTF-8, as expected, but without a BOM.
My bad: the default configuration of my tool (UltraEdit-32) was to convert a file to UTF-16 when it is recognized as Unicode, which showed the wrong BOM in my case (and would add it if the file were saved).
I'll set this post as resolved.
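
Since the confusion came from an editor converting the file on open, one way to avoid it is to read the first bytes of the file directly. The following is only a sketch, assuming Java 7 or later; the class name BomSniffer and the default file name are placeholders, and it only distinguishes the UTF-8 and UTF-16 BOMs (a UTF-32LE BOM also starts with FF FE and would be reported as UTF-16LE).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Illustrative helper (hypothetical name): reports which BOM, if any,
// a file starts with, without converting or modifying the file.
class BomSniffer {

    /** Returns "UTF-8", "UTF-16LE", "UTF-16BE" or "none" for the given leading bytes. */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "none";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the real file path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "outputfile.txt");
        byte[] head = new byte[3];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.read(head);
        }
        // read() returns -1 for an empty file, so clamp before trimming.
        byte[] seen = Arrays.copyOf(head, Math.max(n, 0));
        System.out.println(file + " -> BOM: " + detectBom(seen));
    }
}
```

For a file like the one described above (UTF-8 without BOM), this would be expected to report "none".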
Anonymous
Not applicable
Author

Hi
Glad to see that you found the cause! Maybe you can try the component tWriteHeaderLineToFileWithBOM to output the records with a BOM.

Best regards
Shong
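
If a BOM is really needed and plain Java is an option (for example in a tJava), a possible sketch is to write the three BOM bytes yourself before the UTF-8 data, because Java's UTF-8 encoder does not emit a BOM on its own. This is not a description of how tWriteHeaderLineToFileWithBOM works internally; the class name Utf8BomWriter and its methods are invented for this example, and it assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper (hypothetical name): writes text as UTF-8
// preceded by the UTF-8 BOM (EF BB BF).
class Utf8BomWriter {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the UTF-8 bytes of the content, prefixed with the BOM. */
    static byte[] utf8WithBom(String content) {
        byte[] body = content.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    /** Writes the content to the target file as UTF-8 with a leading BOM. */
    static void writeWithBom(Path target, String content) throws IOException {
        Files.write(target, utf8WithBom(content));
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file: A with diaeresis + degree sign.
        Path target = Files.createTempFile("bom-demo", ".txt");
        writeWithBom(target, "\u00C4\u00B0");
        System.out.println(target + ": " + Files.size(target) + " bytes");
    }
}
```

For the two test characters this gives EF BB BF C3 84 C2 B0, i.e. 7 bytes. Keep in mind that some tools (including Java's own UTF-8 reader) do not strip the BOM when reading, so only add it if the consumer of the file expects it.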