
tChangeFileEncoding and UTF8 encoding
Hello,
I have an input file encoded in ANSI that I want to convert to UTF-8.
So I use the tChangeFileEncoding component, and I do get an output file encoded in UTF-8. When I open it with Notepad++, everything looks fine.
But when I open it with Excel, the "€" and "é" characters are displayed as things like "€_" and "é".
Is there any way to fix this?
Accepted Solutions

If anyone ever has the same problem, here is how I solved mine. As a reminder, I needed to change the encoding of a .csv file from ANSI to UTF-8, and I also had a problem with the resulting UTF-8 file when I opened it with Excel.
First things first: it is apparently well known that Excel has trouble dealing with .csv files in UTF-8 (example here). Since the file didn't have to be used in Excel in the end, I just ignored that part.
Secondly, I found that my file was not encoded in ISO-8859-15 (aka Latin-9), as I had assumed, but in Latin-1. I tried using the "Custom" encoding option of tChangeFileEncoding to do the job, but it was not as intuitive as I thought it would be. So I used a tJava component plus a custom routine to solve this problem. For the routine, I used the java.nio library, and I found here all the encodings supported by this library. The encoding of my file turned out to be "windows-1252".
After that, I simply had to call my routine like this:
myPackage.MyCustomRoutine.myMethod (input_encoding, output_encoding, input_directory+input_filename, output_directory+output_filename);
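For readers who cannot open the attachment, here is a minimal sketch of what such a routine could look like. It is not the routine from this thread: the class name `EncodingRoutine` and the method name `convertEncoding` are hypothetical, and only the argument order (input encoding, output encoding, input path, output path) follows the call above. It streams the file through a reader and a writer, so large files are not loaded into memory at once.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical routine class; in Talend it would live under Code > Routines.
final class EncodingRoutine {

    // Re-encodes a text file from inputEncoding to outputEncoding.
    // Charset names such as "windows-1252" and "UTF-8" are resolved by Charset.forName.
    static void convertEncoding(String inputEncoding, String outputEncoding,
                                String inputPath, String outputPath) throws IOException {
        Charset source = Charset.forName(inputEncoding);
        Charset target = Charset.forName(outputEncoding);

        try (Reader reader = new InputStreamReader(
                     Files.newInputStream(Paths.get(inputPath)), source);
             Writer writer = new OutputStreamWriter(
                     Files.newOutputStream(Paths.get(outputPath)), target)) {
            char[] buffer = new char[8192];
            int read;
            // Copy decoded characters; the writer encodes them in the target charset.
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
```

In a tJava component this would be called with the source charset given explicitly, for example `EncodingRoutine.convertEncoding("windows-1252", "UTF-8", inputFile, outputFile);`, where `inputFile` and `outputFile` are placeholders for your own paths.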

@JoshyBrown, what type of file are you trying to re-encode?

It's a .csv file.

@JoshyBrown, depending on the encoding used to read the file, those characters will be displayed as special characters.

I am starting to get a grasp of your answer, and the solution to my problem is to use the BOM.
Unfortunately, when I use tChangeFileEncoding and specify "UTF-8-BOM", Talend does not recognize it and therefore cannot produce a proper output file.
Does anyone know how to use the BOM in Talend? Or how to use the custom encoding option?
*edit*
OK, that's not how it works. I have found this topic, which is related to my problem. Apparently, I need a custom component in order to use the BOM, because BOM support is not native in Talend. But maybe that topic is too old: I can't find the tWriteHeaderLineToFileWithBOM component. Is there a way to download it, or did the OP retrieve it?
The key to my problem is the BOM, I'm sure of it. Once I can download, install and use that custom component, my problem will be solved.
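For anyone who cannot get hold of that component, a UTF-8 BOM can also be written by hand from a tJava component or a routine. The sketch below is only an illustration under that assumption: the class name `Utf8BomWriter` and the method name `writeWithBom` are hypothetical and not part of Talend. It writes the three BOM bytes (EF BB BF) first and the UTF-8 encoded content after them, which Excel generally takes as a hint that the file is UTF-8.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a Talend component.
final class Utf8BomWriter {

    // The UTF-8 byte order mark: U+FEFF encoded as three bytes.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Writes content to outputPath as UTF-8, preceded by the BOM.
    static void writeWithBom(String outputPath, String content) throws IOException {
        try (OutputStream out = Files.newOutputStream(Paths.get(outputPath))) {
            out.write(UTF8_BOM);
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Note that this version takes the whole content as a string, so it is better suited to small files; for large files the BOM could be written to the output stream before streaming the rest.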

Hello,
Could you please have a look at this link: https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...?
Feel free to let us know whether you can download this custom component from the Talend Exchange portal.
Best regards
Sabrina

Hello Joshy,
Could you please share your routine?
Thank you!

Hello @sasafca,
You'll find it attached to this message.
I hope it helps.
routine_encoding.txt
