Anonymous
Not applicable

tChangeFileEncoding and UTF8 encoding

Hello, 

 

I have an input_file encoded in ANSI that I want to encode to UTF-8.

 

So basically, I use the tChangeFileEncoding component and I do get an output_file encoded in UTF-8. When I open it with Notepad++, everything looks fine.

But when I open it with Excel, the "€" and "é" characters are displayed as things like "â‚¬" and "Ã©".

 

Is there any way to fix this?


Accepted Solutions
Anonymous
Not applicable
Author

If anyone ever has the same problem, here is how I solved mine. As a reminder, I needed to change the encoding of a .csv file from ANSI to UTF-8, and I also had a problem with the resulting UTF-8 file when I opened it with Excel.

 

First things first: it is apparently well known that Excel has trouble with UTF-8 .csv files (example here). Since the file did not have to be used in Excel in the end, I simply ignored that part.

 

Secondly, I found that my file was not natively encoded in ISO-8859-15 (aka Latin-9), as I had thought, but in Latin-1. I tried the "Custom" encoding option of tChangeFileEncoding to do the job, but it was not as intuitive as I expected. So I used a tJava component plus a custom routine to solve the problem. For the routine, I used the java.nio library, and I found here all the encodings supported by this library. The encoding of my file was "windows-1252".

 

After that, I simply had to call my routine like this:

myPackage.MyCustomRoutine.myMethod(input_encoding, output_encoding, input_directory + input_filename, output_directory + output_filename);
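
The routine itself is not reproduced in this post. For readers who want a starting point, here is a minimal sketch of what such a routine could look like, assuming it only needs to read the file with one charset and write it back with another. The class name EncodingRoutineSketch and the method name convert are hypothetical; they are not the names used in the attached routine, and the parameter order simply mirrors the call above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical routine: names are illustrative, not taken from the original attachment.
class EncodingRoutineSketch {

    // Reads inputPath as inputEncoding and writes the same text to outputPath as outputEncoding.
    static void convert(String inputEncoding, String outputEncoding,
                        String inputPath, String outputPath) throws IOException {
        // Charset.forName throws UnsupportedCharsetException if the JVM does not know the name.
        Charset in = Charset.forName(inputEncoding);
        Charset out = Charset.forName(outputEncoding);
        Path source = Paths.get(inputPath);
        Path target = Paths.get(outputPath);

        // Both streams are closed automatically, even if an exception is thrown.
        try (BufferedReader reader = Files.newBufferedReader(source, in);
             BufferedWriter writer = Files.newBufferedWriter(target, out)) {
            char[] buffer = new char[8192];
            int read;
            // Copy decoded characters in chunks; line endings are preserved as-is.
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
```

In a tJava component, a call such as EncodingRoutineSketch.convert("windows-1252", "UTF-8", inputPath, outputPath) would then do the conversion. Note that this sketch does not write a BOM, and that the reader throws an IOException when it meets bytes that are not valid in the input charset instead of silently replacing them.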

 

 

 


14 Replies
manodwhb
Champion II

@JoshyBrown, what type of file are you trying to re-encode?

Anonymous
Not applicable
Author

It's a .csv file. 

manodwhb
Champion II

@JoshyBrown, depending on the encoding used to read the file, those characters will be converted into special characters.
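
To illustrate the point, here is a small sketch of what happens when UTF-8 bytes are decoded with a single-byte Windows charset. It assumes the program opening the file falls back to windows-1252, which is only a guess about what Excel does here; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that reproduces the garbled display.
class MojibakeSketch {

    // Encodes text as UTF-8, then decodes those bytes as if they were windows-1252.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }
}
```

With this helper, misread("\u00E9") (that is, "é") gives "Ã©", and misread("\u20AC") (that is, "€") gives "â‚¬". The file content is correct UTF-8 in both cases; only the charset used to read it is wrong.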

Anonymous
Not applicable
Author

@manodwhb, is there a way to bypass that and obtain a .csv file that displays properly when opened with Excel?
Anonymous
Not applicable
Author

I started to get a grasp of your answer, and the solution to my problem is to use the BOM. Unfortunately, when I use tChangeFileEncoding and specify "UTF-8-BOM", Talend does not recognize it and so does not produce a proper output file.

Does anyone know how to use the BOM in Talend, or how to use the custom encoding option?

 

*edit* 

OK, that's not how it works. I have found this topic, which is related to my problem. Apparently, I need a custom component in order to write a BOM; BOM support is not native in Talend. But that topic may be too old: I can't find the tWriteHeaderLineToFileWithBOM component. Is there a way to download it, or did the OP retrieve it?

 

The key to my problem is the BOM. I'm sure of it. Once I can download, install and use that custom component, my problem will be solved. 
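
For reference, a UTF-8 BOM is just the three bytes EF BB BF at the very start of the file, so it can also be written by hand from a routine. The sketch below is an assumption about how one might do that in plain Java; it is not the tWriteHeaderLineToFileWithBOM component, and the class and method names are hypothetical. Java's own UTF-8 encoder does not add a BOM, which is why the bytes are written explicitly before the text.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: writes text as UTF-8 preceded by a byte order mark.
class Utf8BomSketch {

    // The UTF-8 encoding of U+FEFF, used as a signature at the start of a file.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static void writeWithBom(String outputPath, String content) throws IOException {
        try (OutputStream stream = Files.newOutputStream(Paths.get(outputPath))) {
            // Write the raw BOM bytes first, then the UTF-8 encoded text.
            stream.write(UTF8_BOM);
            Writer writer = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
            writer.write(content);
            // Flush the writer before the underlying stream is closed.
            writer.flush();
        }
    }
}
```

Excel is generally reported to detect UTF-8 when a .csv file starts with this signature, but that behaviour may depend on the Excel version, so it is worth testing on the target machine.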

Anonymous
Not applicable
Author

Hello,

Could you please refer to this link: https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...?

And feel free to let us know whether you can download this custom component from the Talend Exchange portal.

Best regards

Sabrina

 


Anonymous
Not applicable
Author

Hello Joshy,

 

Could you please share your routine?

Thank you!

Anonymous
Not applicable
Author

Hello @sasafca,

 

You'll find it attached to this message.

Hoping it will help.

 

 


routine_encoding.txt