Change encoding in ESB route (UTF-8 to Windows-1252)
Hello everybody,
This is my first message on the forum.
I want to convert a file from the "UTF-8" encoding to the "windows-1252"/"Cp1252" encoding in a Talend route.
I tested what seemed to me the best solution:
- Start component: cFtp
In the advanced options, I set the charset to "utf-8", the charset of my file.
- Middle component: cConvertBody
I set the class to: byte[].class, "Cp1252"
- End component: cFtp
I set the charset to "Cp1252", the encoding I want for the output file.
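Outside of Talend, what this route is meant to do comes down to two steps: decode the file's bytes as UTF-8, then re-encode the resulting text as Windows-1252. A minimal in-memory Java sketch of that, assuming the whole file fits in memory (the class and method names are hypothetical, not Talend APIs):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ToCp1252 {

    // "windows-1252" is the canonical Java name; "Cp1252" is an alias for the same charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the input as UTF-8, then re-encode the resulting text as Windows-1252.
    // Note: String.getBytes(Charset) silently replaces characters that
    // Windows-1252 cannot represent with '?'.
    static byte[] transcode(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // "café": 'é' takes two bytes
        byte[] cp1252 = transcode(utf8);                             // 'é' becomes the single byte 0xE9
        System.out.println(utf8.length + " -> " + cp1252.length);    // prints "5 -> 4"
    }
}
```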
This method doesn't work and I'm a little desperate. Do you have any idea how to help me?
Thank you in advance.
PS: The options of my components are in the attached documents.
I have not done this before, but I was interested in your problem. I don't believe you will be able to do this the way you are trying (although I may be wrong). What I would try is using a cProcessor component and doing the conversion in Java.
Take a look at this site (http://java67.blogspot.co.uk/2015/05/how-to-convert-byte-array-to-string-in-java-example.html) for an example of how to convert a byte[] to a String using a particular encoding.
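The key idea behind that kind of conversion is that a byte[] only becomes text once you pick a charset to decode it with, via the String(byte[], Charset) constructor. A small sketch of what happens with the right charset and with the wrong one (the class name is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // UTF-8 encodes 'é' (U+00E9) as the two bytes 0xC3 0xA9.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};

        // Correct charset: a single character, 'é'.
        String right = decode(utf8, StandardCharsets.UTF_8);
        // Wrong charset: each byte becomes its own Windows-1252 character, "Ã©".
        String wrong = decode(utf8, Charset.forName("windows-1252"));

        System.out.println(right.length() + " vs " + wrong.length()); // prints "1 vs 2"
    }
}
```

The "Ã©" pattern is a useful clue in practice: it usually means UTF-8 bytes were read as Windows-1252.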
However, before you do that, you need to get the data as a byte[].
A byte is a primitive type in Java. It is not a class. Therefore your byte[].class conversion won't work. You need to convert the body to String.class instead. The next component should then be the cProcessor. Once in the cProcessor, you can get hold of your data using code similar to below....
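The code referred to here is not included in this copy of the thread. As a hedged sketch of what the cProcessor step could do, assuming cConvertBody has already turned the body into a String and that the cProcessor code has access to the Camel `exchange`: encode the String as Windows-1252 and pass the bytes on. In Camel, the body would be read with exchange.getIn().getBody(String.class) and written back with exchange.getIn().setBody(...); the plain helper below (name hypothetical) keeps the sketch runnable without Camel. Unlike String.getBytes, it uses a CharsetEncoder set to REPORT, so characters Windows-1252 cannot represent raise an error instead of quietly becoming '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Cp1252Processor {

    // Hypothetical helper for a cProcessor. Inside Camel it would be used roughly as:
    //   String body = exchange.getIn().getBody(String.class);
    //   exchange.getIn().setBody(toCp1252(body));
    static byte[] toCp1252(String body) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder()
                // Fail loudly instead of substituting '?' for unencodable characters.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(body));
        // Copy only the bytes actually written into the buffer.
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] ok = toCp1252("\u00e9t\u00e9"); // "été" -> 0xE9 0x74 0xE9
        System.out.println(ok.length);         // prints 3
        try {
            toCp1252("\u03a9");                // Greek capital omega, not in Windows-1252
        } catch (CharacterCodingException e) {
            System.out.println("unmappable: " + e.getClass().getSimpleName());
        }
    }
}
```

Setting the body to a byte[] rather than a String is a deliberate choice in this sketch: a Java String has no encoding of its own, so the Windows-1252 encoding only exists once the text has been turned into bytes.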
Then the next component *should* have the converted String in the message.
As I said, I have not tried this, but I suspect that this logic (or a slight variant of it) should work for you.
I'd be interested to hear if it does.
It turns out I may have been wrong in my assertion that you can't do this the way you were trying, although the way I suggested should still work (it's just the long way around 🙂).
Hello rhall_2.0,
Thank you very much for your answer!
I tried your method:
cFtp -> cConvertBody (String.class) -> cProcessor (see below) -> cFtp
After this route, a file without accented characters (I am French, so we use accents) shows the format "ANSI as UTF-8", but if I add an "é", "è", "à", etc. to the file, it shows as "ANSI".
The "ANSI as UTF-8" format most likely appears because characters without accents are encoded the same way in UTF-8 and in ANSI.
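That reasoning matches how the two encodings work: the ASCII range (U+0000 to U+007F) is encoded as the same single bytes in UTF-8 and in Windows-1252 (the "ANSI" code page on Western European Windows systems), so a file containing only those characters is byte-for-byte identical in both, and nothing in the bytes says which one was intended. Accented letters are where the two diverge. A small check (the class name is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameBytesCheck {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // True when the text produces exactly the same bytes in UTF-8 and Windows-1252.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.UTF_8), text.getBytes(CP1252));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("Bonjour tout le monde")); // prints true (ASCII only)
        System.out.println(sameBytes("D\u00e9j\u00e0 vu"));    // prints false ("Déjà vu")
    }
}
```

If that holds, an accent-free output file looking like UTF-8 is expected rather than a sign that the conversion failed.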
I have doubts about this solution. Do you think this is normal? Again, thank you for your help.
OK, I think we are nearly there. This is slightly more complicated than I first thought. Take a look at the accepted answer here (http://stackoverflow.com/questions/28484064/windows-1252-to-utf-8). It seems to make sense. It does the reverse of what you are doing, but it should be easy enough to adapt to what you want.
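The linked answer is not reproduced here. As a rough sketch of the direction this thread needs (UTF-8 in, Windows-1252 out), the text can also be transcoded as a stream by pairing an InputStreamReader with an OutputStreamWriter, which avoids loading a large file into memory at once. The class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamTranscoder {

    // Copy characters from a UTF-8 input stream to a Windows-1252 output stream.
    // A writer built from a Charset replaces characters it cannot encode instead
    // of throwing. The caller remains responsible for closing both streams.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(out, Charset.forName("windows-1252"));
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push any buffered bytes to the output stream
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00e0 bient\u00f4t".getBytes(StandardCharsets.UTF_8); // "à bientôt"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), out);
        System.out.println(utf8.length + " -> " + out.size()); // prints "11 -> 9"
    }
}
```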