Change encoding in ESB route (UTF-8 to Windows-1252) - Qlik Community

Anonymous · ‎2016-05-01

Hello everybody,
This is my first message on the forum.
I want to convert a file from UTF-8 to windows-1252 (Cp1252) in a Talend route.
I tested what seemed to be the best solution for me:
- Start component: cFtp
I set the charset to "utf-8" in the advanced options, since that is the charset of my file.
- Middle component: cConvertBody
I set the class to: byte[].class, "Cp1252"
- End component: cFtp
I set the charset to "Cp1252", the encoding I want for my file.
This method doesn't work and I'm a little desperate. Do you have any idea that could help me?
Thank you in advance.
PS: The attached documents show the options of my components.

Anonymous · ‎2016-05-01

I have not done this before, but was interested in your problem. I don't believe you will be able to do this in the way you are trying (however, I may be wrong). What I would attempt is making use of a cProcessor component and trying to do the conversion in Java.
Take a look at this site (http://java67.blogspot.co.uk/2015/05/how-to-convert-byte-array-to-string-in-java-example.html) for an example of how to convert a byte[] to a String of a particular encoding.
However, before you do that, you need to get the data as a byte[].
A byte is a primitive type in Java. It is not a class. Therefore your byte[].class conversion won't work. You need to convert the body to String.class instead. The next component should then be a cProcessor. Once in the cProcessor, you can get hold of your data using code similar to the following:

String myString = exchange.getIn().getBody(String.class);

You can then refer to the post below to convert the String to a byte[] in the cProcessor.
http://stackoverflow.com/questions/18571223/how-to-convert-java-string-into-byte
Then use the post I gave in the first paragraph (http://java67.blogspot.co.uk/2015/05/how-to-convert-byte-array-to-string-in-java-example.html) to convert the encoding.
Then use code very similar to the following to put your newly converted String back into the body:

exchange.getIn().setBody(myConvertedString);

Then the next component *should* have the converted String in the message.
As I said, I have not tried this, but I suspect that this (or a slight variant on this) logic should work for you.
I'd be interested to hear if it does.
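
To make the steps above concrete, here is a minimal sketch in plain Java. It has not been tested inside Talend, and the names Utf8ToCp1252 and convert are made up for illustration. One point worth keeping in mind: a Java String carries no encoding of its own, so the target encoding only takes effect at the moment the String is turned back into bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8ToCp1252 {
    // Assumption: the JVM provides windows-1252. It is not one of the charsets
    // every JVM must support, although common JDKs ship it.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes UTF-8 bytes into a String, then encodes that String as windows-1252 bytes. */
    static byte[] convert(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // "\u00e9t\u00e9" is the French word for "summer", written with escapes
        // so the example does not depend on the source file's own encoding.
        byte[] utf8 = "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = convert(utf8);
        System.out.println(utf8.length + " bytes in UTF-8, " + cp1252.length + " bytes in windows-1252");
        // prints "5 bytes in UTF-8, 3 bytes in windows-1252"
    }
}
```

Inside a cProcessor, the same two calls could be applied to exchange.getIn().getBody(byte[].class), with the resulting byte[] passed to exchange.getIn().setBody(...). Whether the following cFtp then writes those bytes unchanged is an assumption that would need checking.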

Anonymous · ‎2016-05-02

It turns out I may have been wrong in my assertion that you can't do what you want in the way you want, although the way I suggested should work (the long way around 🙂).
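
One part of the earlier reply can be checked in plain Java: although byte is a primitive, byte[] is an array type, and byte[].class is a valid class literal. The sketch below only shows that; the class and method names are invented for illustration, and it says nothing about whether cConvertBody with that class then produces the desired encoding.

```java
class ByteArrayClassLiteral {
    /** Returns true when the given object is a byte array. */
    static boolean isByteArray(Object value) {
        return byte[].class.isInstance(value);
    }

    public static void main(String[] args) {
        Class<byte[]> type = byte[].class;                          // compiles: arrays are objects
        System.out.println(type.isArray());                         // true
        System.out.println(type.getComponentType() == byte.class);  // true
        System.out.println(isByteArray(new byte[0]));               // true
        System.out.println(isByteArray("text"));                    // false
    }
}
```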

Anonymous · ‎2016-05-03

Hello rhall_2.0,
Thank you very much for your answer!
I tried your method:
cFtp -> cConvertBody (String.class) -> cProcessor (see below) -> cFtp

After running this route, a file without accented characters (I am French, so accents are used) has the "ANSI as UTF-8" format, but if I add an "é", "è", "à", etc. to the file, it has the "ANSI" format.
The "ANSI as UTF-8" format is (most likely) shown because of the characters that UTF-8 and ANSI have in common.
I have doubts about the solution. Do you believe that this is normal? Again, thank you for your help.
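
A possible explanation for the observation above, assuming the labels "ANSI" and "ANSI as UTF-8" come from a text editor that guesses the encoding from the file's bytes (the thread does not say which tool was used): text made only of ASCII characters produces exactly the same bytes in UTF-8 and in windows-1252, so such a file cannot be told apart. Only accented characters differ. The sketch below illustrates this; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiOverlapDemo {
    // Assumption: the JVM provides windows-1252 (common JDKs do).
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** True when the text encodes to identical bytes in UTF-8 and windows-1252. */
    static boolean sameBytesInBothEncodings(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.UTF_8), text.getBytes(CP1252));
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInBothEncodings("Bonjour"));    // ASCII only: true
        System.out.println(sameBytesInBothEncodings("caf\u00e9"));  // contains é: false
    }
}
```

If that assumption holds, the behaviour described would be expected: a file with an "é" encoded as the single byte 0xE9 is not valid UTF-8, so it can only be reported as "ANSI".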

Anonymous · ‎2016-05-03

OK, I think we are nearly there. This is slightly more complicated than I had first thought. Take a look at the accepted answer here (http://stackoverflow.com/questions/28484064/windows-1252-to-utf-8). It seems to make sense. It is doing the reverse of what you are doing, but should be easy enough to get it to do what you want.
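
As a rough illustration of why reversing the direction should be easy, here is a small sketch of a helper that takes the source and target charsets as parameters. This is not the code from the linked answer; the class name Transcode and the method transcode are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Transcode {
    /** Re-encodes bytes from one charset to another by going through a String. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: the JVM provides windows-1252 (common JDKs do).
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] utf8 = "\u00e0 bient\u00f4t".getBytes(StandardCharsets.UTF_8);

        byte[] ansi = transcode(utf8, StandardCharsets.UTF_8, cp1252);  // direction needed in this thread
        byte[] back = transcode(ansi, cp1252, StandardCharsets.UTF_8);  // the reverse direction

        System.out.println(Arrays.equals(utf8, back));  // prints "true"
    }
}
```

Swapping the two charset arguments is all it takes to change direction. Characters that do not exist in windows-1252 would be replaced during encoding, so the round trip is only lossless for text that windows-1252 can represent.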

Other

v6.x