Anonymous
Not applicable

Umlauts in UTF-8

Hello
I'm transforming an XML file. The source file has umlaut characters that are written as numeric character references rather than as literal characters. For example, the source file contains:
<TOWNCITY>Düsseldorf</TOWNCITY>
When I transform this into a new XML format, TOS converts this to:
<TOWNCITY>Düsseldorf</TOWNCITY>
I would like to preserve the original, but I can't figure out how. Both the source and the output file are configured with UTF-8 encoding.
Any ideas how I can achieve this?
Thanks for your help.
5 Replies
Anonymous
Not applicable
Author

Hmmm.... the character has been converted in the post too. So the original should be:
<TOWNCITY>D & # 2 5 2 ; sseldorf</TOWNCITY>
I've added the spaces so that the forum does not convert it.
Anonymous
Not applicable
Author

Hi Jonathan,
I think the output of the job should be in the format you specify in the output component. So if your input contains UTF-8 and you read this into Talend it will convert it to an internal format, but when exporting, you should be able to select the desired format again, UTF-8 for example. This should give you a file or table with the correct data.
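A likely explanation, sketched outside Talend: an XML parser resolves a numeric character reference into the character itself, and a UTF-8 serializer then writes that character literally, because UTF-8 can represent it directly. A serializer that targets US-ASCII has to escape it again. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an illustration. Whether Talend's XML output components behave the same way when set to an ASCII encoding is an assumption and would need to be tested.

```python
import xml.etree.ElementTree as ET

# Source fragment with the umlaut written as a numeric character reference.
source = "<TOWNCITY>D&#252;sseldorf</TOWNCITY>"

# Parsing resolves the reference: the element text holds the real character.
elem = ET.fromstring(source)
parsed_text = elem.text

# Serializing as UTF-8 writes the character itself (bytes 0xC3 0xBC).
utf8_bytes = ET.tostring(elem, encoding="utf-8")

# US-ASCII cannot represent the character, so the serializer writes it
# back out as a numeric character reference.
ascii_bytes = ET.tostring(elem, encoding="us-ascii")

print(repr(parsed_text))
print(utf8_bytes)
print(ascii_bytes)
```

If the same holds in Talend, choosing an ASCII output encoding instead of UTF-8 might preserve the references, at the cost of escaping every non-ASCII character in the file.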
Hope this helps.
Regards,
Arno
Anonymous
Not applicable
Author

Arno
Thanks for the reply. I'm doing as you suggest: the source file is read as UTF-8 and the output I create is also UTF-8, but Talend is still converting the data to the umlaut character. Maybe it's a bug. I can't find any configuration parameters that will change this.
Jonathan
janhess
Creator II

If it's a bug, you could work around it by doing the replacement in a tMap or tReplace, but it will probably affect a number of characters.
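One way such a replacement could look, sketched in Python rather than as a Talend component: encode the text as ASCII and let the codec's `xmlcharrefreplace` error handler turn every non-ASCII character into a numeric character reference. The function name `to_char_refs` is hypothetical. This would most likely have to run on the final serialized file, since an XML writer that receives a field value containing `&` will typically escape it as `&amp;`.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # The "xmlcharrefreplace" handler emits &#NNN; for anything ASCII cannot encode.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Already-serialized output that contains the literal umlaut.
serialized = "<TOWNCITY>Düsseldorf</TOWNCITY>"
escaped = to_char_refs(serialized)
print(escaped)
```

This escapes every non-ASCII character in the input, not only ü, while plain ASCII markup passes through unchanged.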
Anonymous
Not applicable
Author

Yes, I tried to post-process it with a tReplace. No luck with this either, I'm afraid: it still converts back to the umlaut character.