Issue tExtractJSONFields Encoding - Special Charac... - Qlik Community

Anonymous · 2019-02-07

Hi,

I've been having a problem in my job where the tExtractJSONFields component appears to be applying some sort of encoding to my JSON message. It changes some of the special characters in the message, and those changes end up in the final file I output.

For example, after extraction:

http://example.com/test becomes http:\/\/example.com\/test
USA/UK/Europe/Australia/New Zealand becomes USA\/UK\/Europe\/Australia\/New Zealand
Example With – Dash becomes Example With \u2013 Dash
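These are standard JSON escape sequences rather than corrupted data: JSON allows `/` to be written as `\/`, and `\u2013` is the escape for the en dash (U+2013). A minimal sketch using Python's standard `json` module (for illustration only, not Talend's code) shows that the escaped and unescaped JSON texts describe exactly the same values, and that each serializer chooses its own escaping:

```python
import json

# Two JSON texts that differ only in how characters are escaped
escaped = r'{"url": "http:\/\/example.com\/test", "text": "Example With \u2013 Dash"}'
plain = '{"url": "http://example.com/test", "text": "Example With \u2013 Dash"}'

# After parsing, both describe exactly the same data
print(json.loads(escaped) == json.loads(plain))  # True

# Serializers pick their own escaping: Python's json module escapes
# non-ASCII characters by default (ensure_ascii=True) but leaves "/" alone
print(json.dumps({"text": "Example With \u2013 Dash", "path": "USA/UK"}))
# {"text": "Example With \u2013 Dash", "path": "USA/UK"}
```

So the problem is not that the data changes, but that the extracted value is written out in its escaped JSON form instead of being decoded first.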

My job flow is as follows:

[Screenshot: Job Flow]

I call a REST client that returns a JSON response (encoded in UTF-8), which I then parse with tExtractJSONFields, set up as follows:

[Screenshot: tExtractJSONFields settings]

According to the tExtractJSONFields documentation, there should be an Encoding option under Advanced settings, but it is missing in my version (Talend 6.3.1). I'm not sure why, or whether setting it would fix the issue.

My understanding is that this component converts the entire body of the response to a single string, so I'm not sure why it is changing the encoding of the response. I've set tFileOutputDelimited to UTF-8, but that doesn't restore the original characters either: all of the changes made by tExtractJSONFields remain in the output file.
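If the escapes still reach the output file, one possible workaround (not one suggested in this thread) is to decode each affected field as the body of a JSON string before writing it out. The sketch below uses Python's standard `json` module for illustration; `unescape_json_value` is a hypothetical helper, and in a Talend job the same idea would have to be written in Java.

```python
import json

def unescape_json_value(value: str) -> str:
    r"""Decode JSON escape sequences such as \/ and \uXXXX in a field value.

    Hypothetical helper: assumes `value` is the body of a JSON string
    literal, i.e. any real double quotes or backslashes are already escaped.
    """
    try:
        # Wrap in quotes so the value parses as one JSON string literal;
        # strict=False tolerates raw control characters such as newlines
        return json.loads('"' + value + '"', strict=False)
    except json.JSONDecodeError:
        # Not a valid escaped string body (e.g. a bare quote): leave unchanged
        return value

print(unescape_json_value(r"http:\/\/example.com\/test"))  # http://example.com/test
print(unescape_json_value('say "hi"'))  # say "hi" (left unchanged)
```

Only apply this to fields known to arrive JSON-escaped: a value that genuinely contains a backslash sequence such as `C:\new` would be altered, because `\n` decodes to a newline.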

I would really appreciate any help, and I'm happy to give more info if I've missed something useful!

vapukov · 2019-02-07

Hi!

It's not always clear from the documentation, but the Encoding option becomes available in Advanced settings if you choose XPath instead of JSONPath 🙂

Both work well with JSON, so you can test it.

Anonymous · 2019-02-08

Thanks Vapukov! That was really helpful: I can now see the Encoding option and am switching over to XPath. In my first test it fixed most of the introduced backslashes, and some of the formatting is better too, but there are still some issues. Wherever there are XML/HTML tags, a backslash is still being introduced.

For example, <BR>xxx</BR> becomes <BR>xxx<\/BR>. There are also two new problems: my integers are being turned into strings, e.g. "test": 1000 becomes "test": "1000", and my empty arrays are disappearing from the extraction entirely.

I'm going to experiment with it more, though, to see if it's an issue with my XPath query. But if you recognise these problems, any help would be great!


Labels: JSON · REST · Talend Data Integration · v7.x