<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue tExtractJSONFields Encoding - Special Characters in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249684#M34150</link>
    <description>&lt;P&gt;Thanks Vapukov! That was really helpful. I can see the Encoding option now and am switching over to XPath. From my initial tests it fixes most of the introduced backslashes, and some of the formatting is better too, but there are still some issues. Wherever the text contains XML/HTML tags, a backslash is still being introduced.&lt;/P&gt; 
&lt;P&gt;For example, &amp;lt;BR&amp;gt;xxx&amp;lt;/BR&amp;gt; becomes &amp;lt;BR&amp;gt;xxx&amp;lt;\/BR&amp;gt;. A new problem is that my integers are being replaced by strings, e.g.&lt;/P&gt; 
&lt;P&gt;"test": 1000 becomes "test": "1000". Finally, my empty arrays are disappearing from the extraction.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I'm going to keep experimenting with it to see whether it's an issue with my XPath query, but if you recognise these problems, any help would be great!&lt;/P&gt;</description>
    <pubDate>Fri, 08 Feb 2019 11:01:55 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2019-02-08T11:01:55Z</dc:date>
    <item>
      <title>Issue tExtractJSONFields Encoding - Special Characters</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249682#M34148</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have a problem in my job: the tExtractJSONFields component appears to be applying some sort of encoding to my JSON message. It affects some of the special characters in the message, which causes problems in the final output file.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;For example:&lt;/P&gt; 
&lt;P&gt;&lt;A href="http://example.com/test" target="_blank" rel="noopener nofollow noopener noreferrer"&gt;http://example.com/test&lt;/A&gt; when extracted becomes&lt;/P&gt; 
&lt;P&gt;http:\/\/example.com\/test&lt;/P&gt; 
&lt;P&gt;or&lt;/P&gt; 
&lt;P&gt;USA/UK/Europe/Australia/New Zealand&lt;/P&gt; 
&lt;P&gt;USA\/UK\/Europe\/Australia\/New Zealand&lt;/P&gt; 
&lt;P&gt;or&lt;/P&gt; 
&lt;P&gt;Example With – Dash&lt;/P&gt; 
&lt;P&gt;Example With \u2013 Dash&lt;/P&gt; 
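&lt;P&gt;For reference, \/ and \uXXXX are both standard JSON string escapes, so the values above look like JSON string contents that have not been unescaped yet. A minimal sketch of what a decoding step does, in Python purely for illustration (it is not part of the Talend job, and the sample tokens and the decode_json_string helper are hypothetical):&lt;/P&gt;

```python
import json

# Hypothetical raw JSON string tokens, modelled on the examples above.
# The r'' prefix keeps the backslashes exactly as they would appear on the wire.
raw_url = r'"http:\/\/example.com\/test"'
raw_dash = r'"Example With \u2013 Dash"'


def decode_json_string(token):
    """Decode one JSON string token, resolving escapes such as \\/ and \\uXXXX."""
    return json.loads(token)


url = decode_json_string(raw_url)    # "http://example.com/test"
dash = decode_json_string(raw_dash)  # contains U+2013 (en dash)
```

&lt;P&gt;The sketch only shows that the escaped and unescaped forms represent the same JSON string; whether tExtractJSONFields can be configured to do this decoding itself is a separate question.&lt;/P&gt;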
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My job flow is as follows:&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Job Flow" style="width: 899px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M2tR.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/145551i17E0549EED7E1374/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M2tR.png" alt="0683p000009M2tR.png" /&gt;&lt;/span&gt;&lt;SPAN class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Job Flow&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;I call a REST client that returns a JSON response (encoded in UTF-8), which I then extract with tExtractJSONFields (set up as follows):&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="tExtractJSONFields" style="width: 874px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M2uo.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/129965i4C6FECECE5D54F2F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M2uo.png" alt="0683p000009M2uo.png" /&gt;&lt;/span&gt;&lt;SPAN class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;tExtractJSONFields&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;According to the documentation for tExtractJSONFields, there is supposed to be an advanced setting for the encoding. However, this option is missing in my version (Talend 6.3.1). I'm not sure why, or whether it would fix the issue.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Advanced.png" style="width: 330px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M2yf.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/150388iE09BE66FC6F6F309/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M2yf.png" alt="0683p000009M2yf.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My understanding is that this component converts the entire body of the response to a single string, so I'm not sure why it is trying to change the encoding of the response. I've set tFileOutputDelimited to UTF-8, and it doesn't seem to encode the string correctly either. All of the changes made by tExtractJSONFields remain in the output file.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I would really appreciate any help. I'm happy to give more info if I've missed something useful!&lt;/P&gt;</description>
      <pubDate>Thu, 07 Feb 2019 10:57:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249682#M34148</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-02-07T10:57:56Z</dc:date>
    </item>
    <item>
      <title>Re: Issue tExtractJSONFields Encoding - Special Characters</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249683#M34149</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It is not always clear from the documentation, but Encoding becomes available in Advanced Settings if you choose XPath instead of JSONPath &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Both work well for JSON, so you can test it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Feb 2019 11:51:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249683#M34149</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2019-02-07T11:51:42Z</dc:date>
    </item>
    <item>
      <title>Re: Issue tExtractJSONFields Encoding - Special Characters</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249684#M34150</link>
      <description>&lt;P&gt;Thanks Vapukov! That was really helpful. I can see the Encoding option now and am switching over to XPath. From my initial tests it fixes most of the introduced backslashes, and some of the formatting is better too, but there are still some issues. Wherever the text contains XML/HTML tags, a backslash is still being introduced.&lt;/P&gt; 
&lt;P&gt;For example, &amp;lt;BR&amp;gt;xxx&amp;lt;/BR&amp;gt; becomes &amp;lt;BR&amp;gt;xxx&amp;lt;\/BR&amp;gt;. A new problem is that my integers are being replaced by strings, e.g.&lt;/P&gt; 
&lt;P&gt;"test": 1000 becomes "test": "1000". Finally, my empty arrays are disappearing from the extraction.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I'm going to keep experimenting with it to see whether it's an issue with my XPath query, but if you recognise these problems, any help would be great!&lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2019 11:01:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Issue-tExtractJSONFields-Encoding-Special-Characters/m-p/2249684#M34150</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-02-08T11:01:55Z</dc:date>
    </item>
  </channel>
</rss>

