<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic parsing massive json in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/parsing-massive-json/m-p/2525202#M147818</link>
    <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;SPAN&gt;To handle a large JSON file, I am using Java streaming instead of the tFileInputJSON component to avoid Java heap memory issues. However, I am encountering problems with special characters, specifically quotes ("), which neither the Java code nor the tExtractJsonFields component can handle properly. This is because my file contains fields with free-form text, for example&lt;/SPAN&gt;:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"lien"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"R0008"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"0008"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"CATALOG_ID"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"57"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"LAST_UPDATE"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"17/05/2025 22:20:23"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"description"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"hello, in order to "&lt;/SPAN&gt;&lt;SPAN&gt;make it ok&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;", could u m'aider?Merci 7N0LYIG1\/\/\/xF9&amp;nbsp; class=\u0"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;This causes the following errors:&lt;BR /&gt;&lt;SPAN&gt;javax.json.stream.JsonParsingException: Unexpected char 74 at (line no=1, column no=248, offset=247)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;routines.system.JSONException: Expected a ',' or '}' at 248 [character 249 line 1]&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 24 Jul 2025 12:01:09 GMT</pubDate>
    <dc:creator>Mohammed_Shevchenko</dc:creator>
    <dc:date>2025-07-24T12:01:09Z</dc:date>
    <item>
      <title>parsing massive json</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-massive-json/m-p/2525202#M147818</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;SPAN&gt;To handle a large JSON file, I am using Java streaming instead of the tFileInputJSON component to avoid Java heap memory issues. However, I am encountering problems with special characters, specifically quotes ("), which neither the Java code nor the tExtractJsonFields component can handle properly. This is because my file contains fields with free-form text, for example&lt;/SPAN&gt;:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"lien"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"R0008"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"0008"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"CATALOG_ID"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"57"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"LAST_UPDATE"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"17/05/2025 22:20:23"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"description"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"hello, in order to "&lt;/SPAN&gt;&lt;SPAN&gt;make it ok&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;", could u m'aider?Merci 7N0LYIG1\/\/\/xF9&amp;nbsp; class=\u0"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;This causes the following errors:&lt;BR /&gt;&lt;SPAN&gt;javax.json.stream.JsonParsingException: Unexpected char 74 at (line no=1, column no=248, offset=247)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;routines.system.JSONException: Expected a ',' or '}' at 248 [character 249 line 1]&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jul 2025 12:01:09 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-massive-json/m-p/2525202#M147818</guid>
      <dc:creator>Mohammed_Shevchenko</dc:creator>
      <dc:date>2025-07-24T12:01:09Z</dc:date>
    </item>
    <item>
      <title>Re: parsing massive json</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-massive-json/m-p/2525960#M147838</link>
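      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;The parsers fail because the quotes inside the "description" value are not escaped, so the file is not valid JSON. To work around this, first read your JSON as raw text, like a plain text file (for a file this large, do this record by record or in chunks rather than loading it whole).&lt;BR /&gt;&lt;BR /&gt;Then use custom Java code to find the "description" field and escape the special characters in its value, with something like the code below:&lt;/P&gt;&lt;LI-CODE lang="java"&gt;// Example only (AI-generated starting point); adapt the pattern to your data
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic: the "description" value ends at the quote that is followed
// by the next "key": or by the closing brace of the record
Pattern p = Pattern.compile("(\"description\"\\s*:\\s*\")(.*?)(\"\\s*(?:,\\s*\"[^\"]*\"\\s*:|\\}))", Pattern.DOTALL);
Matcher m = p.matcher(json);
StringBuffer out = new StringBuffer();

while (m.find()) {
    // Undo quotes that are already escaped so they are not escaped twice
    String desc = m.group(2).replace("\\\"", "\"");
    // Double every backslash that does not start a valid JSON escape (e.g. \u0)
    desc = desc.replaceAll("\\\\(?![\\\\/bfnrt]|u[0-9a-fA-F]{4})", "\\\\\\\\");
    // Escape every quote inside the value
    desc = desc.replace("\"", "\\\"");
    m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + desc + m.group(3)));
}
m.appendTail(out);
json = out.toString();&lt;/LI-CODE&gt;&lt;P&gt;Once the text is cleaned, use a JSON component to read and extract the values, or parse it directly in custom Java code, as below:&lt;/P&gt;&lt;LI-CODE lang="java"&gt;JSONObject jsonObj = new JSONObject(json);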
String description = jsonObj.getString("description");&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 01 Aug 2025 08:40:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-massive-json/m-p/2525960#M147838</guid>
      <dc:creator>jeoste</dc:creator>
      <dc:date>2025-08-01T08:40:11Z</dc:date>
    </item>
  </channel>
</rss>

