<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read huge xml in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231486#M21767</link>
    <description>Hi Vapukov, 
  &lt;BR /&gt;My main issue is reading the huge xml itself. Even if I want to split it, Talend still has to read it first, and that step is the bottleneck. I have tried changing the .ini file to increase the Java arguments to -Xms1024m and -Xmx9208m, and I have also increased the JVM settings of the job runner using the same specific JVM arguments (-Xms1024m and -Xmx9208m). I have tried with Talend Open Studio 5.6.2 MDM edition and 6.3.0 BigData edition.&amp;nbsp;The computer I use has an SSD hard disk and 16 GB of RAM in total. After 6 hours of running, the job is still in the "Starting" status. CPU usage is 100% and memory usage is 14.6 GB. 
  &lt;BR /&gt;It is important to mention that I use the generation mode "fast, with low memory consumption (SAX)". 
  &lt;BR /&gt;This is the xml structure that I used to create the structure in the metadata: 
  &lt;BR /&gt;&amp;lt;?xml version='1.0' encoding='UTF-8'?&amp;gt; &amp;lt;m:GenericData xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer"
 &lt;BR /&gt;&lt;BR /&gt;To see the whole post, download it &lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Md8B"&gt;here&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Md8B"&gt;OriginalPost.pdf&lt;/A&gt;</description>
    <pubDate>Sun, 11 Dec 2016 13:00:03 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2016-12-11T13:00:03Z</dc:date>
    <item>
      <title>Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231480#M21761</link>
      <description>Hi,
&lt;BR /&gt;I have a huge xml file that I want to read. As it is an SDMX file, I wanted to import it as is, because I don't know how else to specify it in the metadata. Obviously, that didn't work very well: the file is more than 4 GB, and it crashes TOS. What would you have done in this case? Is there any example of how to specify SDMX files in the metadata (xml files)?
&lt;BR /&gt;Thanks in advance</description>
      <pubDate>Sat, 16 Nov 2024 10:13:02 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231480#M21761</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T10:13:02Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231481#M21762</link>
      <description>What do you mean by "I wanted to import it as is"? Import to where, and how?</description>
      <pubDate>Thu, 08 Dec 2016 21:31:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231481#M21762</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2016-12-08T21:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231482#M21763</link>
      <description>You could try tFileInputXML and select SAX parsing in the advanced settings. SAX is much quicker than DOM and doesn't need to load the whole document into memory, but you will not be able to use look-ahead or look-back XPath functions.</description>
      <pubDate>Fri, 09 Dec 2016 01:08:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231482#M21763</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-09T01:08:24Z</dc:date>
    </item>
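The SAX-style streaming approach suggested in the reply above can also be prototyped outside Talend to sanity-check a huge file. A minimal Python sketch using the standard library's incremental parser; the file path and the record element name "Obs" are placeholder assumptions, not details from the post:

```python
# Stream-parse a large XML file instead of building the whole DOM; memory
# stays roughly flat because each record element is cleared after use.
import xml.etree.ElementTree as ET

def stream_records(path, record_tag):
    """Yield one dict of attributes per record element, then discard it."""
    for event, elem in ET.iterparse(path, events=("end",)):
        # Strip any namespace prefix, e.g. "{uri}Obs" becomes "Obs"
        tag = elem.tag.split("}")[-1]
        if tag == record_tag:
            yield dict(elem.attrib)
            elem.clear()  # release the subtree so memory does not grow
```

Because this never holds more than one record at a time, it can chew through multi-gigabyte files that crash a DOM-based load.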
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231483#M21764</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;What do you mean by "I wanted to import it as is"? Import to where, and how?&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;Hi Vapukov,&lt;BR /&gt;I wanted to create the xml metadata. I used a sample xml that contained only one row in the loop, at the end.&lt;BR /&gt;I have tried everything, but nothing works, not even SAX, so I don't know what approach I could use in this case...</description>
      <pubDate>Fri, 09 Dec 2016 17:22:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231483#M21764</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-09T17:22:56Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231484#M21765</link>
      <description>&lt;BLOCKQUOTE&gt; 
 &lt;TABLE border="1"&gt; 
  &lt;TBODY&gt; 
   &lt;TR&gt; 
    &lt;TD&gt;You could try tFileInputXML and select SAX parsing in the advanced settings. SAX is much quicker than DOM and doesn't need to load the whole document into memory, but you will not be able to use look-ahead or look-back XPath functions.&lt;/TD&gt; 
   &lt;/TR&gt; 
  &lt;/TBODY&gt; 
 &lt;/TABLE&gt; 
&lt;/BLOCKQUOTE&gt; 
&lt;BR /&gt;Hi rhall, 
&lt;BR /&gt;I did try tFileInputXML, selecting SAX in the advanced settings. The output is a tFileOutputDelimited that I split every 1000 lines. 
&lt;BR /&gt;Nothing happens; it gets stuck in "Starting". 
&lt;BR /&gt;What would you recommend? 
&lt;BR /&gt;Thanks in advance</description>
      <pubDate>Fri, 09 Dec 2016 17:24:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231484#M21765</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-09T17:24:51Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231485#M21766</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;What do you mean by "I wanted to import it as is"? Import to where, and how?&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;Hi Vapukov,&lt;BR /&gt;I wanted to create the xml metadata. I used a sample xml that contained only one row in the loop, at the end.&lt;BR /&gt;I have tried everything, but nothing works, not even SAX, so I don't know what approach I could use in this case...&lt;BR /&gt;Sorry, this is hard to understand. What are you trying to achieve?&lt;BR /&gt;In one post you say you want to &lt;B&gt;write&lt;/B&gt; an XML file; in the next, you &lt;B&gt;write a csv file&lt;/B&gt; from XML.&lt;BR /&gt;So what is the overall task? What are the steps? Maybe include some screenshots from Studio, etc.&lt;BR /&gt;What is the structure of your XML file? Since it is huge, why not try splitting it into several files?</description>
      <pubDate>Sun, 11 Dec 2016 00:45:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231485#M21766</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2016-12-11T00:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231486#M21767</link>
      <description>Hi Vapukov, 
  &lt;BR /&gt;My main issue is reading the huge xml itself. Even if I want to split it, Talend still has to read it first, and that step is the bottleneck. I have tried changing the .ini file to increase the Java arguments to -Xms1024m and -Xmx9208m, and I have also increased the JVM settings of the job runner using the same specific JVM arguments (-Xms1024m and -Xmx9208m). I have tried with Talend Open Studio 5.6.2 MDM edition and 6.3.0 BigData edition.&amp;nbsp;The computer I use has an SSD hard disk and 16 GB of RAM in total. After 6 hours of running, the job is still in the "Starting" status. CPU usage is 100% and memory usage is 14.6 GB. 
  &lt;BR /&gt;It is important to mention that I use the generation mode "fast, with low memory consumption (SAX)". 
  &lt;BR /&gt;This is the xml structure that I used to create the structure in the metadata: 
  &lt;BR /&gt;&amp;lt;?xml version='1.0' encoding='UTF-8'?&amp;gt; &amp;lt;m:GenericData xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer"
 &lt;BR /&gt;&lt;BR /&gt;To see the whole post, download it &lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Md8B"&gt;here&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Md8B"&gt;OriginalPost.pdf&lt;/A&gt;</description>
      <pubDate>Sun, 11 Dec 2016 13:00:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231486#M21767</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-11T13:00:03Z</dc:date>
    </item>
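For reference, in an Eclipse-based Studio the heap flags mentioned in the post go in the Studio .ini file after the -vmargs line, one flag per line (the file name varies by edition and platform, so this fragment is a sketch, not the complete file):

```ini
-vmargs
-Xms1024m
-Xmx9208m
```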
    <item>
      <title>Re: Read huge xml</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231487#M21768</link>
      <description>Hi!&lt;BR /&gt;When I wrote "split", I meant a real split using one of the command-line utilities, such as:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://xponentsoftware.com/xmlSplit.aspx" target="_blank" rel="nofollow noopener noreferrer"&gt;http://xponentsoftware.com/xmlSplit.aspx&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://github.com/acfr/comma/wiki/XML-Utilities" target="_blank" rel="nofollow noopener noreferrer"&gt;https://github.com/acfr/comma/wiki/XML-Utilities&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://gist.github.com/benallard/8042835" rel="nofollow noopener noreferrer"&gt;https://gist.github.com/benallard/8042835&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Then process the folder with all the XML files one by one. Talend is an excellent tool, but that doesn't mean we must rely on a single tool for everything; no tool will ever do everything every user wants.</description>
      <pubDate>Sun, 11 Dec 2016 18:41:02 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-huge-xml/m-p/2231487#M21768</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2016-12-11T18:41:02Z</dc:date>
    </item>
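The split-then-process approach suggested above can also be sketched in Python, assuming a flat run of repeating record elements; the record tag "Obs", the synthetic "part" wrapper element, and the batch size are placeholders, not details from the thread:

```python
# Split one huge XML file into numbered part files, streaming the input so
# the whole document is never held in memory at once.
import xml.etree.ElementTree as ET

def split_xml(path, record_tag, per_file, out_prefix):
    """Write runs of record elements into numbered part files; return file count."""
    batch, parts = [], 0
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag.split("}")[-1] == record_tag:
            batch.append(elem)
            if len(batch) == per_file:
                parts += 1
                _write_part(out_prefix, parts, batch)
                batch = []
    if batch:  # flush the final, possibly short, batch
        parts += 1
        _write_part(out_prefix, parts, batch)
    return parts

def _write_part(prefix, part, records):
    wrapper = ET.Element("part")  # synthetic root for each output file
    wrapper.extend(records)
    ET.ElementTree(wrapper).write(f"{prefix}-{part:04d}.xml", encoding="utf-8")
    for rec in records:
        rec.clear()  # free the subtree once it has been written out
```

Each resulting part file is small enough for a normal Talend job to read, so the folder can then be processed one file at a time.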
  </channel>
</rss>