<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>OutOfMemoryError: GC overhead limit exceeded on large XML files in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310430#M81560</link>
    <description>hi,
&lt;BR /&gt;I am using Talend 3.0.4 (mandatory, I think, because SpagoBI 3.6 ships with a TalendEngine v3.0.4). One job extracts data with tFileInputXML in SAX mode (the other modes give me heap out-of-memory errors, which I think is worse) from large XML files, currently up to 2 GB and possibly bigger in the future.
&lt;BR /&gt;It is quite a simple job (tFileInputXML ---&amp;gt; tMap (no processing, just field mapping) ---&amp;gt; tMysqlOutput). I even tried raising Xmx to 2048 MB, but that didn't help.
&lt;BR /&gt;I also tried something I've seen here on the forum: I set the tMap line buffer to 1000 rows and the tMysqlOutput commit limit to 1000. That didn't help either.
&lt;BR /&gt;First: can I use a more recent version with SpagoBI 3.6? I'm afraid of building big, complex jobs only to find later that I can't deploy them to the server, or that there are compatibility problems.
&lt;BR /&gt;Second: is there a way to solve this problem? (Two days ago I had a problem copying large files with tFileCopy; it turned out to be a bug fixed in later versions, so I downloaded the fixed filecopy.jar, replaced mine, and it worked like a charm.)
&lt;BR /&gt;Thank you.</description>
    <pubDate>Tue, 30 Apr 2013 16:36:38 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2013-04-30T16:36:38Z</dc:date>
    <item>
      <title>OutOfMemoryError: GC overhead limit exceeded on large XML files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310430#M81560</link>
      <description>hi,
&lt;BR /&gt;I am using Talend 3.0.4 (mandatory, I think, because SpagoBI 3.6 ships with a TalendEngine v3.0.4). One job extracts data with tFileInputXML in SAX mode (the other modes give me heap out-of-memory errors, which I think is worse) from large XML files, currently up to 2 GB and possibly bigger in the future.
&lt;BR /&gt;It is quite a simple job (tFileInputXML ---&amp;gt; tMap (no processing, just field mapping) ---&amp;gt; tMysqlOutput). I even tried raising Xmx to 2048 MB, but that didn't help.
&lt;BR /&gt;I also tried something I've seen here on the forum: I set the tMap line buffer to 1000 rows and the tMysqlOutput commit limit to 1000. That didn't help either.
&lt;BR /&gt;First: can I use a more recent version with SpagoBI 3.6? I'm afraid of building big, complex jobs only to find later that I can't deploy them to the server, or that there are compatibility problems.
&lt;BR /&gt;Second: is there a way to solve this problem? (Two days ago I had a problem copying large files with tFileCopy; it turned out to be a bug fixed in later versions, so I downloaded the fixed filecopy.jar, replaced mine, and it worked like a charm.)
&lt;BR /&gt;Thank you.</description>
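&lt;BR /&gt;For reference, this is where I passed the memory setting (Run view --&amp;gt; Advanced settings --&amp;gt; JVM arguments; the -Xms value below is illustrative, I only really changed -Xmx):

```shell
# JVM arguments set for the job run (values shown are what I tried / assumed)
-Xms256m
-Xmx2048m
```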
      <pubDate>Tue, 30 Apr 2013 16:36:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310430#M81560</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-04-30T16:36:38Z</dc:date>
    </item>
    <item>
      <title>Re: OutOfMemoryError: GC overhead limit exceeded on large XML files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310431#M81561</link>
      <description>Your version of Talend is older than mine, but I usually run into errors with large XML files too. 
&lt;BR /&gt;One approach we developed here is to split the XML file into smaller chunks (usually no larger than 64 MB) with some Java code in a routine, and then process all the files in sequence. 
&lt;BR /&gt;This lets me use the common XML parser (way faster than SAX) and allows better XPaths in the schema definition. 
&lt;BR /&gt;The function I use is below (it only works if the loop tag in the XML does not appear anywhere else inside the file): 
&lt;BR /&gt; 
&lt;PRE&gt;// Requires at the top of the routine: import java.io.*; import java.util.Scanner;
// Splits filename into parts of roughly maxpart characters, cutting only after
// closing loop tags. Only safe if the loop tag never nests inside itself.
public static boolean split_file(String filename, int maxpart, String tagname, String roottag, String nsdeclaration) {
    FileOutputStream fout = null;
    PrintStream outstream = null;
    Scanner s = null;
    int part = 0;
    int partsize = 0;
    boolean partnew = true;
    String partfile, suffix, token;
    partfile = filename.replaceFirst("\\.xml$", "");
    try {
        s = new Scanner(new FileInputStream(filename), "utf-8");
        s.useDelimiter("&amp;lt;/" + tagname + "&amp;gt;");
        while (s.hasNext()) {
            if (partnew) { // begin a new part file
                suffix = String.format("_part%04d.xml", part);
                fout = new FileOutputStream(partfile + suffix);
                outstream = new PrintStream(fout);
                if (part &amp;gt; 0) { // parts after the first need the prolog and root tag restored
                    outstream.println("&amp;lt;?xml version=\"1.0\" encoding=\"utf-8\"?&amp;gt;");
                    outstream.println("&amp;lt;" + roottag + " " + nsdeclaration + "&amp;gt;");
                }
                partsize = 0;
                partnew = false;
            }
            // append the next chunk
            token = s.next();
            outstream.print(token);
            // unless this chunk contains the document's closing root tag,
            // restore the closing loop tag that the delimiter consumed
            if (token.indexOf("&amp;lt;/" + roottag + "&amp;gt;") &amp;lt; 0) outstream.println("&amp;lt;/" + tagname + "&amp;gt;");
            partsize += token.length();
            if (partsize &amp;gt; maxpart) { // part is large enough: close the root tag and the file
                outstream.println("&amp;lt;/" + roottag + "&amp;gt;");
                outstream.close(); // also closes the wrapped fout
                outstream = null;
                fout = null;
                part++;
                partnew = true;
            }
        }
        // close the last part, if one is still open
        if (outstream != null) outstream.close();
        return true;
    } catch (Exception e) {
        System.out.println(e.getMessage());
        if (outstream != null) outstream.close();
        return false;
    } finally {
        if (s != null) s.close();
    }
}&lt;/PRE&gt;</description>
      <pubDate>Tue, 30 Apr 2013 18:15:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310431#M81561</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-04-30T18:15:37Z</dc:date>
    </item>
    <item>
      <title>Re: OutOfMemoryError: GC overhead limit exceeded on large XML files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310432#M81562</link>
      <description>Thank you for this neat code, it might save my project! Luckily my looping tag does not appear inside the data tags. 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MACn.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154443iC5B8CACEF3D12C6A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MACn.png" alt="0683p000009MACn.png" /&gt;&lt;/span&gt; 
&lt;BR /&gt;Do you suggest I add a new routine and call it from a tJava component, or create a new component altogether? I ask because later I will have to deploy the jobs on the SpagoBI server's Talend engine, and I don't know exactly what will be deployed! 
&lt;BR /&gt;I'm a bit new to tweaking Talend to fit my needs. 
&lt;BR /&gt;EDIT: I created a new routine and called the function from a tJava component with the help of a tFileList, and it works like a charm. With XML files capped at 60 MB, parsing now runs smoothly with no heap or GC exceptions. 
&lt;BR /&gt;As for SpagoBI deployment, I will look at that later when I set up the server.</description>
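&lt;BR /&gt;In case it helps someone, here is a self-contained sketch of the call shape and the part-file naming, with a plain HashMap standing in for Talend's run-time globalMap (every name here except the tFileList CURRENT_FILEPATH variable is my own, illustrative choice):

```java
import java.util.HashMap;
import java.util.Map;

public class TJavaSketch {
    // Stand-in for Talend's run-time globalMap (illustrative only).
    static Map globalMap = new HashMap();

    // Mirrors the routine's part naming: orders.xml becomes orders_part0000.xml, orders_part0001.xml, ...
    static String partName(String filename, int part) {
        String base = filename.replaceFirst("\\.xml$", "");
        return base + String.format("_part%04d.xml", part);
    }

    public static void main(String[] args) {
        // tFileList_1_CURRENT_FILEPATH is the global variable a tFileList exposes for each iterated file
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/data/in/orders.xml");
        String current = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
        // the split routine would be called on "current" here; show the first part name it would produce
        System.out.println(partName(current, 0)); // prints /data/in/orders_part0000.xml
    }
}
```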
      <pubDate>Wed, 01 May 2013 22:11:41 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310432#M81562</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-05-01T22:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: OutOfMemoryError: GC overhead limit exceeded on large XML files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310433#M81563</link>
      <description>Sorry for the delay in answering, but I usually add a tJava in a tPreJob component.&lt;BR /&gt;Thiago</description>
      <pubDate>Wed, 09 Oct 2013 05:01:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/m-p/2310433#M81563</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-10-09T05:01:52Z</dc:date>
    </item>
  </channel>
</rss>

