<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Running out of memory with tAggregateRow component in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376110#M138364</link>
    <description>Hi, 
&lt;BR /&gt;I'm trying to aggregate 5M+ records with tAggregateRow and am running out of Java heap space. Is there any way around this issue other than increasing the VM memory setting (Xmx)? I tried increasing it to 1.5GB (Xmx1536M), but the job still runs out of heap space at 400K rows. 
&lt;BR /&gt;Below is the exception message, in case it is of any help: 
&lt;BR /&gt;Starting job Load_Hostel_Products_Prices_Allocations at 16:19 27/09/2013. 
&lt;BR /&gt; 
&lt;BR /&gt; connecting to socket on port 4050 
&lt;BR /&gt; connected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt;Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 
&lt;BR /&gt; at java.util.Arrays.copyOfRange(Unknown Source) 
&lt;BR /&gt; at java.lang.String.&amp;lt;init&amp;gt;(Unknown Source) 
&lt;BR /&gt; at java.lang.StringBuilder.toString(Unknown Source) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_1Process(Load_Hostel_Products_Prices_Allocations.java:43509) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_12Process(Load_Hostel_Products_Prices_Allocations.java:17241) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_11Process(Load_Hostel_Products_Prices_Allocations.java:15048) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_4Process(Load_Hostel_Products_Prices_Allocations.java:12681) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileDelete_2Process(Load_Hostel_Products_Prices_Allocations.java:9797) 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tMysqlInput_2Process(Load_Hostel_Products_Prices_Allocations.java:9655) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_7Process(Load_Hostel_Products_Prices_Allocations.java:8707) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tMysqlInput_1Process(Load_Hostel_Products_Prices_Allocations.java:7903) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileDelete_1Process(Load_Hostel_Products_Prices_Allocations.java:4344) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_5Process(Load_Hostel_Products_Prices_Allocations.java:4202) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceInput_7Process(Load_Hostel_Products_Prices_Allocations.java:3203) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_4Process(Load_Hostel_Products_Prices_Allocations.java:2775) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceInput_6Process(Load_Hostel_Products_Prices_Allocations.java:1782) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.runJobInTOS(Load_Hostel_Products_Prices_Allocations.java:47743) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.main(Load_Hostel_Products_Prices_Allocations.java:47525) 
&lt;BR /&gt;Job Load_Hostel_Products_Prices_Allocations ended at 16:41 27/09/2013.</description>
    <pubDate>Sat, 16 Nov 2024 11:53:43 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T11:53:43Z</dc:date>
    <item>
      <title>Running out of memory with tAggregateRow component</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376110#M138364</link>
      <description>Hi, 
&lt;BR /&gt;I'm trying to aggregate 5M+ records with tAggregateRow and am running out of Java heap space. Is there any way around this issue other than increasing the VM memory setting (Xmx)? I tried increasing it to 1.5GB (Xmx1536M), but the job still runs out of heap space at 400K rows. 
&lt;BR /&gt;Below is the exception message, in case it is of any help: 
&lt;BR /&gt;Starting job Load_Hostel_Products_Prices_Allocations at 16:19 27/09/2013. 
&lt;BR /&gt; 
&lt;BR /&gt; connecting to socket on port 4050 
&lt;BR /&gt; connected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt;Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 
&lt;BR /&gt; at java.util.Arrays.copyOfRange(Unknown Source) 
&lt;BR /&gt; at java.lang.String.&amp;lt;init&amp;gt;(Unknown Source) 
&lt;BR /&gt; at java.lang.StringBuilder.toString(Unknown Source) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_1Process(Load_Hostel_Products_Prices_Allocations.java:43509) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_12Process(Load_Hostel_Products_Prices_Allocations.java:17241) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_11Process(Load_Hostel_Products_Prices_Allocations.java:15048) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileInputDelimited_4Process(Load_Hostel_Products_Prices_Allocations.java:12681) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileDelete_2Process(Load_Hostel_Products_Prices_Allocations.java:9797) 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tMysqlInput_2Process(Load_Hostel_Products_Prices_Allocations.java:9655) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_7Process(Load_Hostel_Products_Prices_Allocations.java:8707) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tMysqlInput_1Process(Load_Hostel_Products_Prices_Allocations.java:7903) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tFileDelete_1Process(Load_Hostel_Products_Prices_Allocations.java:4344) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_5Process(Load_Hostel_Products_Prices_Allocations.java:4202) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceInput_7Process(Load_Hostel_Products_Prices_Allocations.java:3203) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceBulkExec_4Process(Load_Hostel_Products_Prices_Allocations.java:2775) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.tSalesforceInput_6Process(Load_Hostel_Products_Prices_Allocations.java:1782) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.runJobInTOS(Load_Hostel_Products_Prices_Allocations.java:47743) 
&lt;BR /&gt; at hi360_20130926.load_hostel_products_prices_allocations_0_1.Load_Hostel_Products_Prices_Allocations.main(Load_Hostel_Products_Prices_Allocations.java:47525) 
&lt;BR /&gt;Job Load_Hostel_Products_Prices_Allocations ended at 16:41 27/09/2013.</description>
      <pubDate>Sat, 16 Nov 2024 11:53:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376110#M138364</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T11:53:43Z</dc:date>
    </item>
    <item>
      <title>Re: Running out of memory with tAggregateRow component</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376111#M138365</link>
      <description>tAggregateRow collects memory for every unique dataset in the input stream. It needs to collect nearly everything because the component can calculate the uniqueness only at the end of the flow. 
&lt;BR /&gt;If you are able to read the data in a sorted order you can use tAggregateSortedRow. This component releases all data that has already been processed, which the sort order makes possible. It saves a lot of memory but needs sorted data.</description>
      <pubDate>Sat, 28 Sep 2013 21:59:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376111#M138365</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-09-28T21:59:21Z</dc:date>
    </item>
    <item>
      <title>Re: Running out of memory with tAggregateRow component</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376112#M138366</link>
      <description>Thank for your reply. Couple thoughts on it:
&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;
 &lt;TABLE border="1"&gt;
  &lt;TBODY&gt;
   &lt;TR&gt;
    &lt;TD&gt;It needs to collect nearly everything because the component can calculate the uniqueness only at the end of the flow.&lt;/TD&gt;
   &lt;/TR&gt;
  &lt;/TBODY&gt;
 &lt;/TABLE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;I'm not sure what "uniqueness" you are talking about. I get that the final aggregation result can only be produced after all rows were processed, but I think a lot of the processing (the "first", "last" and "sum" functions) can be done on the go, and processed rows can then be discarded.
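&lt;BR /&gt;To make the "on the go" idea concrete for input that is already sorted by the group key, here is a minimal Python sketch. It only illustrates the general technique and is not how tAggregateSortedRow is implemented; the Row type, its fields, and the sample values are assumptions made up for the example.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical row layout for illustration only.
Row = namedtuple("Row", ["key", "amount"])


def aggregate_sorted(rows):
    """Yield (key, first, last, total) per group; rows must already be sorted by key."""
    for key, group in groupby(rows, key=attrgetter("key")):
        first = last = None
        total = 0
        for row in group:
            if first is None:
                first = row.amount  # "first" is fixed by the first row of the group
            last = row.amount       # "last" is overwritten by each new row
            total += row.amount     # "sum" is updated as rows stream past
        # The group is complete once the key changes, so it can be emitted
        # right away and its accumulators are dropped.
        yield key, first, last, total


# An iterator stands in for a streaming source; no row is kept after it is read.
sorted_rows = iter([Row("A", 10), Row("A", 7), Row("B", 5), Row("B", 2), Row("C", 1)])
results = list(aggregate_sorted(sorted_rows))
print(results)
```

&lt;BR /&gt;Only the accumulators of the current group are alive at any moment in this sketch, which is why sorted input allows a much smaller memory footprint.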
&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;
 &lt;TABLE border="1"&gt;
  &lt;TBODY&gt;
   &lt;TR&gt;
    &lt;TD&gt;If you are able to read the data in a sorted order you can use tAggregateSortedRow.&lt;/TD&gt;
   &lt;/TR&gt;
  &lt;/TBODY&gt;
 &lt;/TABLE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;I had a look at tAggregateSortedRow but I noticed it had a setting called "Input rows count". Does that mean I need to know the total number of rows it will be aggregating before I run the job? Or can this setting be skipped?</description>
      <pubDate>Sat, 28 Sep 2013 23:03:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Running-out-of-memory-with-tAggregateRow-component/m-p/2376112#M138366</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-09-28T23:03:35Z</dc:date>
    </item>
  </channel>
</rss>

