<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [resolved] tUniqueRow java heapspace issue in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254840#M37731</link>
    <description>In the enterprise edition there are dedicated hash components.
&lt;BR /&gt;You can also try to find useful components in Exchange. Here is an example:
&lt;BR /&gt;
&lt;A href="http://www.talendforge.org/exchange/index.php?eid=137&amp;amp;product=tos&amp;amp;action=view&amp;amp;nav=1,1,1" rel="nofollow noopener noreferrer"&gt;http://www.talendforge.org/exchange/index.php?eid=137&amp;amp;product=tos&amp;amp;action=view&amp;amp;nav=1,1,1&lt;/A&gt;</description>
    <pubDate>Wed, 05 Dec 2012 22:19:12 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2012-12-05T22:19:12Z</dc:date>
    <item>
      <title>[resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254832#M37723</link>
      <description>Hi All, 
&lt;BR /&gt;I am essentially trying to do a select distinct to get unique rows from a relatively small data set - 420,000 rows x 55 columns. I am using the tUniqueRow component and persistently getting Java heap space errors. 
&lt;BR /&gt;I have tried a number of options - increasing the jvm parameter up to 2048; increasing the page file; using tHashOutput and tHashInput files; doing the unique on a single column - where I would ideally like to do it across all; and writing the data set out into a delimited file in my parent job and moving the tUniqueRow into a separate job and reading the delimited file back in there. 
&lt;BR /&gt;I tried the tUniqueRow component with its standard settings first, combined with each of the options above, and then again with the component set to use disk with a buffer size of 1000. It seems to make little difference to the final outcome. 
&lt;BR /&gt;With the disk and buffer size settings, the job manages to load all rows into the tUniqueRow component, but then fails with the Java heap space error before outputting any results. I have tried output to a delimited file (preferred), to tHashOutput and even to tLogRow, just in case writing to the delimited file was causing the error. 
&lt;BR /&gt;I suspect the large number of columns is the problem, but am not sure how I can easily remedy this situation. 
&lt;BR /&gt;Any ideas??? 
&lt;BR /&gt; 
&lt;BR /&gt;Error as follows - 
&lt;BR /&gt;Starting job BMD01_UniqueRow at 11:49 05/12/2012. 
&lt;BR /&gt; 
&lt;BR /&gt; connecting to socket on port 3550 
&lt;BR /&gt; connected 
&lt;BR /&gt;Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space 
&lt;BR /&gt; disconnected 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.tFileInputDelimited_1Process(BMD01_UniqueRow.java:5212) 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.runJobInTOS(BMD01_UniqueRow.java:5393) 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.main(BMD01_UniqueRow.java:5258) 
&lt;BR /&gt;Caused by: java.lang.OutOfMemoryError: Java heap space 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow$1FileRowIterator_tUniqRow_1.load(BMD01_UniqueRow.java:4214) 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow$1FileRowIterator_tUniqRow_1.next(BMD01_UniqueRow.java:4239) 
&lt;BR /&gt; at moscow1.bmd01_uniquerow_0_1.BMD01_UniqueRow.tFileInputDelimited_1Process(BMD01_UniqueRow.java:4320) 
&lt;BR /&gt; ... 2 more 
&lt;BR /&gt;Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0" 
&lt;BR /&gt;Job BMD01_UniqueRow ended at 11:53 05/12/2012.</description>
      <pubDate>Wed, 05 Dec 2012 01:53:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254832#M37723</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T01:53:56Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254833#M37724</link>
      <description>Hi 
&lt;BR /&gt;As you did, try storing the data on disk in the tUniqRow component. Don't use any hash components or the tLogRow component in the job; they consume memory during job execution. 
&lt;BR /&gt;Shong</description>
      <pubDate>Wed, 05 Dec 2012 04:48:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254833#M37724</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T04:48:54Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254834#M37725</link>
      <description>Hi Shong,
&lt;BR /&gt;Thanks for your reply, but I have tried this exact method. My job has only 3 components - tFileInputDelimited, tUniqueRow, tFileOutputDelimited.
&lt;BR /&gt;Any other ideas / settings???</description>
      <pubDate>Wed, 05 Dec 2012 04:56:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254834#M37725</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T04:56:22Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254835#M37726</link>
      <description>Hi, 
&lt;BR /&gt;What about your JVM settings? I have seen 
&lt;BLOCKQUOTE&gt;
 &lt;TABLE border="1"&gt;
  &lt;TBODY&gt;
   &lt;TR&gt;
    &lt;TD&gt;increasing the jvm parameter up to 2048&lt;/TD&gt;
   &lt;/TR&gt;
  &lt;/TBODY&gt;
 &lt;/TABLE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;and I mean both -Xms and -Xmx,
&lt;BR /&gt;such as:
&lt;BR /&gt;-vmargs
&lt;BR /&gt;-Xms256m
&lt;BR /&gt;-Xmx1024m
&lt;BR /&gt;-XX:MaxPermSize=256m
&lt;BR /&gt;This exception is thrown when less than 2% of the heap is available. Are there any other programs running on your computer?
&lt;BR /&gt;Best regards
&lt;BR /&gt;Sabrina</description>
      <pubDate>Wed, 05 Dec 2012 06:31:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254835#M37726</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T06:31:50Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254836#M37727</link>
      <description>Hi Sabrina,
&lt;BR /&gt;Thanks for your reply. My config is:
&lt;BR /&gt;-vmargs
&lt;BR /&gt;-Xms64m
&lt;BR /&gt;-Xmx2048m
&lt;BR /&gt;-XX:MaxPermSize=256m
&lt;BR /&gt;-Dfile.encoding=UTF-8
&lt;BR /&gt;Should I change -Xms as well? I have tried running the job with no other programs running, but sadly it didn't fix the issue...</description>
      <pubDate>Wed, 05 Dec 2012 21:41:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254836#M37727</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T21:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254837#M37728</link>
      <description>I think, your problem will not solved with more RAM. You need an different concept. 
&lt;BR /&gt;I suggest as first building a MD5 or SHA1 hash value over your columns which should be distinct and write the result (all columns and the additional checksum) into a new file or database table. 
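&lt;BR /&gt;As a minimal sketch of this checksum idea outside Talend (plain Python, for illustration only; the function names and the separator character are assumptions and not part of any Talend component), each row gets an MD5 checksum as an extra column, and the first row seen for each checksum is kept:

```python
import csv
import hashlib
import io


def row_checksum(row, separator="\x1f"):
    """Return an MD5 hex digest computed over all column values of one row."""
    # Join with a separator that is unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not produce the same checksum.
    joined = separator.join(row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def add_checksum(rows):
    """Yield each row with its checksum appended as an additional column."""
    for row in rows:
        yield list(row) + [row_checksum(row)]


def first_row_per_checksum(rows_with_checksum):
    """Yield the first row seen for each checksum (stored in the last column)."""
    seen = set()  # holds only the short digests, not the complete rows
    for row in rows_with_checksum:
        checksum = row[-1]
        if checksum not in seen:
            seen.add(checksum)
            yield row


if __name__ == "__main__":
    # Hypothetical semicolon-delimited sample containing one duplicate row.
    sample = io.StringIO("a;b;c\na;b;c\nx;y;z\n")
    reader = csv.reader(sample, delimiter=";")
    for unique_row in first_row_per_checksum(add_checksum(reader)):
        print(unique_row)
```

&lt;BR /&gt;In this sketch only the short digests are held in memory, not the complete rows.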
&lt;BR /&gt;After that, use the tAggregateRow component: group by the checksum column to detect uniqueness, and apply the first function to all other columns in the operations area.</description>
      <pubDate>Wed, 05 Dec 2012 21:56:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254837#M37728</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T21:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254838#M37729</link>
      <description>Hi jlolling,
&lt;BR /&gt;Thanks for your reply and I think you are right about requiring a redesign, however I have no experience in building a solution like you have suggested. I will do some research and see what I can work out.</description>
      <pubDate>Wed, 05 Dec 2012 22:07:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254838#M37729</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T22:07:13Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254839#M37730</link>
      <description>To build the hash values, Talend has a dedicated component: tAddCRCRow. This component adds a CRC checksum over selected columns of your flow to the output flow as an additional column.&lt;BR /&gt;tFileInputDelimited (source) ---&amp;gt; tAddCRCRow ---&amp;gt; tFileOutputDelimited (temp file)&lt;BR /&gt;OnSubjobOk&lt;BR /&gt;tFileInputDelimited (temp file) --&amp;gt; tAggregateRow --&amp;gt; tFileOutputDelimited (target)</description>
      <pubDate>Wed, 05 Dec 2012 22:16:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254839#M37730</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T22:16:40Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254840#M37731</link>
      <description>In the enterprise edition there are dedicated hash components.
&lt;BR /&gt;You can also try to find useful components in Exchange. Here is an example:
&lt;BR /&gt;
&lt;A href="http://www.talendforge.org/exchange/index.php?eid=137&amp;amp;product=tos&amp;amp;action=view&amp;amp;nav=1,1,1" rel="nofollow noopener noreferrer"&gt;http://www.talendforge.org/exchange/index.php?eid=137&amp;amp;product=tos&amp;amp;action=view&amp;amp;nav=1,1,1&lt;/A&gt;</description>
      <pubDate>Wed, 05 Dec 2012 22:19:12 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254840#M37731</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T22:19:12Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] tUniqueRow java heapspace issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254841#M37732</link>
      <description>Thanks again for your help with this jlolling. 
&lt;BR /&gt;I have tried the tAddCRCRow and tAggregateRow job design you have suggested and it looks like it will work for my requirements - I assume I need to join the output from the tAggregateRow to the original file on CRC to get all of the non-aggregated columns back into my output. Sorry, not too familiar with the workings of the tAggregateRow component... Seems to work though.</description>
      <pubDate>Wed, 05 Dec 2012 23:46:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-tUniqueRow-java-heapspace-issue/m-p/2254841#M37732</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-05T23:46:16Z</dc:date>
    </item>
  </channel>
</rss>

