<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic tRuleSurvivorship &amp; tMatchGroup performance issue in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/tRuleSurvivorship-tMatchGroup-performance-issue/m-p/2198655#M304</link>
    <description>&lt;I&gt;Hello,&lt;/I&gt; 
&lt;BR /&gt;As part of our ETL import, we want to identify duplicates in the input file. We are using tMatchGroup and tRuleSurvivorship to achieve this, and we have successfully identified duplicates and created a new survivor row for each duplicate group. 
&lt;BR /&gt;When running this job on TAC, we face a performance issue with these components. A file with 2,600 records completed successfully but was sluggish (it took 5 minutes to process). A file with 120K records gets stuck on the subjob that contains tMatchGroup and tRuleSurvivorship and does not process the data at all. 
&lt;BR /&gt;We cannot set up parallelization on this subjob because of these components. After adding logging, we identified these components as the bottleneck. Can someone suggest how to improve their performance? 
&lt;BR /&gt;We are using Talend Platform for Big Data 5.5.1.r118616. The JVM parameters for this job on TAC are set to -Xms1024M and -Xmx24576M. 
&lt;BR /&gt;Any advice on improving performance, or on a way around this logic, would be highly appreciated. 
&lt;BR /&gt; 
&lt;I&gt;Thanks in advance.&lt;/I&gt; 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MB4f.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/130272i53849DF09FB26C88/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MB4f.png" alt="0683p000009MB4f.png" /&gt;&lt;/span&gt;</description>
    <pubDate>Thu, 16 Jul 2015 18:44:06 GMT</pubDate>
    <dc:creator>npatel</dc:creator>
    <dc:date>2015-07-16T18:44:06Z</dc:date>
    <item>
      <title>tRuleSurvivorship &amp; tMatchGroup performance issue</title>
      <link>https://community.qlik.com/t5/Data-Quality/tRuleSurvivorship-tMatchGroup-performance-issue/m-p/2198655#M304</link>
      <description>&lt;I&gt;Hello,&lt;/I&gt; 
&lt;BR /&gt;As part of our ETL import, we want to identify duplicates in the input file. We are using tMatchGroup and tRuleSurvivorship to achieve this, and we have successfully identified duplicates and created a new survivor row for each duplicate group. 
&lt;BR /&gt;When running this job on TAC, we face a performance issue with these components. A file with 2,600 records completed successfully but was sluggish (it took 5 minutes to process). A file with 120K records gets stuck on the subjob that contains tMatchGroup and tRuleSurvivorship and does not process the data at all. 
&lt;BR /&gt;We cannot set up parallelization on this subjob because of these components. After adding logging, we identified these components as the bottleneck. Can someone suggest how to improve their performance? 
&lt;BR /&gt;We are using Talend Platform for Big Data 5.5.1.r118616. The JVM parameters for this job on TAC are set to -Xms1024M and -Xmx24576M. 
&lt;BR /&gt;Any advice on improving performance, or on a way around this logic, would be highly appreciated. 
&lt;BR /&gt; 
&lt;I&gt;Thanks in advance.&lt;/I&gt; 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MB4f.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/130272i53849DF09FB26C88/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MB4f.png" alt="0683p000009MB4f.png" /&gt;&lt;/span&gt;</description>
      <pubDate>Thu, 16 Jul 2015 18:44:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tRuleSurvivorship-tMatchGroup-performance-issue/m-p/2198655#M304</guid>
      <dc:creator>npatel</dc:creator>
      <dc:date>2015-07-16T18:44:06Z</dc:date>
    </item>
    <item>
      <title>Re: tRuleSurvivorship &amp; tMatchGroup performance issue</title>
      <link>https://community.qlik.com/t5/Data-Quality/tRuleSurvivorship-tMatchGroup-performance-issue/m-p/2198656#M305</link>
      <description>Hi 
&lt;A href="https://www.talendforge.org/forum/profile.php?id=179125" target="_blank" rel="nofollow noopener noreferrer"&gt;npatel,&lt;/A&gt;
&lt;BR /&gt;Could you please open a ticket on the Talend Support Portal?
&lt;BR /&gt;That way, we can give you remote assistance with your performance issue through the support cycle, with priority.
&lt;BR /&gt;
&lt;A href="https://support.talend.com/otrs/customer.pl" target="_blank" rel="nofollow noopener noreferrer"&gt;https://support.talend.com/otrs/customer.pl&lt;/A&gt;
&lt;BR /&gt;Best regards
&lt;BR /&gt;Sabrina</description>
      <pubDate>Mon, 20 Jul 2015 04:34:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tRuleSurvivorship-tMatchGroup-performance-issue/m-p/2198656#M305</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-07-20T04:34:53Z</dc:date>
    </item>
  </channel>
</rss>

