<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic TRecordMatching is very slow in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266174#M2139</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am matching 25,000 records against 120,000 records (the reference file) with the tRecordMatching component.&lt;/P&gt;
&lt;P&gt;I have defined province as my blocking key. You can see the rest of the configuration in the picture.&lt;/P&gt;
&lt;P&gt;It has been running for 4 hours and only 12,000 of the 25,000 records have been processed.&lt;/P&gt;
&lt;P&gt;What should I do to improve performance?&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M8Ar.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/134611i0DE4224DAF81D7C4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M8Ar.png" alt="0683p000009M8Ar.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 04:01:31 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T04:01:31Z</dc:date>
    <item>
      <title>TRecordMatching is very slow</title>
      <link>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266174#M2139</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am matching 25,000 records against 120,000 records (the reference file) with the tRecordMatching component.&lt;/P&gt;
&lt;P&gt;I have defined province as my blocking key. You can see the rest of the configuration in the picture.&lt;/P&gt;
&lt;P&gt;It has been running for 4 hours and only 12,000 of the 25,000 records have been processed.&lt;/P&gt;
&lt;P&gt;What should I do to improve performance?&lt;/P&gt;
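&lt;P&gt;As a rough check of the figures above, the projected total run time and the number of comparisons needed without any blocking can be worked out directly. The Python below is a minimal sketch, not Talend code; the function names are hypothetical, and it assumes records are processed at a constant rate, which blocking does not guarantee because large blocks take longer than small ones.&lt;/P&gt;
&lt;PRE&gt;
```python
# Rough run-time projection for a record-matching job.
# Assumption: a constant processing rate (not guaranteed with uneven blocks).

def projected_total_hours(processed, total, elapsed_hours):
    """Extrapolate the total run time from the records processed so far."""
    rate = processed / elapsed_hours  # records per hour
    return total / rate

def remaining_hours(processed, total, elapsed_hours):
    """Hours still needed at the current rate."""
    return projected_total_hours(processed, total, elapsed_hours) - elapsed_hours

def comparisons_without_blocking(n_input, n_reference):
    """Every input record compared with every reference record."""
    return n_input * n_reference

if __name__ == "__main__":
    hours = projected_total_hours(12_000, 25_000, 4)
    print(f"projected total: {hours:.1f} h")  # projected total: 8.3 h
    print(f"remaining: {remaining_hours(12_000, 25_000, 4):.1f} h")
    print(comparisons_without_blocking(25_000, 120_000))  # 3000000000
```
&lt;/PRE&gt;
&lt;P&gt;At the observed rate of 3,000 records per hour, the whole file would take about 8.3 hours, in line with the 9-hour run reported later in the thread. Without blocking, 25,000 × 120,000 gives 3,000,000,000 comparisons; blocking restricts each input record to the reference records that share its province.&lt;/P&gt;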
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M8Ar.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/134611i0DE4224DAF81D7C4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M8Ar.png" alt="0683p000009M8Ar.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 04:01:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266174#M2139</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T04:01:31Z</dc:date>
    </item>
    <item>
      <title>Re: TRecordMatching is very slow</title>
      <link>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266175#M2140</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;IMO, it could be related to two things:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;- Have you looked at the size of each of your blocks? If you have only a few provinces (say 10), you will still have many comparisons to do: each input record would be compared with approximately 12,000 reference records (120,000 / 10), which gives around 300,000,000 comparisons (25,000 × 12,000).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;- How many tokens are in your address field? If there are more than 10, I think it's a bit risky to use the "Any Order" tokenized measure, because it is quite a complex method (see the comments on &lt;A href="https://jira.talendforge.org/browse/TDQ-12121" target="_blank" rel="nofollow noopener noreferrer"&gt;https://jira.talendforge.org/browse/TDQ-12121&lt;/A&gt; for more details).&lt;/P&gt;
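&lt;P&gt;Both checks can be run on the data before launching the job. The Python sketch below is illustrative and not part of Talend; the function names are hypothetical. It assumes that, within a block, every input record is compared with every reference record sharing its key, and it splits addresses on whitespace, which may not match the tokenizer tRecordMatching actually uses.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter

def blocked_comparisons(input_keys, reference_keys):
    """Sum over blocks of n_input(block) * n_reference(block).

    Assumes every input record is compared with every reference record that
    shares its blocking key; keys absent from the reference count as 0.
    """
    n_in = Counter(input_keys)
    n_ref = Counter(reference_keys)
    return sum(count * n_ref[key] for key, count in n_in.items())

def long_values(values, limit=10):
    """Values with more than `limit` whitespace-separated tokens (assumed tokenizer)."""
    return [v for v in values if len(v.split()) > limit]

if __name__ == "__main__":
    # Example from the reply: 10 evenly sized provinces,
    # 2,500 input and 12,000 reference records per province.
    provinces = [f"P{i}" for i in range(10)]
    input_keys = [p for p in provinces for _ in range(2_500)]
    reference_keys = [p for p in provinces for _ in range(12_000)]
    print(blocked_comparisons(input_keys, reference_keys))  # 300000000

    addresses = ["12 Main St", "Apt 5 Building B 100 Long Industrial Park Road East Side Town"]
    print(long_values(addresses))
```
&lt;/PRE&gt;
&lt;P&gt;With the even split used in the example, the first function reproduces the figure of about 300,000,000 comparisons. Running it on the real province columns shows how uneven the blocks are; a few large provinces can dominate the total.&lt;/P&gt;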
</description>
      <pubDate>Tue, 26 Nov 2019 07:59:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266175#M2140</guid>
      <dc:creator>dprot</dc:creator>
      <dc:date>2019-11-26T07:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: TRecordMatching is very slow</title>
      <link>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266176#M2141</link>
      <description>&lt;P&gt;Thank you for the reply.&lt;/P&gt; 
&lt;P&gt;I changed "Any Order" to No and selected the "store to disk" option, and the run time dropped from 9 hours to 5 hours, which is still very long. I thought about changing the blocking key from "province", but I couldn't find another combination that would work for my case. My fields are first name, last name, address, province and postal code. What do you suggest? Could increasing the heap size improve the speed?&lt;/P&gt; 
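&lt;P&gt;One way to compare alternative blocking keys is to count the comparisons each would generate on the actual data. The Python sketch below assumes each record is a dict keyed by the five fields listed above (first_name, last_name, address, province, postal_code); these names, the candidate keys, and the sample rows are made up for illustration and are not tRecordMatching settings.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter

def comparisons(inputs, references, key):
    """Sum over blocks of n_input * n_reference, with blocks defined by key(record)."""
    n_in = Counter(key(r) for r in inputs)
    n_ref = Counter(key(r) for r in references)
    return sum(c * n_ref[k] for k, c in n_in.items())

def postal_prefix(r, n=3):
    """First n characters of the postal code, ignoring spaces and case."""
    return r["postal_code"].replace(" ", "").upper()[:n]

# Hypothetical candidate keys built from the available columns.
CANDIDATE_KEYS = {
    "province": lambda r: r["province"],
    "province+postal3": lambda r: (r["province"], postal_prefix(r)),
    "province+last_initial": lambda r: (r["province"], r["last_name"][:1].upper()),
}

def row(first, last, address, province, postal):
    """Build one record with the assumed field names."""
    return {"first_name": first, "last_name": last, "address": address,
            "province": province, "postal_code": postal}

if __name__ == "__main__":
    # Made-up sample rows; replace with the real input and reference data.
    inputs = [row("Ann", "Lee", "1 King St", "PROV1", "X1Y 2Z3"),
              row("Bob", "Roy", "9 Elm Ave", "PROV1", "Q4R 5S6")]
    references = inputs + [row("Cy", "Lam", "3 Oak Rd", "PROV1", "X1Y 9Z9")]
    for name, key in CANDIDATE_KEYS.items():
        print(name, comparisons(inputs, references, key))
```
&lt;/PRE&gt;
&lt;P&gt;On this toy data the script prints 6 comparisons for "province" and 3 for each composite key. The trade-off is recall: two records are only compared when their keys are identical, so a typo or missing value in the postal code or last name would keep a true match out of the shared block.&lt;/P&gt;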
&lt;P&gt;My .ini file is below:&lt;/P&gt; 
&lt;PRE&gt;-vm
C:\Program Files\Java\jre1.8.0_231\bin
-vmargs
-Xms4G
-Xmx8G
-Dfile.encoding=UTF-8
-Dosgi.requiredJavaVersion=1.8
-XX:+UseG1GC
-XX:+UseStringDeduplication&lt;/PRE&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009LxEl"&gt;config.png&lt;/A&gt;</description>
      <pubDate>Wed, 27 Nov 2019 03:13:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/TRecordMatching-is-very-slow/m-p/2266176#M2141</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-11-27T03:13:28Z</dc:date>
    </item>
  </channel>
</rss>

