Anonymous
Not applicable

tRecordMatching is very slow

Hi,

 

I am matching 25,000 records against 120,000 records (reference file) with the tRecordMatching component.

I have defined province as my Blocking key. You can see the rest of the configuration in the attached picture.

It has been running for 4 hours, and only 12,000 of the 25,000 records have been processed so far.

What should I do to increase performance?

 

 

(attached screenshot: 0683p000009M8Ar.png)

2 Replies
dprot
Contributor II

Hi,

IMO it could be related to two things:

 - Have you looked at the size of each of your blocks? If you have only a few provinces (say 10, for example), you will still have many comparisons to do: each record would be compared against approximately 12,000 reference records, which comes to around 300,000,000 comparisons in total.

 - How many tokens do you have in your address field? If there are more than 10 tokens, using the "Any Order" tokenized measure is risky, because it is a quite complex (combinatorial) method (see the comments on https://jira.talendforge.org/browse/TDQ-12121 for more details).
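The arithmetic behind the first point can be sketched quickly. This is not tRecordMatching code, just a back-of-the-envelope estimate assuming records are spread evenly across the blocks:

```java
// Rough estimate of pairwise comparisons for a given blocking setup.
// Figures are the ones from this thread: 25,000 input records matched
// against 120,000 reference records, split across blockCount blocks.
public class BlockingEstimate {

    // Assuming an even spread, each input record is compared against
    // roughly (referenceSize / blockCount) reference candidates.
    static long estimateComparisons(long inputSize, long referenceSize, long blockCount) {
        return inputSize * (referenceSize / blockCount);
    }

    public static void main(String[] args) {
        // ~10 provinces -> ~12,000 candidates per record
        System.out.println(estimateComparisons(25_000, 120_000, 10));    // 300000000
        // A finer key producing ~1,000 blocks would cut this dramatically
        System.out.println(estimateComparisons(25_000, 120_000, 1_000)); // 3000000
    }
}
```

So shrinking the blocks by two orders of magnitude shrinks the comparison count by the same factor, which usually matters far more than JVM tuning.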

 

Anonymous
Not applicable
Author

Thank you for the reply.

I changed "Any Order" to No and selected the "Store on disk" option, and the run time dropped from 9 hours to 5 hours, which is still very long. I thought about changing the blocking key from "province", but I couldn't find any other combination that would work for my case. I have first name, last name, address, province and postal code. What do you suggest? Could increasing the memory heap speed this up?

My .ini file is as below:

 

-vm
C:\Program Files\Java\jre1.8.0_231\bin
-vmargs
-Xms4G
-Xmx8G
-Dfile.encoding=UTF-8
-Dosgi.requiredJavaVersion=1.8
-XX:+UseG1GC
-XX:+UseStringDeduplication
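On the blocking question above, one common direction is a composite blocking key, e.g. province combined with a postal-code prefix, computed in a tMap feeding the Blocking column. The sketch below is hypothetical (the method and column names are assumptions, not a tRecordMatching API):

```java
// Hypothetical composite blocking key: province plus the first character
// of the postal code. Records only get compared when both parts match,
// so each province block is split into many smaller sub-blocks.
public class BlockKey {

    static String blockKey(String province, String postalCode) {
        // Fall back to "?" when the postal code is missing or empty
        String prefix = (postalCode == null || postalCode.isEmpty())
                ? "?"
                : postalCode.substring(0, 1).toUpperCase();
        // Normalize the province so case and stray spaces don't split blocks
        return province.trim().toUpperCase() + "|" + prefix;
    }

    public static void main(String[] args) {
        System.out.println(blockKey("Ontario", "M5V 2T6")); // ONTARIO|M
        System.out.println(blockKey(" ontario ", null));    // ONTARIO|?
    }
}
```

With Canadian postal codes this alone multiplies the number of blocks by up to 26 per province; a two- or three-character prefix would go further, at the cost of missing matches where the postal code itself is wrong.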


(attached screenshot: config.png)