Luis_Simoes
Contributor

Performance of running deduplication using tMatchGroup and Swoosh

Hi,

Is anyone using tMatchGroup to deduplicate customer master data in Talend?

I am currently using Swoosh on a 2M-row dataset, and I keep getting out-of-memory errors...

My server is an Azure VM with 16 GB of RAM and 4 vCores, which run at close to 100% for the entire processing time.

I am wondering why the performance is so poor, and how I will be able to scale it, since I still have 13 more sources to add to the same dataset...

Any suggestions or considerations?

How should I size and estimate the processing requirements for this task?

What can be tuned in the Run settings or the component settings?

Thank you


Regards,

LS
