tMatchGroup limit?

Anonymous · ‎2016-04-13

Hello,
I am trying to deduplicate 500 000 lines with the tMatchGroup component, but each time I get an Exception in thread "main" java.lang.OutOfMemoryError. What is the limit for tMatchGroup?
Thanks

Anonymous · ‎2016-04-14

Hi,
For a large data set, could you please try storing the data on disk instead of in memory in tMatchGroup?
Here is a KB article about it: TalendHelpCenter: Exception: outOfMemory
Best regards
Sabrina

Anonymous · ‎2016-04-14

Hello,
I tried this yesterday. I no longer get errors, but the job is "freezing" without any error message. After the first set of rows is processed, nothing happens. You can see the screenshot that I have uploaded.
best regards

Sebastiao_Qlik · ‎2016-04-14

Hi,
Are you using a blocking key in the configuration of the component?
If you don't, you're trying to do 500 000 x 500 000 comparisons. This won't fit in memory, and even with the store-on-disk option it will take days to complete...
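To put a number on that, here is a rough back-of-the-envelope sketch in Python (not Talend code). The throughput of one million comparisons per second is purely an assumption for illustration; the real rate depends on the matching rules and the machine.

```python
def pair_count(n):
    """Number of unordered record pairs among n records."""
    return n * (n - 1) // 2

n = 500_000

full_grid = n * n              # every row against every row
unique_pairs = pair_count(n)   # each pair counted once

# ASSUMPTION: hypothetical throughput, not a measured tMatchGroup figure.
comparisons_per_second = 1_000_000
days = full_grid / comparisons_per_second / 86_400

print(f"{full_grid:,} comparisons on the full grid")
print(f"{unique_pairs:,} unique pairs")
print(f"about {days:.1f} days at the assumed rate")
```

Even counting each pair only once, that is still well over a hundred billion comparisons without a blocking key.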
You must use a blocking key (probably by generating it with the tGenKey component). Have a look at examples at https://help.talend.com/search/all?query=tMatchGroup&content-lang=en
The blocking key will partition the data so that the number of comparisons is greatly decreased.
See also this documentation https://help.talend.com/search/all?query=tGenKey&content-lang=en about how to tune your tGenKey configuration for good performance. It's advised to build blocks (aka partitions) of a few tens or hundreds of lines. Use the blocking key profile to tune your partitions.
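As an illustration of the idea only (this is a Python sketch, not how tGenKey or tMatchGroup are implemented), the snippet below groups records by a blocking key and counts the comparisons that remain. The `blocking_key` function, the `name` field and the sample records are all hypothetical; in a real job the key would come from your tGenKey configuration.

```python
from collections import defaultdict

def blocking_key(record):
    # HYPOTHETICAL key: first three letters of the name, trimmed and upper-cased.
    return record["name"].strip().upper()[:3]

def build_blocks(records):
    """Partition records so only rows sharing a key get compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    return dict(blocks)

def pairs_within(blocks):
    """Total comparisons when matching is done inside each block only."""
    return sum(len(b) * (len(b) - 1) // 2 for b in blocks.values())

# Made-up sample data for the sketch.
records = [
    {"id": 1, "name": "Martin"},
    {"id": 2, "name": "martin "},
    {"id": 3, "name": "Marten"},
    {"id": 4, "name": "Dupont"},
    {"id": 5, "name": "Dupond"},
    {"id": 6, "name": "Bernard"},
]

blocks = build_blocks(records)
block_sizes = {key: len(rows) for key, rows in blocks.items()}  # the "profile"
all_pairs = len(records) * (len(records) - 1) // 2
blocked_pairs = pairs_within(blocks)

print(block_sizes)
print(all_pairs, "pairs without blocking,", blocked_pairs, "with blocking")
```

Scaled up, 500 000 rows split evenly into blocks of 100 would give 5 000 blocks of 4 950 pairs each, so roughly 25 million comparisons instead of roughly 125 billion. The block-size profile is what to watch: one oversized block can bring most of the cost back, and a key that is too fine will separate true duplicates.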
Hope this helps.

Data Quality

v6.x