Hello,
I am trying to deduplicate 500,000 lines with the tMatchGroup component. Each time, I get: Exception in thread "main" java.lang.OutOfMemoryError. What is the limit for tMatchGroup?
Thanks
Hi,
For a large data set, could you please try storing the data on disk instead of in memory in tMatchGroup?
Here is a KB article about it: TalendHelpCenter: Exception: outOfMemory
Best regards
Sabrina
Hello, I tried this yesterday. Now I don't get any errors, but the job "freezes" without an error message. After the first set of rows is processed, nothing happens. You can see this in the screenshot I uploaded.
Best regards
Hi,
Are you using a blocking key in the configuration of the component?
If you aren't, you're trying to do 500,000 x 500,000 comparisons. This won't fit in memory, and even with the store-on-disk option it will take days to complete.
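To put a number on that, here is a rough back-of-the-envelope sketch in Python. The comparison rate is purely an assumption for illustration (it is not a measured tMatchGroup figure), and the function name is hypothetical.

```python
def unique_pair_count(n: int) -> int:
    """Number of distinct record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


rows = 500_000

# Naive count quoted in the thread: every row against every row.
full_matrix = rows * rows

# Each unordered pair counted once, no self-comparisons.
unique_pairs = unique_pair_count(rows)

# ASSUMPTION: one million comparisons per second, chosen only for illustration.
assumed_rate_per_second = 1_000_000
estimated_days = unique_pairs / assumed_rate_per_second / 86_400

print(f"full matrix:  {full_matrix:,}")
print(f"unique pairs: {unique_pairs:,}")
print(f"estimated days at the assumed rate: {estimated_days:.2f}")
```

Even counting each pair only once, that is about 125 billion comparisons, which is why the job appears to hang.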
You must use a blocking key (probably by generating it with the tGenKey component). Have a look at the examples at
https://help.talend.com/search/all?query=tMatchGroup&content-lang=en
The blocking key will partition the data so that the number of comparisons is greatly decreased.
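As an illustration of the idea only, the Python sketch below groups records by a blocking key and compares records only within the same block. The key used here (first letter of the name plus the first two digits of the postal code) and the sample records are hypothetical; this is not how tGenKey or tMatchGroup are implemented.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical key: first letter of the name + first two digits of the zip."""
    return record["name"][:1].upper() + record["zip"][:2]


# Made-up sample data for illustration.
records = [
    {"id": 1, "name": "Smith", "zip": "75001"},
    {"id": 2, "name": "Smyth", "zip": "75001"},
    {"id": 3, "name": "Jones", "zip": "69002"},
    {"id": 4, "name": "Jonas", "zip": "69002"},
    {"id": 5, "name": "Brown", "zip": "13001"},
]

# Partition the record ids by blocking key.
blocks = defaultdict(list)
for record in records:
    blocks[blocking_key(record)].append(record["id"])

# Only records that share a block become candidate pairs for matching.
candidate_pairs = [
    pair for ids in blocks.values() for pair in combinations(ids, 2)
]

pairs_without_blocking = len(records) * (len(records) - 1) // 2
print(candidate_pairs)          # pairs that would be compared
print(pairs_without_blocking)   # pairs compared if there were no blocking
```

The trade-off is that two duplicates that end up with different keys are never compared, so the key has to be tolerant of the variations you expect in the data.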
See also this documentation about how to tune your tGenKey configuration for good performance:
https://help.talend.com/search/all?query=tGenKey&content-lang=en
It's advised to build blocks (aka partitions) of a few tens or hundreds of lines. Use the blocking key profile to tune your partitions.
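If you want to check block sizes outside the Studio, a rough stand-in for such a profile can be sketched in Python as below. The function name, the size threshold and the sample keys are assumptions for illustration; this does not reproduce Talend's own blocking key profile.

```python
from collections import Counter


def block_profile(keys, max_block_size):
    """Summarize block sizes for a list of blocking keys (one key per row)."""
    sizes = Counter(keys)
    return {
        "block_count": len(sizes),
        "largest_block": max(sizes.values()),
        # Pairs compared inside each block: s * (s - 1) / 2, summed over blocks.
        "comparisons": sum(s * (s - 1) // 2 for s in sizes.values()),
        # Blocks above the threshold are candidates for a more selective key.
        "oversized": {k: s for k, s in sizes.items() if s > max_block_size},
    }


# Made-up keys: three rows in block "A", two in "B", one in "C".
sample_keys = ["A", "A", "A", "B", "B", "C"]
profile = block_profile(sample_keys, max_block_size=2)
print(profile)
```

On real data you would pass the generated key column and a threshold of a few hundred. A handful of very large blocks usually dominates the total comparison count, so those are the ones to split first.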
Hope this helps.