Gurus,
I'm new to Talend and stuck on a performance issue. Kindly help me fix it.
I have millions of records. There is no option to extract the data from a database; everything comes from files.
>>Removing duplicates and retaining the record with the max date (used the tSortRow and tUniqRow components)
>>Applied different filter conditions in tMap and tFilterRow
The job failed with: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>Increased the JVM argument to -Xmx4096M, but still got the same error.
>>Wrote to a temp file in tMap and sorted on disk in tSortRow; got the same error.
My questions:
-->The sort seems to be the main culprit. Are there any other ways to sort the data (I don't have a staging database to sort in)?
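One option that needs no database is a classic external merge sort: sort fixed-size chunks in memory, spill each sorted chunk to a temp file, then k-way merge the chunks. This is the same idea behind sort-on-disk options. A minimal plain-Java sketch (not Talend-generated code), assuming one record per line and a `chunkSize` chosen to fit the heap:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Minimal external merge-sort sketch: sorts a large text file with bounded
 *  memory by spilling sorted chunks to temp files and k-way merging them.
 *  chunkSize (lines per chunk) is an assumed tuning knob. */
public class ExternalSort {

    // One chunk's reader plus its current head line, for the merge heap.
    private static final class Head {
        String line; final BufferedReader reader;
        Head(BufferedReader r) throws IOException { reader = r; line = r.readLine(); }
    }

    // Sort one chunk in memory, spill it to a temp file, and reset the buffer.
    private static Path writeChunk(List<String> buf) throws IOException {
        Collections.sort(buf);
        Path tmp = Files.createTempFile("chunk", ".txt");
        Files.write(tmp, buf);
        buf.clear();
        return tmp;
    }

    public static void sort(Path in, Path out, int chunkSize) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(in)) {
            List<String> buf = new ArrayList<>();
            for (String line = r.readLine(); line != null; line = r.readLine()) {
                buf.add(line);
                if (buf.size() >= chunkSize) chunks.add(writeChunk(buf));
            }
            if (!buf.isEmpty()) chunks.add(writeChunk(buf));
        }
        // k-way merge: the heap always holds each chunk's smallest unread line.
        PriorityQueue<Head> heap =
            new PriorityQueue<>(Comparator.comparing((Head h) -> h.line));
        for (Path c : chunks) heap.add(new Head(Files.newBufferedReader(c)));
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                w.write(h.line);
                w.newLine();
                h.line = h.reader.readLine();          // advance this chunk
                if (h.line != null) heap.add(h); else h.reader.close();
            }
        }
        for (Path c : chunks) Files.deleteIfExists(c);
    }
}
```

Peak memory is one chunk plus one line per chunk during the merge, regardless of total file size.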
-->I'm reading the same reference file twice, because I cannot route a single tFileInputDelimited to two tMap lookups. Is there any way to read the file only once?
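In the job itself, a tReplicate placed after the tFileInputDelimited can feed the same flow to both tMap lookups. The equivalent idea in plain Java is to parse the reference file into a map once and reuse that map for every join; a sketch, assuming a ';'-delimited file whose first field is the join key:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

/** Sketch: load a reference file once into a map and reuse it for every
 *  lookup, instead of re-reading the file per join. The ';' delimiter and
 *  key-in-first-field layout are assumptions. */
public class SharedLookup {
    static Map<String, String[]> load(Path refFile) throws IOException {
        Map<String, String[]> byKey = new HashMap<>();
        for (String line : Files.readAllLines(refFile)) {
            String[] fields = line.split(";");
            byKey.put(fields[0], fields);   // key -> whole record
        }
        return byKey;
    }
}
```

Both joins then call `byKey.get(key)` against the same map, so the file is parsed exactly once.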
-->How can the overall design be improved?
Any guidance would be greatly appreciated.
Thanks
Thanks.
Yes, initially I got the OutOfMemory issue. I tried two scenarios.
Scenario 1:
>>Increased Xmx to 16 GB and it worked; performance was very good (6 min). Is it a good idea to use this much memory?
Scenario 2:
>>Reduced Xmx to 8 GB and used the store-on-disk option in tMap_1 & tMap_2, but performance was not good. With this option, tMap sorts the data and writes it to disk before the join.
I didn't apply store on disk to tMap_3 & tMap_4. Do you think that would be a good idea?
I cannot upload the screenshot; the forum gives:
Error: The server was unable to save the uploaded file. Please contact the forum administrator at
In tMap_3 & tMap_4, I removed the unwanted columns (from 9 down to 3) and filtered the records based on a few conditions.
Thanks
You could split this into multiple jobs to do the filtering and deduplication, then pass the cleaned data into the job above without tMap_3 & tMap_4, and remove the tSortRow, tUniqRow, and tFilterRow.
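If the sort exists only to feed the dedup, a single pass that keeps the max-date record per key can replace the sort-then-unique pair entirely, at a memory cost proportional to the number of distinct keys rather than the total row count. A rough sketch, assuming records shaped as {key, ISO date, ...}:

```java
import java.util.*;

/** Sketch: remove duplicates keeping the latest record per key, without a
 *  full sort. One pass; memory grows with distinct keys, not total rows.
 *  The record layout {key, isoDate, ...} is an assumption. */
public class DedupLatest {
    static Collection<String[]> latestPerKey(Iterable<String[]> records) {
        Map<String, String[]> best = new HashMap<>();
        for (String[] rec : records) {
            String[] cur = best.get(rec[0]);
            // ISO-8601 dates (yyyy-MM-dd) compare correctly as plain strings
            if (cur == null || rec[1].compareTo(cur[1]) > 0) best.put(rec[0], rec);
        }
        return best.values();
    }
}
```

If the distinct-key count is also in the millions, the map itself may not fit the heap, in which case the external-sort route (sort by key and date, then keep the last record of each key group) is the fallback.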