Performance issue with the design below

Gurus,
I'm new to Talend and am stuck with a performance issue. Kindly help me fix it.
I have millions of records. There is no option to extract the data from a database; everything comes from files.
[Screenshot of the job design]
>> Removing duplicates and retaining the record with the max date (using the tSortRow and tUniqRow components)
>> Applied different filter conditions in tMap and tFilterRow
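As a side note, the "keep the record with the max date per key" step does not strictly need a full sort: a hash map keyed on the dedup column can retain the latest row in a single pass, using memory proportional to the number of distinct keys rather than the total row count. A minimal Java sketch (column positions and the ISO date format are assumptions, not taken from the actual job):

```java
import java.util.HashMap;
import java.util.Map;

public class DedupMaxDate {
    // Keep one record per key, retaining the row with the latest date.
    // Single pass, no sort: memory scales with distinct keys, not total rows.
    static Map<String, String[]> dedup(Iterable<String[]> rows, int keyCol, int dateCol) {
        Map<String, String[]> latest = new HashMap<>();
        for (String[] row : rows) {
            String[] seen = latest.get(row[keyCol]);
            // ISO-8601 dates (yyyy-MM-dd) compare correctly as strings -- an
            // assumption about the file; parse to java.time.LocalDate otherwise.
            if (seen == null || row[dateCol].compareTo(seen[dateCol]) > 0) {
                latest.put(row[keyCol], row);
            }
        }
        return latest;
    }
}
```

If the number of distinct keys is itself in the millions, this trades sort CPU for heap, so it only helps when the deduplicated output is much smaller than the input.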

The job failed with: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>> Increased the JVM argument to -Xmx4096M, but I still got the same error.
>> Wrote to a temp file in tMap and enabled sort on disk in tSortRow; got the same error.
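For anyone following along, the JVM arguments above are set in Talend Studio under the job's Run tab > Advanced settings > "Use specific JVM arguments" (the values shown are just examples):

```
-Xms1024M
-Xmx4096M
```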

My questions:
--> The sort seems to be the main culprit. Are there other ways to sort the data (I don't have a staging database to sort in)?
--> I'm reading the same reference file twice, because I cannot route a single tFileInputDelimited to two tMap lookups. Is there any way to read the file only once?
--> How can the overall design be improved?
Any guidance would be greatly appreciated.
Thanks
Reply from Anonymous (Author):
Thanks.
Yes, initially I got the OutOfMemory error. I tried two scenarios.
Scenario 1:
>> Increased Xmx to 16 GB and it worked; performance was very good (6 min). Is it a good idea to use this much memory?
Scenario 2:
>> Reduced Xmx to 8 GB and enabled the store-on-disk option in tMap_1 and tMap_2, but performance was poor. With this option, tMap sorts the data and writes it to disk before the join.
I didn't apply store-on-disk to tMap_3 and tMap_4. Do you think that would be a good idea?
I cannot upload the screenshot; I'm getting this error:
Error : The server was unable to save the uploaded file. Please contact the forum administrator at

In tMap_3 and tMap_4, I removed the unwanted columns (9 down to 3) and filtered the records based on a few conditions.
Thanks
Reply from Anonymous:
You could split this into multiple jobs that do the filtering and deduplicating first, then pass the cleaned data into the job above without tMap_3 and tMap_4, and remove the sort, tUniqRow, and tFilterRow.
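The pre-filtering pass suggested above can stream the file one line at a time, so its memory use stays flat no matter how many rows the file has. A minimal Java sketch, where the delimiter, column index, and filter condition are all hypothetical placeholders for whatever the real job filters on:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class PreFilter {
    // Hypothetical condition: keep rows whose third column equals "ACTIVE".
    static boolean keep(String line) {
        String[] cols = line.split(";", -1);
        return cols.length > 2 && "ACTIVE".equals(cols[2]);
    }

    // Stream input -> output, holding only one line in memory at a time.
    static void filterFile(String inPath, String outPath) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(inPath));
             PrintWriter out = new PrintWriter(new FileWriter(outPath))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (keep(line)) out.println(line);
            }
        }
    }
}
```

In Talend terms this is just tFileInputDelimited -> tFilterRow -> tFileOutputDelimited in its own job; the point is that filtering early shrinks the data before any memory-hungry sort or join runs.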