_AnonymousUser
Specialist III

Java Out of Memory occurring when processing a large number of CSV files

Hi
I have been running a Talend job that processes a large number of CSV files, and it has run for a number of months without issue. The last successful run had just over 20,000 files. The set is now up to almost 23,000 and all of a sudden it's running out of memory:
Exception in component tRunJob_2
java.lang.RuntimeException: Child job return 1. It doesn't terminate normally.
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
The main job runs 4 subjobs, and the above error occurs on the second one. The first job simply takes each file (inside a directory), filters it by a single column value and outputs the file into another directory. The second job takes the output files and, one by one, runs a tSortRow and tUniqRow and then a filter based on a column value and a date value.
Does anyone have any idea why an extra 2,000 files would cause Talend to run out of memory? I've tried upping the heap size and it's still running out of memory.
Any help at all would be greatly appreciated.
Thanks
Suzy
14 Replies
Anonymous
Not applicable

Hi Suzy,
Jugal's solution should still help. The GC overhead limit exceeded error occurs when the JVM is spending too much time on garbage collection (around 98% of CPU time while freeing no more than 2% of the heap, if I remember correctly).
Allowing the job to use more memory (-Xmx) should increase the available free heap. To test anything like that, you could temporarily reduce the size of your files and see if that prevents the job from crashing.
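One thing worth double-checking (this is an assumption about your setup: if tRunJob launches the child as an independent process, the child gets its own JVM settings and the parent's -Xmx does not apply to it) is how much heap the failing child job actually receives. A minimal sketch; the two lines inside main() can be dropped into a tJava in the child job:

// Minimal sketch: report the maximum heap this JVM was started with,
// to confirm the -Xmx setting really reaches the failing (child) job.
public class HeapCheck {
    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap available to this JVM: " + maxHeapMb + " MB");
    }
}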
I've seen a lot of GC overhead errors when parsing XML, where the document gets loaded into memory using the complexParser in the SAX utility. I analysed this with VisualVM and saw that extending memory helped (to an extent) in preventing this error from occurring.
Have you monitored the job with VisualVM already? If so, could you upload a screenshot of VisualVM at the moment the job crashed?
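If VisualVM cannot be attached to the job's JVM, a rough alternative is to log heap usage from inside the job itself. A sketch (class name is made up; call HeapLogger.start() from a tJava at the start of the job so it runs in the same JVM):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch of a lightweight heap monitor: prints used/max heap once per second
// from a daemon thread running inside the job's own JVM.
public class HeapLogger {
    public static void start() {
        Thread t = new Thread(() -> {
            while (true) {
                MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                System.out.printf("heap used: %d MB / max: %d MB%n",
                        heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        t.setDaemon(true); // dies with the job instead of keeping the JVM alive
        t.start();
    }
}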
Regards,
Arno
Anonymous
Not applicable

I was finally able to get it working on the server; some configuration was needed, but upping the heap size resolved the out of memory issue.
I think ultimately though we'll have to find a better way to sort the data.
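For anyone else hitting this, what I mean by a better way is something like an external merge sort: sort the file in chunks on disk and then merge the sorted chunks (as far as I understand it, tSortRow's on-disk sort option in its Advanced settings works roughly this way). A rough stand-alone Java sketch of the idea, with made-up file names and chunk size:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Rough external merge sort sketch: sorts the lines of a large CSV file
// without holding the whole file in memory. Paths and chunk size are examples.
public class ExternalSort {

    static final int CHUNK_LINES = 100_000; // lines held in memory at a time

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("big_input.csv");       // hypothetical input file
        Path output = Paths.get("sorted_output.csv");  // hypothetical output file

        // 1) Split the input into sorted chunk files on disk.
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>(CHUNK_LINES);
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == CHUNK_LINES) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(writeSortedChunk(buffer));
            }
        }

        // 2) K-way merge of the sorted chunks using a priority queue.
        PriorityQueue<ChunkReader> queue =
                new PriorityQueue<>(Comparator.comparing((ChunkReader r) -> r.current));
        for (Path chunk : chunks) {
            ChunkReader reader = new ChunkReader(chunk);
            if (reader.advance()) {
                queue.add(reader);
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            while (!queue.isEmpty()) {
                ChunkReader reader = queue.poll();
                out.write(reader.current);
                out.newLine();
                if (reader.advance()) {
                    queue.add(reader); // re-insert with its next line
                } else {
                    reader.close();
                }
            }
        }
    }

    // Sorts one chunk in memory and writes it to a temporary file.
    static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("sort_chunk", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // Wraps one sorted chunk file and exposes its smallest remaining line.
    static class ChunkReader implements Closeable {
        final BufferedReader reader;
        String current;

        ChunkReader(Path path) throws IOException {
            reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        }

        boolean advance() throws IOException {
            current = reader.readLine();
            return current != null;
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }
}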
Thanks a million guys for your help!
Cheers
Suzy
Anonymous
Not applicable

Hi,
Thanks for the feedback. Glad we could help.
Arno
Anonymous
Not applicable

If you go into the Advanced settings of the output component (the one receiving the data), you can define a batch size. If you define a low batch and commit size, like 1K, then it doesn't hold as much in memory and can get through the wide data in small chunks.
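To make that concrete, this is roughly what a small batch and commit size boils down to at the JDBC level (illustration only, not Talend's generated code; the table and column names are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustration of a small batch/commit size: only BATCH_SIZE rows are buffered
// before being flushed and committed, so memory stays flat regardless of row count.
public class BatchInsertSketch {

    static final int BATCH_SIZE = 1000;

    public static void insert(Connection conn, Iterable<String[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target_table (col_a, col_b) VALUES (?, ?)")) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // flush the buffered rows
                    conn.commit();     // commit so the DB doesn't hold one huge transaction
                    pending = 0;
                }
            }
            if (pending > 0) {         // flush the final partial batch
                ps.executeBatch();
                conn.commit();
            }
        }
    }
}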