Java Out of Memory occurring when processing a large number of CSV files
Hi
I have been running a Talend job that processes a large number of CSV files, and it has run for a number of months without issue. The last successful run had just over 20,000 files. The set is now up to almost 23,000 and all of a sudden it's running out of memory:
Exception in component tRunJob_2
java.lang.RuntimeException: Child job return 1. It doesn't terminate normally.
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
The main job runs 4 subjobs, and the above error occurs on the second one. The first job simply takes each file (inside a directory), filters it by a single column value and outputs the file into another directory. The second job takes the output files and, one by one, runs a tSortRow and tUniqRow and then a filter based on a column value and a date value.
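Just to illustrate, the per-file work of that second job is roughly equivalent to the plain Java sketch below (the delimiter, column positions and date cut-off are made up, not the real schema):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.time.LocalDate;
    import java.util.List;
    import java.util.stream.Collectors;

    public class SortDedupFilterSketch {
        public static void main(String[] args) throws IOException {
            Path in = Paths.get(args[0]);
            Path out = Paths.get(args[1]);
            LocalDate cutoff = LocalDate.parse("2014-01-01"); // made-up date filter

            // the whole file is read into memory before sorting -- this is where the heap goes
            List<String> rows = Files.readAllLines(in);

            List<String> kept = rows.stream()
                    .sorted()      // tSortRow step
                    .distinct()    // tUniqRow step
                    .filter(r -> {
                        String[] cols = r.split(";");
                        // filter on a column value and a date column (positions are invented)
                        return "KEEP".equals(cols[2]) && LocalDate.parse(cols[5]).isAfter(cutoff);
                    })
                    .collect(Collectors.toList());

            Files.write(out, kept);
        }
    }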
Does anyone have any idea why an extra 2,000 files would cause Talend to run out of memory? I've tried upping the heap size and it's still running out of memory.
Any help at all would be greatly appreciated.
Thanks
Suzy
Hi Suzy,
Jugal's solution should still help. The GC overhead limit exceeded error occurs when the JVM is spending too much time on garbage collection (around 98% of the time with no more than 2% of the heap freed, if I'm correct).
Allowing the job to use more memory (Xmx) should increase the available free heap. To test anything like that, you could try to reduce the size of your files temporarily, to see if that prevents the job from crashing.
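For example, in the Run view of the job (Advanced settings > Use specific JVM arguments, if I remember the Studio layout correctly) or in the launch script of an exported job, the memory settings would look something like this (the numbers are only placeholders, size them to the memory your server actually has):

    -Xms1024M
    -Xmx4096M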
I've seen a lot of these GC overhead errors when parsing XML and the document gets loaded into memory using the complexParser in the SAX utility. I analyzed this with VisualVM and saw that extending memory helped (to an extent) in preventing this error from occurring.
Have you monitored the job with VisualVM already? If so, could you upload a screenshot of VisualVM from when the job crashed?
Regards,
Arno
Was finally able to get it working on the server; some configuration was needed, but upping the heap size resolved the out of memory issue.
I think ultimately though we'll have to find a better way to sort the data.
Thanks a million guys for your help!
Cheers
Suzy
If you go into the ADVANCED settings of the OUTPUT component (the one receiving the data), you can define a batch size. If you define a low batch and commit size, like 1K, then it doesn't hold as much in memory and can get through the wide data in small chunks.
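Under the hood that is roughly the same idea as batched JDBC inserts, where rows are flushed and committed every N rows instead of being held in memory. A simplified sketch (the connection URL, table and column names below are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchedInsertSketch {
        public static void main(String[] args) throws SQLException {
            // placeholder connection details; in Talend these come from the connection component
            try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/demo", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO target_table (col_a, col_b) VALUES (?, ?)")) {
                conn.setAutoCommit(false);
                int batchSize = 1000; // the 1K batch/commit size mentioned above
                int count = 0;
                String[][] rows = { {"a", "1"}, {"b", "2"} }; // stand-in for the real row stream
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                    if (++count % batchSize == 0) {
                        ps.executeBatch(); // flush this chunk instead of accumulating everything
                        conn.commit();     // commit interval matches the batch size in this sketch
                    }
                }
                ps.executeBatch(); // flush any remaining rows
                conn.commit();
            }
        }
    }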