_AnonymousUser
Specialist III

Java Out of Memory Occurring for large process of csv files

Hi
I have been running a Talend job that processes a large number of CSV files for several months without issue. The last successful run had just over 20,000 files. The set is now up to almost 23,000, and all of a sudden the job is running out of memory:
Exception in component tRunJob_2
java.lang.RuntimeException: Child job return 1. It doesn't terminate normally.
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
The main job runs 4 subjobs, and the error above occurs in the second one. The first job simply takes each file in a directory, filters it by a single column value, and writes the result to another directory. The second job takes those output files and, one by one, runs a tSortRow and a tUniqRow, then filters on a column value and a date value.
Any idea why an extra 2,000 files would cause Talend to run out of memory? I've tried increasing the heap size and it still runs out of memory.
Any help at all would be greatly appreciated.
Thanks
Suzy
14 Replies
Anonymous
Not applicable

Hi,
I suggest you use a monitoring tool such as VisualVM to find out which objects are actually consuming the memory.
If all files are processed sequentially, one would not expect the job to run out of memory, even with this number of files.
Hope this helps.
Regards,
Arno
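As a lightweight complement to VisualVM, heap figures can also be printed from inside the JVM, which additionally shows whether a raised heap limit has actually taken effect in the process that fails. The following is a minimal sketch using the standard java.lang.management API; the class name HeapReport is hypothetical, and calling it from within a Talend job (for example between files) is an assumption, not something the job above already does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints current heap figures for the running JVM.
class HeapReport {

    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the maximum is undefined.
        long max = heap.getMax();
        String maxText = max < 0 ? "undefined" : (max / mb) + " MB";
        return "heap used=" + (heap.getUsed() / mb) + " MB"
                + ", committed=" + (heap.getCommitted() / mb) + " MB"
                + ", max=" + maxText;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported maximum does not match the value set in the start script, the heap setting is probably not reaching the JVM that runs out of memory.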
_AnonymousUser
Specialist III
Author

Thanks for your reply, Arno. I forgot to mention that I exported this job and am running it on a server through the run script. Do you think that could be a reason why it runs out of memory?
Thanks
Suzy
Anonymous
Not applicable

Hi,
No, this shouldn't make any difference. It should even run better from the script on the command line, because no graphical interface is needed there.
You can still use VisualVM, but instead of connecting it to a local Java VM, you need to make a small adjustment to the job start script so that the job accepts a remote monitoring connection. (If you need more information on what these parameters are, I'll look them up for you.)
Regards,
Arno
Anonymous
Not applicable

Hi Arno,
If you could help me with the parameters for allowing the job to accept a remote monitoring connection, that would be great.
Thanks!
Suzy
Anonymous
Not applicable

Hi Suzy,
To make the Talend application listen for remote monitoring you should add the following parameters to the .sh script:
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=8100
-Dcom.sun.management.jmxremote=true
-Djava.rmi.server.hostname=192.168.10.101

Of course you should change the IP address to match your environment, and make sure that port 8100 is open in the firewall of the server running the job. (If necessary, change the port number to another open port that is not in use.)
Regards,
Arno
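The parameters only take effect if they appear on the java command line before the main class name. One way to confirm that the running JVM really received them is to list its input arguments. This is a minimal sketch using the standard RuntimeMXBean API; the class name JmxFlagCheck and its method are hypothetical, and running it inside the job's JVM is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks out the remote-monitoring options from a list of JVM arguments.
class JmxFlagCheck {

    static List<String> remoteMonitoringArgs(List<String> jvmArgs) {
        List<String> found = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Dcom.sun.management.jmxremote")
                    || arg.startsWith("-Djava.rmi.server.hostname")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the options passed to this JVM, not the program arguments.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(remoteMonitoringArgs(actual));
    }
}
```

An empty list means none of the options reached the JVM, so VisualVM would not be able to connect remotely.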
_AnonymousUser
Specialist III
Author

Thanks a mil!
Anonymous
Not applicable

Hi,
The issue seems to be with processing one CSV file that is 190 MB. I tried using the 'sort on disk' option, but that causes
java.lang.OutOfMemoryError: GC overhead limit exceeded
What's the best way to process large CSV files? (Attached image of my subjob)
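For context, an on-disk sort generally works by sorting the file in bounded chunks, writing each chunk to a temporary file, and then merging the chunks, so memory use depends on the chunk size rather than the file size. The sketch below illustrates that general technique together with duplicate removal. It is not Talend's implementation of tSortRow or tUniqRow. The class name ExternalSortUnique and the maxLinesInMemory parameter are hypothetical, and it assumes the file has no header row, no quoted fields containing line breaks, and that whole lines (not a key column) are compared.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: chunked on-disk sort followed by a merge that drops duplicate lines.
class ExternalSortUnique {

    /** One open chunk file plus the line currently at its head. */
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }

        boolean advance() throws IOException {
            line = reader.readLine();
            return line != null;
        }

        @Override
        public int compareTo(Cursor other) {
            return line.compareTo(other.line);
        }
    }

    static void sortUnique(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> chunks = new ArrayList<>();
        PriorityQueue<Cursor> heads = new PriorityQueue<>();
        try {
            // Phase 1: read at most maxLinesInMemory lines at a time, sort them, spill to disk.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> buffer = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() >= maxLinesInMemory) {
                        chunks.add(writeSortedChunk(buffer));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) {
                    chunks.add(writeSortedChunk(buffer));
                }
            }
            // Phase 2: k-way merge; only one line per chunk is held in memory.
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                for (Path chunk : chunks) {
                    Cursor c = new Cursor(Files.newBufferedReader(chunk, StandardCharsets.UTF_8));
                    if (c.line != null) heads.add(c); else c.reader.close();
                }
                String previous = null;
                while (!heads.isEmpty()) {
                    Cursor c = heads.poll();
                    // Equal lines come out of the merge adjacent, so one comparison removes duplicates.
                    if (!c.line.equals(previous)) {
                        out.write(c.line);
                        out.newLine();
                        previous = c.line;
                    }
                    if (c.advance()) heads.add(c); else c.reader.close();
                }
            }
        } finally {
            for (Cursor c : heads) c.reader.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }
}
```

The point of the design is that a smaller chunk size trades more temporary files for less heap; if an on-disk sort still exhausts memory, the in-memory buffer is likely too large for the available heap.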
djugal
Contributor III

Increase the Java memory.
Anonymous
Not applicable

That didn't work on our server.