I'm facing the same issue: I have a 144 MB JSON file and the job just doesn't start processing it.
Has anyone solved this? I have the same issue.
The JSON file is > 200 MB and Talend runs out of memory while parsing it.
Hi,
I have this same problem.
I am trying to read a JSON file - 190MB and Talend gives me the following error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.StringBuffer.toString(Unknown Source)
at org.json.simple.parser.Yylex.yylex(Unknown Source)
at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.JSONValue.parse(Unknown Source)
at local_project.parsejson_0_1.ParseJson.tFileInputJSON_1Process(ParseJson.java:932)
at local_project.parsejson_0_1.ParseJson.runJobInTOS(ParseJson.java:1577)
at local_project.parsejson_0_1.ParseJson.main(ParseJson.java:1426)
I don't think that configuring a custom JVM heap size will solve this problem; it looks like something else entirely.
My job is really simple - I am actually testing Talend's capabilities to see if it can be used for my needs.
I am using Windows 10 and JDK 8.
Is there a solution to this problem, or can we conclude that Talend is not capable of processing larger JSON files? (Whatever "larger" means: 200 MB is not really a large file, and our clients send us files of ~4 GB.)
Thanks!
I have also tried adding the flag -XX:-UseGCOverheadLimit to the custom JVM arguments. I expected this flag to bypass the GC overhead limit so that the job could use as much memory as it needs, but it was not successful either. I got the following error:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at java.util.LinkedList.listIterator(Unknown Source)
at java.util.AbstractList.listIterator(Unknown Source)
at java.util.AbstractSequentialList.iterator(Unknown Source)
at routines.system.RunStat.sendMessages(RunStat.java:281)
at routines.system.RunStat.run(RunStat.java:245)
at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at org.json.simple.parser.Yylex.yytext(Unknown Source)
at org.json.simple.parser.Yylex.yylex(Unknown Source)
at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.JSONValue.parse(Unknown Source)
at local_project.parsejson_0_1.ParseJson.tFileInputJSON_1Process(ParseJson.java:932)
at local_project.parsejson_0_1.ParseJson.runJobInTOS(ParseJson.java:1577)
at local_project.parsejson_0_1.ParseJson.main(ParseJson.java:1426)
In this scenario the job used 5 GB of RAM and 99% CPU to process a 190 MB file, and still failed in the end with Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space.
Hello,
Are you using a 64-bit operating system and Studio?
You could allocate more memory to the current job via the Studio.
Please have a look at the following KB article about Java heap space:
https://community.talend.com/s/article/OutOfMemory-Exception-WmtmQ
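Before raising the heap further, it can help to confirm that the new settings actually reach the JVM that runs the job. The snippet below is a minimal sketch, not Talend code: the class name HeapCheck and its two methods are hypothetical, and it only uses standard JDK calls. It prints the maximum heap the JVM will use and the startup options it received, so a value such as -Xmx4096M (an example value, not one taken from this thread) should show up in the output if it was applied.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Talend: reports the memory settings
// that the running JVM actually received.
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Startup options passed to this JVM, such as -Xmx... or -XX:-UseGCOverheadLimit.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the reported maximum is still the default, the arguments were most likely set somewhere the job does not read them from. Note that -XX:-UseGCOverheadLimit only disables the "GC overhead limit exceeded" check; the heap is still capped by -Xmx.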
Best regards
Sabrina
Hi xdshi,
Thank you for your answer.
Yes, I am using a 64-bit operating system and Studio, and yes, I have tried allocating more memory to the job. I've also tried some other things that I found here in the Talend community.
I've also tried using the tJavaFlex component (instead of tFileInputJSON or tExtractJSONFields) to parse a simpler JSON than I was using before, and again I faced the same error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. After some tweaking of the JVM heap size attributes and adding the -XX:-UseGCOverheadLimit flag, I did execute the job successfully, but it took a lot of resources and time. After all the testing I've come to the conclusion that Talend is not suitable for our needs, since it is not optimized for processing JSON files.
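The stack traces earlier in this thread show org.json.simple.JSONValue.parse, which builds the whole document in memory before any row is produced. A streaming approach keeps only one record in memory at a time. The sketch below illustrates the idea using only the JDK, assuming the file is one top-level JSON array of records (the layout of the real file is not given in this thread). The class JsonArraySplitter and its method are hypothetical names; it splits the array into raw element strings and does not validate the JSON. A real job would more likely use a streaming parser such as Jackson's JsonParser or Gson's JsonReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: hands each element of a top-level JSON array to a
// callback as raw text, buffering only one element at a time.
final class JsonArraySplitter {

    // Returns the number of elements passed to the handler.
    static long forEachElement(Reader source, Consumer<String> handler) {
        StringBuilder element = new StringBuilder();
        int depth = 0;            // 0 = before the array, 1 = directly inside it
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            int ch;
            while ((ch = in.read()) != -1) {
                char c = (char) ch;
                if (depth == 0) {
                    if (Character.isWhitespace(c)) continue;
                    if (c != '[') throw new IllegalArgumentException("expected a top-level JSON array");
                    depth = 1;
                    continue;
                }
                if (inString) {
                    // Brackets and commas inside strings are plain text.
                    element.append(c);
                    if (escaped) escaped = false;
                    else if (c == '\\') escaped = true;
                    else if (c == '"') inString = false;
                    continue;
                }
                if (c == '"') {
                    inString = true;
                } else if (c == '{' || c == '[') {
                    depth++;
                } else if (c == '}' || c == ']') {
                    depth--;
                    if (depth == 0) {
                        // Closing bracket of the top-level array: flush the last element.
                        return count + emit(element, handler);
                    }
                } else if (c == ',' && depth == 1) {
                    // A comma directly inside the top-level array separates elements.
                    count += emit(element, handler);
                    continue;
                }
                element.append(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalArgumentException("input ended before the top-level array was closed");
    }

    // Passes the buffered element to the handler (if non-empty) and clears the buffer.
    private static int emit(StringBuilder element, Consumer<String> handler) {
        String text = element.toString().trim();
        element.setLength(0);
        if (text.isEmpty()) return 0;
        handler.accept(text);
        return 1;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        String sample = "[{\"id\":1,\"tags\":[\"a,b\"]}, {\"id\":2,\"note\":\"x\\\"y\"}]";
        long n = forEachElement(new StringReader(sample), seen::add);
        System.out.println(n + " elements: " + seen);
    }
}
```

For a real file, the StringReader would be replaced by a file reader, and each element string could then be parsed on its own, so memory use depends on the size of one record rather than the size of the file. A single very large element would still be buffered whole.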
Best regards!