Anonymous
Not applicable

Memory error using tFileInputJSON

Hello,
When I extract a file of around 200 MB using tFileInputJSON, the job fails with an Out of memory exception, even though I have configured the JVM with
-Xms5230M -Xmx12240M
and the machine where I have scheduled this job has 20 GB of RAM.
But I am still facing this memory problem.
Can you please suggest how I can do this optimally with Talend?
Is there another component that can read and load the file line by line, rather than reading the whole thing into memory as a string?
Or, while reading, can I use temporary disk space rather than caching it all in memory?
What should the configuration be to run a >200 MB file through the tFileInputJSON component?
Thanks,
Jyotiranjan
10 Replies
Anonymous
Not applicable
Author

Hi,
Have you already checked the KB article TalendHelpCenter:ExceptionoutOfMemory? Did you configure the JVM in the Advanced settings of the Run view? What does your whole job design look like?
What are your OS and JDK versions?
Best regards
Sabrina
Anonymous
Not applicable
Author

Hi All,
I'm facing the same issue: I have a 144 MB JSON file and the job just doesn't start processing it.
I have already configured the job memory (-Xmx8192M, which should be far more than enough for this kind of processing), and my job currently consists of a tFileInputJSON followed by a tLogRow, which is as simple as it gets.
I tried with a small extract (25 KB) and it works fine, so the job itself isn't the problem here.
Do you have any guidance to follow for processing "big" JSON files?
I'm using Talend DI 6.0.0 and also tried Talend BD 6.1.1, with the same result.
Best,
Nicolas
Anonymous
Not applicable
Author

Using XPath isn't the right solution; to process large JSON files you have to use JsonPath.
It creates other issues, but JsonPath lets you read the file element by element instead of first parsing the whole file.
Anonymous
Not applicable
Author

Hi,
"I'm facing the same issue: I have a 144 MB JSON file and the job just doesn't start processing it."

What's the error message you are getting?
So far, there is no documentation to follow for processing "big" JSON files.
Have you tried setting up the input JSON file metadata to see if it works?
TalendHelpCenter:Centralizing JSON file metadata
Best regards
Sabrina
karandama2006
Creator

Has anyone solved this? I have the same issue.

The JSON file is > 200 MB, and Talend goes out of memory while parsing it.

Metikosh
Contributor

Hi,

I have this same problem.

I am trying to read a 190 MB JSON file, and Talend gives me the following error:

 

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.StringBuffer.toString(Unknown Source)
    at org.json.simple.parser.Yylex.yylex(Unknown Source)
    at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.JSONValue.parse(Unknown Source)
    at local_project.parsejson_0_1.ParseJson.tFileInputJSON_1Process(ParseJson.java:932)
    at local_project.parsejson_0_1.ParseJson.runJobInTOS(ParseJson.java:1577)
    at local_project.parsejson_0_1.ParseJson.main(ParseJson.java:1426)

I don't think configuring a custom JVM heap size will solve this problem; it looks like something else entirely.

 

My job is really simple; I am actually testing Talend's capabilities to see if it can be used for my needs:

[screenshot: job design]

I am using Windows 10 and JDK 8.

Is there a solution to this problem, or can we conclude that Talend is not capable of processing larger JSON files? (Whatever "larger" means; 200 MB is not a really large file, and we have files sent from clients of around 4 GB.)

Thanks!

Metikosh
Contributor

I have also tried adding the -XX:-UseGCOverheadLimit flag to the custom JVM arguments. This flag disables the GC overhead limit check, so the JVM keeps running (and keeps consuming memory) rather than failing early when garbage collection dominates CPU time. But again it was not successful; I got the following error:

 

Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
    at java.util.LinkedList.listIterator(Unknown Source)
    at java.util.AbstractList.listIterator(Unknown Source)
    at java.util.AbstractSequentialList.iterator(Unknown Source)
    at routines.system.RunStat.sendMessages(RunStat.java:281)
    at routines.system.RunStat.run(RunStat.java:245)
    at java.lang.Thread.run(Unknown Source)

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at org.json.simple.parser.Yylex.yytext(Unknown Source)
    at org.json.simple.parser.Yylex.yylex(Unknown Source)
    at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.JSONValue.parse(Unknown Source)
    at local_project.parsejson_0_1.ParseJson.tFileInputJSON_1Process(ParseJson.java:932)
    at local_project.parsejson_0_1.ParseJson.runJobInTOS(ParseJson.java:1577)
    at local_project.parsejson_0_1.ParseJson.main(ParseJson.java:1426)

 

In this scenario the job used 5 GB of RAM and 99% CPU to process a 190 MB file, and still failed in the end with: Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space.
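For anyone reproducing this, the JVM arguments are entered one per line in the Studio under the Run view > Advanced settings. A sketch of the combination being tested here (heap values are illustrative, not recommendations):

-Xms1024M
-Xmx8192M
-XX:-UseGCOverheadLimit

Note that the flag only removes the early "GC overhead limit exceeded" failure; if the parser still needs more live memory than -Xmx allows, the job ends with a plain Java heap space error, as seen above.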

Anonymous
Not applicable
Author

Hello,

Are you using a 64-bit operating system and Studio?

You could allocate more memory to current job via the Studio.

Please have a look at the following KB article about Java heap space:

https://community.talend.com/s/article/OutOfMemory-Exception-WmtmQ

Best regards

Sabrina

 

Metikosh
Contributor

Hi xdshi,

 

Thank you for your answer.

 

Yes, I am using a 64-bit operating system and Studio, and yes, I have tried allocating more memory to the job, along with some other things I found here in the Talend community.

 

I've also tried using a tJavaFlex component (instead of tFileInputJSON or tExtractJSONFields) to parse a simpler JSON than I was using before, and again I faced the same error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. But after some tweaking of the JVM heap size settings and adding the -XX:-UseGCOverheadLimit flag, I successfully executed the job, although it took a lot of resources and time; the tJavaFlex approach is sketched below. After all this testing I've come to the conclusion that Talend is not suitable for our needs, since it is not optimized for processing JSON files.
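For reference, a minimal sketch of what the tJavaFlex streaming parse could look like, assuming the Jackson jars are loaded (for example via tLibraryLoad), the file is a top-level JSON array of objects, and the output flow row1 has a single String column named id; the file path and field name are placeholders:

// Start code: open a streaming parser and begin the per-element loop
com.fasterxml.jackson.databind.ObjectMapper mapper =
    new com.fasterxml.jackson.databind.ObjectMapper();
com.fasterxml.jackson.core.JsonParser parser =
    mapper.getFactory().createParser(new java.io.File("C:/data/big.json"));
if (parser.nextToken() != com.fasterxml.jackson.core.JsonToken.START_ARRAY) {
    throw new RuntimeException("expected a top-level JSON array");
}
while (parser.nextToken() == com.fasterxml.jackson.core.JsonToken.START_OBJECT) {

// Main code: materialize only the current element and map it to the output row
    com.fasterxml.jackson.databind.JsonNode node = mapper.readTree(parser);
    row1.id = node.path("id").asText();

// End code: close the loop and release the parser
}
parser.close();

Because only one element is held in memory at a time, this pattern avoids the whole-file parse that org.json.simple performs in the generated tFileInputJSON code.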

 

Best regards!