Anonymous
Not applicable

tFileInputJSON gives "GC Overhead Limit" error

Hello Champ,

 

I have created a simple job:

tFileInputJSON --> (main) --> tLogRow

 

The source JSON file is around 2 GB in size. I even increased the JVM arguments in the Run tab to -Xms2G / -Xmx4G, but the job always fails with a memory error.

 

NOTE: the job works fine if the JSON is a simple file.

 

Is there a way to extract such a big file, or is it a product limitation? I have seen some articles about this for CSV files, but nothing for JSON.

Looking forward to hearing some valuable answers. Thanks.

 

regards,

K

16 Replies
mamohan
Contributor

I am having the same issue with JSON processing. My file size is 3 GB. Did you split the input at the source?

Or are you processing the 2 GB input file and splitting it inside the job?

Anonymous
Not applicable
Author

@Moe 

 

If you have a complex JSON, it is a good idea to produce smaller files at the source stage itself, since that avoids the overhead of splitting the big files into smaller ones afterwards.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

mamohan
Contributor

Thanks Nikhil,
The file is not complex and we are in the process of creating a POC. I would like to know whether the component copies the whole file into memory before starting the next step. I also have an XML file of similar size and did not face any issue with it. Why is this a problem only for JSON?
Anonymous
Not applicable
Author

Hi Mohan,

 

Option 1:

Check your PC's RAM size (I would suggest 16 GB as a minimum). Based on that, increase your JVM argument values:

-Xms (for ex, -Xms2048m)

-Xmx (for ex, -Xmx4096m)

To do this:

1. Go to the Run tab of your job.
2. Click "Advanced settings".
3. Enable "Use specific JVM arguments".
4. Change the values.
5. Run the job (a quick way to confirm the new settings took effect is sketched below).
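
For reference, a minimal sanity check like the one below (pasted into a tJava component, for example) prints the heap the job actually received, which helps confirm the new -Xmx value was picked up:

```java
// Quick check (e.g. inside a tJava component) that the "Use specific JVM
// arguments" values were actually applied to the job's JVM.
long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
System.out.println("Max heap available to this job: " + maxHeapMb + " MB");
```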

 

Option 2:

Split the file into smaller files and run the job on each piece (a rough sketch of how the split could be done is below).
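
Not a Talend component as such, but here is a sketch of how the split itself could work, assuming the source is one top-level JSON array and using the Jackson streaming API. The file names, chunk size and class name are made up for illustration:

```java
import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class SplitBigJsonArray {
    public static void main(String[] args) throws Exception {
        File input = new File("big_input.json");   // hypothetical input path
        int recordsPerFile = 100000;                // tune to your memory budget

        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();

        try (JsonParser parser = factory.createParser(input)) {
            // The whole file is expected to be one top-level JSON array.
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a top-level JSON array");
            }

            int fileIndex = 0;
            int recordsInChunk = 0;
            JsonGenerator out = null;

            // Advance record by record; only one record is in memory at a time.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                if (out == null) {
                    out = factory.createGenerator(
                            new File("chunk_" + fileIndex + ".json"), JsonEncoding.UTF8);
                    out.writeStartArray();
                }

                JsonNode record = mapper.readTree(parser);
                mapper.writeTree(out, record);

                if (++recordsInChunk == recordsPerFile) {
                    out.writeEndArray();
                    out.close();
                    out = null;
                    recordsInChunk = 0;
                    fileIndex++;
                }
            }

            if (out != null) {
                out.writeEndArray();
                out.close();
            }
        }
    }
}
```

Because the parser streams record by record, memory use stays roughly constant no matter how large the input file is.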

 

Let me know how it goes.

 

regards,

kiruba

mamohan
Contributor

Hi Kiruba, 

Option 1: increase Memory

I have tried increasing the memory and it still fails. The server has a total of 16 GB of memory and we have some other processes running on the same server, so I was able to allocate a maximum of 4 GB; anything above that failed to allocate. My input file is around 2.9 GB. Based on a discussion with Talend support, the job execution itself takes some memory. Apparently the JSON is read as a single 2.9 GB object and loaded into memory, and that is where it fails.

Option 2: Split the input file into multiple smaller files.

Our POC input is 2.9 GB in size. We can do the split, but we need to use Talend components for everything. Is there any way we can do it in Talend?

My thought is that we would need to read the file at least once to process it, and it would fail right there. Correct me if I am wrong.

Anonymous
Not applicable
Author

You could try using the component tJSONDocInputStream by Jan Lolling from Talend Exchange. It is designed specifically for reading very large files.

mamohan
Contributor

Thanks Fred for the suggestion. In the end we went ahead with splitting the file outside Talend and processing the pieces, and that worked fine for us. The problem with the JSON file generated in Talend was that it was stored as a single JSON object, and Talend tries to load that whole object into memory. So we created a file with each record as a separate JSON object and processed that to avoid the GC issue.
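
For anyone landing here later, that "one record per line" layout (often called JSON Lines or NDJSON) is also straightforward to consume outside of a dedicated component, since each line is parsed independently. A minimal sketch with Jackson, using hypothetical file and class names:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadJsonLines {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // "records.jsonl" is a hypothetical file with one JSON object per line.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("records.jsonl"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Each line is parsed on its own, so heap usage stays flat
                // regardless of how large the overall file is.
                JsonNode record = mapper.readTree(line);
                System.out.println(record); // replace with real processing
            }
        }
    }
}
```

With this layout the heap requirement depends on the largest single record rather than on the total file size, which is why the GC overhead error goes away.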