Hello Champ,
I have created a simple job:
tFileInputJSON --> (main) --> tLogRow
The source JSON file is around 2GB in size. I even increased the JVM settings in the Run tab to -Xms2G, -Xmx4G, but it always fails with a memory issue.
NOTE: the same job works great if the JSON file is a small, simple one.
Is there a way to extract such a large file, or is it a product limitation? I have seen some articles on this for CSV files, but nothing for JSON.
Looking forward to hearing some valuable answers. Thanks.
regards,
K
I am having the same issue with JSON processing; my file size is 3GB. Did you split the input at the source, or are you processing the 2GB input file and splitting it inside the job?
If you have a complex JSON, it would be a good idea to produce smaller files from the source stage itself, since that avoids the overhead of splitting the big file into smaller ones.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving kudos when they share their time on your query. If your query is answered, please mark the topic as resolved 🙂
Hi Mohan,
Option 1:
Check your PC's RAM size (I would suggest 16GB as a minimum). Based on that, increase your JVM argument sizes:
-Xms (for example, -Xms2048m)
-Xmx (for example, -Xmx4096m)
To do this:
1. Go to the Run tab of your job.
2. Click "Advanced settings".
3. Enable "Use specific JVM arguments".
4. Change the values.
5. Run the job.
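For reference, each row you add under "Use specific JVM arguments" is a plain JVM flag. A minimal example of what the rows could look like (the heap-dump flag is a standard, optional HotSpot option that helps confirm what is eating the memory):

```
-Xms2048m
-Xmx4096m
-XX:+HeapDumpOnOutOfMemoryError
```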
Option 2:
Split the file into smaller files and run the job on each of them.
Let me know how it goes.
regards,
kiruba
Hi Kiruba,
Option 1: increase Memory
I have tried increasing the memory and it still fails. The server has 16GB of memory in total, and we have other processes running on the same server, so I was able to allocate a maximum of 4GB; anything above that failed to allocate. My input file is around 2.9GB. Based on a discussion with Talend support, job execution itself takes some memory. Apparently the JSON is read as a single 2.9GB object and loaded into memory, and this is where it fails.
Option 2: split the input file into multiple smaller files.
Our POC used an input of 2.9GB. We can do the split, but we are required to use Talend components only. Is there any way we can do it within Talend?
My thought is that we would still need to read the file at least once in order to split it, and the job would fail right there. Correct me if I am wrong.
You could try using the component tJSONDocInputStream by Jan Lolling from Talend Exchange. It is designed specifically for reading very large files.
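If you prefer to stay with stock components, another option is to stream-split the file yourself from a tJava call or a custom routine, so the whole document is never loaded at once. Below is a minimal sketch using Jackson's streaming API (Talend Studio typically bundles Jackson; otherwise load the jars with tLibraryLoad). It assumes the input is a top-level JSON array of records; the class name, chunk size, and paths are all illustrative:

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.PrintWriter;

public class JsonArraySplitter {

    // Streams a top-level JSON array and writes the records into numbered
    // part files of chunkSize records each, one JSON object per line,
    // so the full multi-GB document is never held in memory.
    public static void split(String inputPath, String outputDir, int chunkSize) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (JsonParser parser = mapper.getFactory().createParser(new File(inputPath))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a top-level JSON array");
            }
            PrintWriter out = null;
            int record = 0, part = 0;
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                if (record % chunkSize == 0) {           // start a new part file
                    if (out != null) out.close();
                    out = new PrintWriter(new File(outputDir, "part_" + part++ + ".json"), "UTF-8");
                }
                JsonNode node = mapper.readTree(parser); // materializes ONE record only
                out.println(node.toString());
                record++;
            }
            if (out != null) out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        split("/data/big_input.json", "/data/parts", 50000); // illustrative paths
    }
}
```

Each part file then parses comfortably within the heap you already have.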
Thanks, Fred, for the suggestion. In the end we went ahead with splitting the file outside Talend and processing the pieces, and it worked fine for us. The problem with the JSON file generated in Talend was that it was stored as a single JSON object, and Talend tries to load that entire object into memory. So we created a file with each record as its own JSON object and processed that, which avoided the GC issue.
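To make the record-per-line layout concrete, here is a sketch with illustrative field names. Before, everything hangs off one object, so the parser must materialize the whole document:

```json
{"records": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}
```

After, each line is a self-contained JSON object, so records can be read, split, and processed one at a time:

```json
{"id": 1, "name": "a"}
{"id": 2, "name": "b"}
```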