[resolved] java.lang.OutOfMemoryError: Java heap space during flat file upload
Hi,
I am getting an out of memory error when I try to upload a flat file which contains 11 million records. Since I don't have enough room to increase my heap size, I would like suggestions on how to split the file and upload it in parts instead of holding everything in memory. I also cannot set "Store temp data" to true in my tMap to spill to disk, because the tMap has no mapping/join with another table.
Kindly help me asap.
Thanks
Vijay
Normally, reading a flat file should not need that much memory, because there is no need to keep the whole dataset in memory.
I guess there is something wrong in your job. Could you post a screenshot of your job, so we can try to spot potential memory leaks?
I agree with Jlolling... But if all else fails, you could consider using an OS command/utility such as split (on Linux) and cycling through the split files...
Hi, thanks for your response. Please find the screenshot below. Basically, we check a log table to see whether the job has already been processed for the given date. If it has not been processed, we use tMap to load the data from the flat file. Thanks in advance for your help.
Hi, as I already mentioned, our tMap has no join with another table, so we cannot set "Store temp data" to true. Since we cannot use that option, please suggest an alternative based on the screenshot shared by eshvar. Thanks, Vijay
our tMap has no join with another table, so we cannot set "Store temp data" to true
From the job screenshot shared by eshvar, we can see that there is a join on a tMap. It is a big job that contains many subjobs, so I suggest debugging it step by step to see which subjob or component consumes so much memory: keep only one subjob active, deactivate the others, run the job, and find out which subjob produces the error.
Best regards
Shong
Hi Shong,
We are referring to the second tMap in the screenshot, i.e. the one that reads the data from the txt (flat) file into tMap and stores it in the corresponding table (Account Links). The memory problem appears while reading the data from the txt (flat) file into tMap. We would like to reduce this memory usage, and are looking for an option to keep the data on disk rather than in memory, or for any other alternative.
Thanks, Vijay
Hello
It is not possible to make a file component store its data on disk; all the data it reads is held in memory. If you cannot increase the memory available to the job execution, you can try splitting the source file into several small files with a job. For example:
tLoop--iterate--tFileInputFullRow--main--tFileOutputDelimited
tLoop: runs a For loop.
From: 0
To: the total number of lines in the source file, or any number larger than the real line count.
Step: the number of lines you want in each small file, let's say 1000000.
tFileInputFullRow: reads the source file line by line.
Header: the line to start reading from; set it to ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")).
Limit: set it to the same value as the Step parameter of tLoop, let's say 1000000.
tFileOutputDelimited: generates N small files of 1000000 lines each; set the file name with a dynamic path, for example:
"D:/work/file/test1/"+((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION"))+"out.csv"
Best regards
Shong
Hi Shong,
Thanks for your valuable input. We followed your suggestion (tLoop--iterate--tFileInputFullRow--main--tFileOutputDelimited) and it worked well: I split the file into chunks of around 50k records each, which brought the memory usage down to under 50 MB per file. I then used tFileDelete to remove these temporary split files after each execution. Now I am able to process millions of records with much less memory.
Thanks and regards,
Vijay