Dear community,
I am reading Excel files with the tFileInputExcel, tMap and tJavaRow components. If the Excel files get too large, I get an Out of Memory error (Java heap space). I've read in several forums and web pages that in this case I can use a lookup table in my tMap component, where I can set the property "Store on disk" to true, along with the path where the input data will be saved on disk. To make this work, I added a second tFileInputExcel connected to the tMap via a Lookup link, and I use that flow to map all incoming rows to the output table. The original tFileInputExcel, connected via the Main link, no longer maps any columns to the output table. When I start the job, I still get the Java heap space error. Did I forget something? Or is there an example somewhere of how to design the job if I only need the lookup table for this purpose?
Kind regards
Sylvia
If your problem is reading the Excel file, this won't help you. "Store temp data on disk" exists to overcome the fact that Talend otherwise tries to hold the whole lookup file in memory. But apart from lookups (and a few other components such as tAggregateRow and tSortRow, which hold all rows in memory), rows are normally read and processed one at a time, so they don't accumulate on the heap.
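To illustrate the idea behind that setting (a minimal sketch only, not Talend's actual implementation; the class and file names here are made up): an in-memory lookup keeps every row on the heap in a Map, while a disk-backed lookup writes the rows to a temp file once and keeps only one row on the heap at a time while resolving a key.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "store on disk" idea: spill lookup rows to a
// temp file instead of holding them all in a heap-resident Map.
public class StoreOnDiskSketch {
    public static void main(String[] args) throws IOException {
        // 1. In-memory approach: the whole lookup lives on the heap.
        Map<String, String> inMemoryLookup = new HashMap<>();
        inMemoryLookup.put("DE", "Germany");
        inMemoryLookup.put("FR", "France");

        // 2. Store-on-disk approach: write the lookup rows to a temp file once...
        Path temp = Files.createTempFile("lookup", ".bin");
        try (BufferedWriter w = Files.newBufferedWriter(temp)) {
            w.write("DE;Germany\n");
            w.write("FR;France\n");
        }

        // ...then resolve a key by scanning the file, holding only the
        // current line in memory at any moment.
        String match = null;
        try (BufferedReader r = Files.newBufferedReader(temp)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split(";");
                if (parts[0].equals("DE")) {
                    match = parts[1];
                    break;
                }
            }
        }
        System.out.println(match); // key resolved via the disk-backed lookup
        Files.deleteIfExists(temp);
    }
}
```

The trade-off is exactly what you'd expect: less heap, more (slow) disk I/O per lookup, which is why a real implementation buffers and indexes rather than scanning linearly.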
You need to increase the memory allocated to your job - see
https://community.talend.com/t5/Migration-Configuration-and/OutOfMemory-Exception/ta-p/21669?content....
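For reference, the heap is raised through JVM arguments; in Talend Open Studio this can be set on the Run view under Advanced settings ("Use specific JVM arguments"), or edited directly in an exported job's launch script. A typical pair of flags looks like this (the values are examples only, not a recommendation):

```
-Xms256M
-Xmx1024M
```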
Thank you for your answer, alevy! Increasing the memory is not an option in our case, as the maximum heap space we can allocate on our machine is 1024MB.
For this problem I opened a ticket with the Talend Support Team, and they instructed me to use the "Store on disk" feature with two tFileInputExcel components. But I don't know how to adjust my job design so that "Store on disk" is actually used; they said it's a job design issue and that I should post it here. They did confirm, though, that "Store on disk" is the correct approach.
I now need instructions on how to use two tFileInputExcel components, one connected to the tMap via the Main link and the other via the Lookup link, while reading a single Excel file (see the screenshots attached to my second post).
Kind regards
Sylvia
I'm dubious about their suggestion in general, but specifically: why read the Excel file twice? I can't see how that would improve your situation, since you're not joining the data. Try making your main flow just a tFixedFlowInput with a single row and one empty column.
I inserted the tFixedFlowInput, and the job does now appear to use the "Store on disk" feature, because a new .bin file appears in my configured temp folder on each job start. But the file size stays at 0 KB and I still get the Out of Memory error with my Excel file. Do I have to adjust something else? I tried lowering "Max buffer size (nb of rows)" in my tMap configuration, but that didn't change the result. Do you have any idea what is missing?
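For anyone else reading: my understanding of "Max buffer size (nb of rows)" is that lookup rows accumulate in a small in-memory buffer and are flushed to the temp .bin file whenever the buffer fills. A rough sketch of that behavior (this is an illustration only, not Talend's code; all names are invented):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "max buffer size (nb of rows)" setting:
// buffer rows in memory and spill to the temp file every time the
// buffer reaches the configured limit.
public class MaxBufferSketch {
    public static void main(String[] args) throws IOException {
        int maxBufferRows = 2; // stands in for the tMap setting
        Path temp = Files.createTempFile("tmap_lookup", ".bin");
        List<String> buffer = new ArrayList<>();
        String[] incoming = {"row1", "row2", "row3", "row4", "row5"};

        for (String row : incoming) {
            buffer.add(row);
            if (buffer.size() >= maxBufferRows) {
                // buffer full: append its rows to the temp file and clear it
                Files.write(temp, buffer, StandardOpenOption.APPEND);
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            // flush whatever remains after the last full buffer
            Files.write(temp, buffer, StandardOpenOption.APPEND);
        }

        // every incoming row should have reached the disk file
        System.out.println(Files.readAllLines(temp).size());
        Files.deleteIfExists(temp);
    }
}
```

If that model is right, a .bin file that stays at 0 KB would suggest the lookup rows never reach the spill step at all, i.e. the job runs out of memory earlier, while still reading the Excel file.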
I entered "1" as the value for "Max buffer size (nb of rows)", allocated 4096MB of max heap space to my job, and I still get the Out of Memory error, because the .bin files in my temp folder don't grow after their creation. And my Excel file is only 8.2MB in size! If I remove some sheets from the Excel file, bringing it down to about 6-6.5MB, I can read it without problems, although the reading is very slow. It's not one specific sheet causing the problem either, because I tried it with different ones... could this be a bug? I use TOS version 5.0.2...
I am getting the same issue with TOS 5.3.1, in a similar scenario: tMap with an 8MB delimited file. The temp files don't grow beyond their size at creation.