Hi,
We have a Talend job (loading a fact table in our DW) that uses more than 4 lookup tables. Among those lookup tables, only one has a large amount of data (10 million rows). Whenever the job is executed we get an "Out of Heap Memory" error.
RUN 1: As suggested on the Talend Help site, I tried increasing the JVM parameters. Even after increasing them, I am still unable to execute the job.
JVM Parameters:
-Xms256M
-Xmx1610M
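As a sanity check, a tiny standalone class (the class name here is mine, not part of Talend) can confirm the heap ceiling the JVM actually received, i.e. whether the -Xmx setting took effect:

    // Run with: java -Xms256M -Xmx1610M HeapCheck
    public class HeapCheck {
        public static void main(String[] args) {
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %d MB%n", maxBytes / (1024 * 1024));
        }
    }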
Source: SQL Server
Lookup/Target: Oracle.
On each of the lookup tables we have enabled the cursor option.
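For context: as I understand it, the cursor option on a Talend database input only sets the JDBC fetch size, i.e. how many rows the driver buffers per round trip; tMap still builds the whole lookup in memory unless temp data is stored on disk. A minimal standalone sketch of what the option amounts to (URL, credentials, and table/column names are invented for illustration):

    import java.sql.*;

    public class CursorSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
                 PreparedStatement ps = con.prepareStatement(
                    "SELECT key_col, value_col FROM lookup_table")) {
                ps.setFetchSize(10000); // rows fetched per round trip
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // each row is streamed from the driver, but the join
                        // map that consumes it can still exhaust the heap
                    }
                }
            }
        }
    }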
RUN 2: I also tried loading with the lookup data stored in a local directory, by enabling the "Store temp data" option in tMap.
The problem with this method is that we are unable to load all the data from source to target. For example: if the source has 10 million records, only half a million records are loaded into the target (the lookup fails for the unprocessed records).
The processing time is also much longer.
Please note: the machine has 4 GB of RAM.
Both attempts were unsuccessful. Is there any way in Talend to handle the lookup effectively?
If so, please let us know. Any inputs would be helpful.
The Java version we are using is 1.6.0_35.
Yes, I did try with 1610M, but got the same result.
I didn't try breaking up the job; I will give that a try.
Is there anything that can be done to increase the Java Xmx size?
Hi kzone,
I am trying that now: storing the lookup data on disk.
I do have a doubt about that:
since I have more than 4 lookup tables, should I store all the lookup data on disk, or only the one with the larger volume of lookup data?
If your job fails again after using the store-on-disk option, consider upgrading your Java version if that causes you no other issues. Also, disable the second tMap and execute only the first tMap and its lookup. Store all lookup data to disk and try again.
Hi,
After using the store-on-disk option in tMap, the job no longer fails, but it does not process all the source records into the target.
My source: 8 million records
Expected output: 8 million records
Actual output: 3 million records
Thanks
Arul
"I didn't try breaking up the job; I will give that a try."
That could be a better solution.
If possible, join your data in several tMaps and release memory between them (with a temp file, for example, as file I/O is the fastest way).
It could be:
1. Join with the big lookup, storing it on disk or reloading it at each row (test empirically which works better), and store the result in a flat file.
2. Read that result back and join the other lookups (see the sketch below).
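To make the idea concrete, here is a minimal sketch of the two steps outside Talend. The file names, the ';' delimiter, and the assumption that source.csv and big_lookup.csv are pre-sorted by their key column are all mine, for illustration only:

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    public class StagedJoin {
        public static void main(String[] args) throws IOException {
            // Step 1: merge-join the sorted main flow with the sorted big
            // lookup, streaming both, and stage the matches in a flat file.
            // Memory use stays flat regardless of the lookup's size.
            try (BufferedReader main = Files.newBufferedReader(Paths.get("source.csv"));
                 BufferedReader big = Files.newBufferedReader(Paths.get("big_lookup.csv"));
                 BufferedWriter staged = Files.newBufferedWriter(Paths.get("stage1.csv"))) {
                String m = main.readLine(), b = big.readLine();
                while (m != null && b != null) {
                    String mk = m.split(";", 2)[0], bk = b.split(";", 2)[0];
                    int c = mk.compareTo(bk);
                    if (c == 0) {
                        staged.write(m + ";" + b.split(";", 2)[1]); // matched row
                        staged.newLine();
                        m = main.readLine();
                    } else if (c < 0) {
                        m = main.readLine(); // no match: inner-join drop
                    } else {
                        b = big.readLine();
                    }
                }
            }

            // Step 2: read the staged file back and join the remaining small
            // lookups, which fit comfortably in memory.
            Map<String, String> small = new HashMap<>();
            for (String line : Files.readAllLines(Paths.get("small_lookup.csv"))) {
                String[] f = line.split(";", 2);
                small.put(f[0], f[1]);
            }
            try (BufferedReader staged = Files.newBufferedReader(Paths.get("stage1.csv"));
                 BufferedWriter out = Files.newBufferedWriter(Paths.get("target.csv"))) {
                for (String line = staged.readLine(); line != null; line = staged.readLine()) {
                    String key = line.split(";", 2)[0];
                    out.write(line + ";" + small.getOrDefault(key, ""));
                    out.newLine();
                }
            }
        }
    }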
PS: increasing the JVM heap size is not a lasting solution: if the data volume increases, the job will fail again (in production). It can turn into a non-solution, as you cannot increase the JVM heap indefinitely.
It could be a solution for a stable volume of data, though.
Hope it helps,
Regards
That's great!
How did you manage to do that with tMap, in addition to splitting the job? Have you seen a performance improvement with temp storage on disk as well as in memory in the tMap settings?
Thanks
Vaibhav