Hi,
I posted this question in the Open-Source forum yesterday, but since we are using Enterprise I think this might be a more appropriate place for my post.
I've got a job that looks something like this (sorry I can't just post a screencap, but it's for work and I'm not sure if it would get me in trouble):
Subjob 1:

  OracleInput1 --main--> tMap --main--> tAggregateRow --main--> tFileOutputDelimited(file_1)
                          ^
                          | lookup
                          |
                      DB2Input1

  |
  OnSubjobOk
  |
  v

Subjob 2:

  OracleInput2 --main--> tMap --main--> tUniqRow --main--> tSortRow --main--> tAggregateRow --main--> tFileOutputDelimited(file_2)
                          ^                |
                          | lookup         | duplicates
                          |                v
                      DB2Input2         tFileOutputDelimited(duplicate_file)

  |
  OnSubjobOk
  |
  v

Subjob 3:

  tFileInputDelimited(file_1) --main--> tMap --main--> tAggregateRow --main--> OracleOutput1
                                         ^
                                         | lookup
                                         |
                            tFileInputDelimited(file_2)
Some other details:
- Running this job takes close to 6 GB of memory, and I don't understand why.
- If I watch the memory usage on my machine during the first subjob (which processes the largest amount of data), Talend uses between 2 and 3 GB of memory. I see no issue with that.
- However, once the job dumps its data into the first flat file and moves on to the second subjob, the memory held by the first subjob is never released. The second subjob works with far less data than the first, but instead of dropping, memory usage steadily increases; by the end of subjob 2, Talend is using 4 to 5 GB. The same thing happens in subjob 3 (where I join the two delimited files): by the time subjob 3 begins loading my Oracle table, Talend is using over 6 GB of memory on my machine.
- I am not referencing this data anywhere else in the job, and the subjob has already completed, so my understanding is that the data should be eligible for garbage collection at that point. Subjob 1 works with the largest amount of data of the three, so I would not expect memory usage to climb much beyond the 2-3 GB used during subjob 1. Our senior dev suspects the tSortRow or tAggregateRow components cause Talend to hold on to references, preventing the garbage collector from freeing the memory used by the main flows of the first two subjobs (I've tried to sketch that theory right after this list).
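If I understand his theory correctly, the situation would look roughly like the sketch below. To be clear, this is a hypothetical illustration of how a retained reference defeats the garbage collector, not Talend's actual generated code; the class and field names are made up.

import java.util.ArrayList;
import java.util.List;

public class RetainedBufferSketch {

    // Hypothetical stand-in for a sort/aggregate buffer that the generated
    // job keeps in a field (i.e., reachable for the life of the job) rather
    // than in a local variable.
    private List<String[]> buffer = new ArrayList<>();

    void subjob1() {
        // Buffer the rows needed for sorting/aggregation.
        for (int i = 0; i < 1_000_000; i++) {
            buffer.add(new String[] { "key" + i, "value" + i });
        }
        // Subjob 1 ends here, but 'buffer' is still reachable through the
        // job instance, so none of these rows can be garbage collected.
    }

    void subjob2() {
        // Only once the reference is dropped does the old buffer become
        // eligible for collection.
        buffer = null;
    }
}

If the buffers were held in local variables instead, they would fall out of scope when each subjob's method returns and the collector could reclaim them, which is the behavior I was expecting to see.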
Can anyone shed some light on this and help me understand what's going on behind the scenes here?
I've even tried something along these lines in a tJava between the subjobs:

// Drop the component's entry from globalMap, then ask the JVM to collect.
globalMap.remove("tAggregateRow_1");
// Note: System.gc() is only a request; the JVM is free to ignore it.
System.gc();
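From what I've read, globalMap.remove() only deletes the map entry (the component's internal row buffers don't live in globalMap), and System.gc() is merely a hint. I'm also not sure the growth I'm seeing is live data at all: as I understand it, the JVM rarely gives committed heap back to the OS, so the process size can keep climbing even after the old rows become collectable. My next step is to log the live heap between subjobs, something like this (a sketch, assuming a tJava component placed right after subjob 1 and subjob 2):

// Request a GC, then log how much heap is actually in use. If this number
// falls back to subjob-2 levels, the old rows were collected and the 6 GB
// figure is committed (not live) heap.
Runtime rt = Runtime.getRuntime();
System.gc();
long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
System.out.println("live heap after GC: " + usedMb + " MB");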