Hello,
We have a problem with a job that failed with java.lang.OutOfMemoryError: GC overhead limit exceeded.
The job reads 13 million records from Oracle into a tHashOutput. I would like to know what causes this error.
I know there are Xmx and Xms parameters, but I don't really know how to configure them.
Is it legitimate to change those parameters?
Thanks
@Boof1977 wrote:
Hello
We have a problem ... I would like to know what causes this error.
The job reads 13 million records from Oracle into a tHashOutput.
If you want to use memory to speed up the job, you must have that memory available for Talend.
It all depends on the size of those 13M records (and don't forget the other parts of the job).
The default value is 1024m; try 4096m or bigger if you have that memory free on the Talend machine.
Or stop using tHashInput; the job will run slower but use less memory.
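For example, assuming you run the job from the Studio, you can set this in the Run tab under Advanced settings, "Use specific JVM arguments" (a sketch; adjust the values to your machine):

    -Xms1024m
    -Xmx4096m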
Hi,
Thanks for your answer. We have 65 GB of RAM on the Talend Linux machine. We execute the job from the Studio on the Linux server, and we tried to set
Xmx to 5G and Xms to 15G, but it failed.
a. How can I know how much memory the job will consume?
b. What should the relation between Xmx and Xms be, and what exactly is the purpose of those parameters?
Thanks
Xms is the minimum (initial) heap memory for the Java process.
Xmx is the maximum. Note that Xms must not be larger than Xmx; with Xms 15G and Xmx 5G the JVM will refuse to start, which is why your run failed.
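A valid example (assumed values; the key constraint is Xms <= Xmx):

    -Xms2048m
    -Xmx8192m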
You have not provided enough information about your job:
- What is the data structure?
- Which columns do you load, and which do you really need for the lookup?
- What other data flows are in your job?
Without that information, it is just guessing.
13M * 1 KB = 13 GB; 13M * 2 KB = 26 GB.
Another question: why do you try to look up such a huge table right in Talend? Maybe the best place for this is the database.
Think about an in-memory lookup as a point of failure:
even if you fix the problem now (just set Xmx to 48000M, assuming of course you have a 64-bit JDK installed)... what will you do when tomorrow there are 20M records?
Hi,
I read that "This component loads data to the cache memory to offer high-speed access, facilitating transactions involving a large amount of data".
What does "large amount" mean? Is there any number?
How do I know when to use a DB lookup or tHash? Which is preferred?
Is there any way to calculate the memory the job will consume?
Thanks
You will never find a 100% correct answer, only logic and educated guessing 😉
- What else runs on this server?
- What is the real free memory, not used by the file cache and other processes?
- What other tasks do you plan to run in the future at the same time as this one?
- etc., etc., etc.
It is not possible to answer without full information, even for a simple question such as: what is your table structure?
A very simple rule of thumb: sum all column lengths and multiply by 1.5 to be safe.
13M * INT (4 bytes) ≈ 52 MB
13M * LONG (8 bytes) ≈ 104 MB
13M * VARCHAR(512) ≈ 6.5 GB (in the worst case)
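As a sketch of that arithmetic in Java (the per-type sizes are rough assumptions; real JVM object overhead is ignored):

    public class SizeEstimate {
        public static void main(String[] args) {
            // Hypothetical back-of-envelope sizing, ignoring JVM object overhead.
            long rows = 13_000_000L;
            long intCol = rows * 4;       // ~52 MB
            long longCol = rows * 8;      // ~104 MB
            long varcharCol = rows * 512; // ~6.5 GB worst case for VARCHAR(512)
            long total = (long) ((intCol + longCol + varcharCol) * 1.5); // x1.5 safety factor
            System.out.printf("~%.1f GB total%n", total / 1e9);
        }
    }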
Which is preferred?
Again, there is no single answer.
What is the size of the main flow? What resources does the database server have? What must the final result be?
Normally:
Hash is for a small amount of lookup data that must be used in more than one tMap or lookup.
Even if you use a lookup from the database, Talend will still load all the data into memory; in that case it is better to load the data into the database and do the lookup in SQL (see the sketch below).
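For example, instead of loading the whole lookup table into a tHash and joining in a tMap, you could push the join into the query of the tOracleInput component (a hypothetical sketch; ORDERS and CUSTOMERS are made-up table and column names):

    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id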
Hi,
Regarding your first questions: in the future it will be hard to know, because resource consumption changes dynamically.
The only thing I can tell you for now is that it is a new server and nothing else is running on it.
Can you please explain again the calculation of
13M * INT (4 bytes) ≈ 52 MB
13M * LONG (8 bytes) ≈ 104 MB
13M * VARCHAR(512) ≈ 6.5 GB (in the worst case)
For example, if I have a table with 5 columns and 13 million records:
column 1 - Number
column 2 - VARCHAR(20)
column 3 - Long
column 4 - VARCHAR(20)
column 5 - BigDecimal
a. What should the result of the calculation be?
b. What should I put in Xmx and Xms, and what is the relation between them?
c. What is the difference in memory utilization between using a DB lookup and tHash?
Thanks
Number - 20 bytes
VARCHAR(20) - Java holds strings in memory at 2 bytes per character, and you have two such columns: 2 x 2 x 20 = 80 bytes (could be more)
BigDecimal - 4 bytes
Long - 8 bytes
Total: 20 + 80 + 4 + 8 = 112 bytes per row, which is about 1.5 GB for 13M rows.
But this is only for the hash, assuming Talend does not add overhead of its own, so better to assume approximately 2 GB; and you still have the other parts of the job.
If the server is empty, start from the biggest value and reduce it step by step while the job still runs.
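To double-check that arithmetic (a sketch using the rough per-type sizes above; real JVM object sizes will differ):

    public class RowEstimate {
        public static void main(String[] args) {
            // Assumed sizes: Number + 2x VARCHAR(20) + BigDecimal + Long.
            long bytesPerRow = 20 + 80 + 4 + 8; // 112 bytes
            long rows = 13_000_000L;
            double gb = bytesPerRow * rows / 1e9; // ~1.46 GB, round up to ~2 GB
            System.out.printf("~%.2f GB for the hash alone%n", gb);
        }
    }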
@Boof1977 wrote:
Hi
c. What is the difference in memory utilization between using a DB lookup and tHash?
A DB lookup in Talend will reload the data every time you use that table for a lookup. From this point of view the hash looks more effective: load once, use in 10 tMaps afterwards. But what I mean is doing the operation in the database: if the source and target are on the same server, filter first with a query, then load into Talend. If they are on different servers, load everything to the target or a staging server and then filter with SQL queries there. Most databases are designed to work with data many times bigger than memory.
Hi,
Does the length of the field in the "Edit schema" dialog represent the number of bytes?
What is the relation between Xmx and Xms?