Hello,
We have a problem with a job that failed with java.lang.OutOfMemoryError: GC overhead limit exceeded.
The job reads 13 million records from Oracle into a tHashOutput. I would like to know what causes this error.
I know there are Xmx and Xms parameters, but I don't really know how to configure them.
Is it legitimate to change those parameters?
Thanks
@Boof1977 wrote:
Hello
We have a problem ... I would like to know what causes this error.
The job reads 13 million records from Oracle into a tHashOutput.
If you want to use memory to speed up the job, you must have that memory available for Talend.
It all depends on the size of those 13M records (and don't forget the other parts of the job).
The default value is 1024m; try 4096m or bigger if you have that memory free on the Talend machine.
Or stop using tHashInput; the job will run slower but use less memory.
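For example, assuming you run the job from the Studio, you can set this in the Run tab under Advanced settings, "Use specific JVM arguments" (a sketch; adjust the values to your machine):

    -Xms1024m
    -Xmx4096m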
Hi,
Thanks for your answer. We have 65 GB of RAM on the Talend Linux machine. We execute the job from the Studio on the Linux server, and we tried to set
Xmx to 5G and Xms to 15G, but it failed.
a. How can I know how much memory the job will consume?
b. What should the relation between Xmx and Xms be, and what exactly is the purpose of those parameters?
Thanks
Xms is the minimum (initial) heap memory for the Java process.
Xmx is the maximum. Note that Xms must not be larger than Xmx; with Xms 15G and Xmx 5G the JVM will refuse to start, which is why your run failed.
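A valid example (assumed values; the key constraint is Xms <= Xmx):

    -Xms2048m
    -Xmx8192m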
You have not provided enough information about your job:
- What is the data structure?
- Which columns do you load, and which do you really need for the lookup?
- What other data flows are in your job?
Without that information, it is just guessing.
13M * 1 KB = 13 GB; 13M * 2 KB = 26 GB.
Another question: why do you try to look up such a huge table right in Talend? Maybe the best place for this is the database.
Think about an in-memory lookup as a point of failure:
even if you fix the problem now (just set Xmx to 48000M, assuming of course you have a 64-bit JDK installed)... what will you do when tomorrow there are 20M records?
Hi,
I read that "This component loads data to the cache memory to offer high-speed access, facilitating transactions involving a large amount of data".
What does "large amount" mean? Is there any number?
How do I know when to use a DB lookup or tHash? Which is preferred?
Is there any way to calculate the memory the job will consume?
Thanks
You will never find a 100% correct answer, only logic and educated guessing 😉
- What else runs on this server?
- What is the real free memory, not used by the file cache and other processes?
- What other tasks do you plan to run in the future at the same time as this one?
- etc., etc., etc.
It is not possible to answer without full information, even for a simple question such as: what is your table structure?
A very simple rule of thumb: sum all column lengths and multiply by 1.5 to be safe.
13M * INT (4 bytes) ≈ 52 MB
13M * LONG (8 bytes) ≈ 104 MB
13M * VARCHAR(512) ≈ 6.5 GB (in the worst case)
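As a sketch of that arithmetic in Java (the per-type sizes are rough assumptions; real JVM object overhead is ignored):

    public class SizeEstimate {
        public static void main(String[] args) {
            // Hypothetical back-of-envelope sizing, ignoring JVM object overhead.
            long rows = 13_000_000L;
            long intCol = rows * 4;       // ~52 MB
            long longCol = rows * 8;      // ~104 MB
            long varcharCol = rows * 512; // ~6.5 GB worst case for VARCHAR(512)
            long total = (long) ((intCol + longCol + varcharCol) * 1.5); // x1.5 safety factor
            System.out.printf("~%.1f GB total%n", total / 1e9);
        }
    }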
Which is preferred?
Again, there is no single answer.
What is the size of the main flow? What resources does the database server have? What must the final result be?
Normally:
Hash is for a small amount of lookup data that must be used in more than one tMap or lookup.
Even if you use a lookup from the database, Talend will still load all the data into memory; in that case it is better to load the data into the database and do the lookup in SQL (see the sketch below).
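For example, instead of loading the whole lookup table into a tHash and joining in a tMap, you could push the join into the query of the tOracleInput component (a hypothetical sketch; ORDERS and CUSTOMERS are made-up table and column names):

    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id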
Hi,
Regarding your first questions: in the future it will be hard to know, because resource consumption changes dynamically.
The only thing I can tell you for now is that it is a new server and nothing else is running on it.
Can you please explain again the calculation of
13M * INT (4 bytes) ≈ 52 MB
13M * LONG (8 bytes) ≈ 104 MB
13M * VARCHAR(512) ≈ 6.5 GB (in the worst case)
For example, if I have a table with 5 columns and 13 million records:
column 1 - Number
column 2 - VARCHAR(20)
column 3 - Long
column 4 - VARCHAR(20)
column 5 - BigDecimal
a. What should the result of the calculation be?
b. What should I put in Xmx and Xms, and what is the relation between them?
c. What is the difference in memory utilization between using a DB lookup and tHash?
Thanks
Number - 20 bytes
VARCHAR(20) - Java holds strings in memory at 2 bytes per character, and you have two such columns: 2 x 2 x 20 = 80 bytes (could be more)
BigDecimal - 4 bytes
Long - 8 bytes
Total: 20 + 80 + 4 + 8 = 112 bytes per row, which is about 1.5 GB for 13M rows.
But this is only for the hash, assuming Talend does not add overhead of its own, so better to assume approximately 2 GB; and you still have the other parts of the job.
If the server is empty, start from the biggest value and reduce it step by step while the job still runs.
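To double-check that arithmetic (a sketch using the rough per-type sizes above; real JVM object sizes will differ):

    public class RowEstimate {
        public static void main(String[] args) {
            // Assumed sizes: Number + 2x VARCHAR(20) + BigDecimal + Long.
            long bytesPerRow = 20 + 80 + 4 + 8; // 112 bytes
            long rows = 13_000_000L;
            double gb = bytesPerRow * rows / 1e9; // ~1.46 GB, round up to ~2 GB
            System.out.printf("~%.2f GB for the hash alone%n", gb);
        }
    }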
@Boof1977 wrote:
Hi
c. What is the difference in memory utilization between using a DB lookup and tHash?
A DB lookup in Talend will reload the data every time you use that table for a lookup. From this point of view the hash looks more effective: load once, use in 10 tMaps afterwards. But what I mean is doing the operation in the database: if the source and target are on the same server, filter first with a query, then load into Talend. If they are on different servers, load everything to the target or a staging server and then filter with SQL queries there. Most databases are designed to work with data many times bigger than memory.
Hi,
Does the length of the field in the "Edit schema" dialog represent the number of bytes?
What is the relation between Xmx and Xms?