RA6
Creator

java.lang.OutOfMemoryError: GC overhead limit exceeded

Hello guys,

 

I have a CSV file that uses only 4 columns, but the volume is quite big: 21 million lines.

The first two columns are the identifier.

 

ID1 ID2 INFO1 INFO2
AA 12 Pop 550
AA 12 Tim 600
AA 12 Luck 720
AA 12 Tom 950
AA 12 Nina 450
BB 23 Duke 932
BB 23 Rod 72
BB 23 Yub 560
BB 23 Anna 432
BB 23 Paul 453

 

All I want to do is group all the Info1 and Info2 values by their corresponding identifier, and it works.
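
For clarity, assuming the aggregation is a list operation that produces one row per identifier pair, the expected output from the sample above would look roughly like this:

ID1 ID2 INFO1 INFO2
AA 12 Pop,Tim,Luck,Tom,Nina 550,600,720,950,450
BB 23 Duke,Rod,Yub,Anna,Paul 932,72,560,432,453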

But the real problem is the amount of data.

 

[Screenshots attached: 0683p000009M6Vz.png, 0683p000009M6LV.png, 0683p000009M6Yn.png]

 

Local machine: 8 GB RAM

I have increased the JVM -Xmx to 4096M and -Xms to 2048M.
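
For reference, I am setting these as JVM arguments in the job's Run tab (Advanced settings), roughly like this:

-Xms2048M
-Xmx4096M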

 

Do you have any idea how this can be done or optimised, please?

Thank you.

 

Best regards,

asadasing

1 Solution

Accepted Solutions
Anonymous
Not applicable

Change the "Buffer size of external sort" to something like the number of rows you are working with. It is set to 1000000 by default. Maybe change this to 25000000.

View solution in original post

3 Replies
Anonymous
Not applicable

I'll be honest and say it doesn't look like you are doing anything immensely hard there. I believe the problem is entirely down to the last flow that runs. Have you tried running that on its own? I suspect it will fail, but can you test it? The next thing to try is to add a tSortRow after the tMap and then a tAggregateSortedRow after that to carry out the list operation. This *might* help. It will break the act of sorting and aggregating down into two tasks instead of one. You can use the "Sort on disk" option of the tSortRow (Advanced settings). This will remove the sorting from memory and should allow you to get through this.
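
To illustrate the principle (this is only a sketch, not the code Talend generates): once the rows have been sorted on ID1 and ID2, the aggregation only ever needs to hold one group in memory at a time. In plain Java, and assuming a semicolon-delimited file that has already been sorted on the two key columns, the list operation boils down to something like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SortedGroupBy {
    public static void main(String[] args) throws IOException {
        // "sorted_input.csv" is a placeholder name for the output of the sort step.
        try (BufferedReader in = new BufferedReader(new FileReader("sorted_input.csv"))) {
            String currentKey = null;
            StringBuilder info1 = new StringBuilder();
            StringBuilder info2 = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(";");      // assumed layout: ID1;ID2;INFO1;INFO2
                String key = f[0] + ";" + f[1];
                if (currentKey != null && !key.equals(currentKey)) {
                    // Key changed: the previous group is complete, so write it out
                    // and reuse the buffers for the next group.
                    System.out.println(currentKey + ";" + info1 + ";" + info2);
                    info1.setLength(0);
                    info2.setLength(0);
                }
                currentKey = key;
                if (info1.length() > 0) {
                    info1.append(",");
                    info2.append(",");
                }
                info1.append(f[2]);
                info2.append(f[3]);
            }
            if (currentKey != null) {
                // Write the last group.
                System.out.println(currentKey + ";" + info1 + ";" + info2);
            }
        }
    }
}

This is why splitting the work into a tSortRow (with "Sort on disk") followed by a tAggregateSortedRow should scale better than doing the sort and the aggregation in a single in-memory step.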

RA6
Creator
Author

Hello @rhall

Thank you for your response.

As suggested, I have tried running the subjob on its own, but it failed at around 7 million rows (same as before).

 

I have also tried the option of adding the tSortRow (sort on disk) and tAggregateSortedRow, but it stopped at approximately 5 million rows with an OutOfMemoryError.

 

[Screenshot attached: 0683p000009M7P0.png]

 

Is there any other way to solve this?

 

Thank you.

 

asadasing

Anonymous
Not applicable

Change the "Buffer size of external sort" to something like the number of rows you are working with. It is set to 1000000 by default. Maybe change this to 25000000.