Anonymous
Not applicable

OutOfMemory with big lookup table

I work with a 6 million row table (Oracle) where I need to update only part of one column (parts of the string are replaced).
Using the table as a lookup input to a tMap, the job quits after roughly 1 million entries have been read (the first step that is executed).
Does anybody have a workaround for that?
cheers, Benjamin
(working on a 2 GB Windows XP machine)
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at java.nio.CharBuffer.wrap(Unknown Source)
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.BufferedWriter.flushBuffer(Unknown Source)
at java.io.BufferedWriter.flush(Unknown Source)
at java.io.PrintWriter.newLine(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at routines.system.RunStat.sendMessages(RunStat.java:131)
at routines.system.RunStat.run(RunStat.java:104)
at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at ......
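The error itself only says the JVM heap is exhausted. A quick way to check how much heap the generated job actually gets is a tiny check like the following (just a generic sketch, nothing Talend-specific):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Prints the maximum heap the JVM was started with (-Xmx),
        // which is what the whole lookup has to fit into.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}
```

On a 2 GB machine the default heap is typically far smaller than the physical memory, and raising it with -Xmx when launching the job only postpones the limit for a 6-million-row lookup; the replies below focus on shrinking or spilling the lookup instead.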
20 Replies
Anonymous
Not applicable
Author

Versions prior to the latest (2.4) read the lookups into memory. So what I tried to do was limit the number of columns retrieved from the lookup tables to just what I needed: I used a select statement in the tXXXInput to select only the columns I was going to use for the lookup. That usually worked, but with large tables it would still crash.
Version 2.4 has an option on the tMap to use a disk file to store the lookups. So I would say that a combination of both methods should solve your problem efficiently.
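To illustrate why narrowing the columns helps but eventually stops being enough, here is a minimal, self-contained sketch of an in-memory lookup (not the code Talend generates; the connection, table and column names are invented):

```java
import java.sql.*;
import java.util.HashMap;
import java.util.Map;

public class InMemoryLookupSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Oracle connection for illustration only; the Oracle JDBC
        // driver must be on the classpath (older drivers may also need
        // Class.forName("oracle.jdbc.OracleDriver") first).
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");

        // Selecting only the join key and the one column used in the tMap keeps
        // each in-memory entry as small as possible ("SELECT *" would drag the
        // full row width into the heap for all ~6 million rows).
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
                "SELECT key_col, text_col FROM big_lookup_table");

        // The whole lookup still ends up in one map, one entry per row;
        // this is the part that overflows the heap once the table is big enough.
        Map<String, String> lookup = new HashMap<String, String>();
        while (rs.next()) {
            lookup.put(rs.getString("key_col"), rs.getString("text_col"));
        }
        rs.close();
        stmt.close();
        conn.close();

        System.out.println("Loaded " + lookup.size() + " lookup rows into memory");
    }
}
```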
Anonymous
Not applicable
Author

Hi SMaz
Thank you for your suggestions. I am happy to hear that 2.4 has a "file" feature in tMap. My company has actually started to work with JasperETL 2.3.x, so I hope to be able to use 2.4 soon.
While browsing I found a lot of members having trouble with big lookup tables, although the ZIP-code lookup examples always work well...
@Talend-Team
My suggestion: why not issue *one* select statement through the tXXXInput component for *each* main data row that requires a join, and let Oracle or other databases do the caching or whatever is appropriate? Why build hash tables and files all over again?
Would that be a difficult code change to a tJoin or tXXXInput (lookup) component?
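Just to make the idea concrete, here is roughly the behaviour I mean, written as plain JDBC (table, column and connection names are made up; this is only a sketch, not Talend-generated code):

```java
import java.sql.*;
import java.util.Arrays;
import java.util.List;

public class PerRowLookupSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection; the Oracle JDBC driver must be on the classpath.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");

        // One parameterized statement, re-executed for every main-flow row,
        // so nothing is cached on the Java side; the database does the work.
        PreparedStatement ps = conn.prepareStatement(
                "SELECT text_col FROM lookup_tbl WHERE key_col = ?");

        List<String> mainKeys = Arrays.asList("A1", "B2", "C3"); // stands in for the main flow
        for (String key : mainKeys) {
            ps.setString(1, key);
            ResultSet rs = ps.executeQuery();
            String replacement = rs.next() ? rs.getString(1) : null;
            rs.close();
            System.out.println(key + " -> " + replacement);
        }
        ps.close();
        conn.close();
    }
}
```

Memory stays flat regardless of how big the lookup table is; the trade-off is one round trip per row unless the database or driver caches the statement.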
Any discussion or comments are welcome!
cheers, Benjamin
Anonymous
Not applicable
Author

@ Talend Team
Could you please explain what is meant by "New high performance lookup mode supporting multi-gigabyte tables and files", as stated in the recent newsletter praising 2.4?
Is it the "swap to file" option in tOracleInput, or something else that is really high performing?
cheers
Benjamin
amaumont
Contributor III

My suggestion: why not issue *one* select statement through the tXXXInput component for *each* main data row that requires a join, and let Oracle or other databases do the caching or whatever is appropriate? Why build hash tables and files all over again?
Would that be a difficult code change to a tJoin or tXXXInput (lookup) component?

We agree with you; that could be excellent behaviour in a database context.
We will work on this subject for the 2.5 release.
However, when big files are used and no database is available, the user cannot rely on a database cache to process the data, so the tMap's "Store on disk" option will remain useful in that case.
Could you please explain what is meant by "New high performance lookup mode supporting multi-gigabyte tables and files", as stated in the recent newsletter praising 2.4?
Is it the "swap to file" option in tOracleInput, or something else that is really high performing?

This refers to the tMap's "Store on disk" option. "High performance lookup" should be taken relatively: it will usually be slower than the in-memory handling, although some benchmarks with specific parameters can reach the same performance with and without "Store on disk" enabled.
I will soon post a screenshot of a benchmark run with default parameters and a large number of rows.
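To give a rough feel for the trade-off, here is a generic sketch of a disk-backed lookup (not tMap's actual implementation): the lookup rows are spilled to temporary chunk files, and the join then loads one bounded chunk back into memory at a time, so memory stays flat while the extra disk I/O makes it slower than a pure in-memory join.

```java
import java.io.*;
import java.util.*;

public class DiskLookupSketch {
    public static void main(String[] args) throws Exception {
        int chunkSize = 2;                       // tiny on purpose; think 500'000 in practice
        List<File> chunks = new ArrayList<File>();

        // 1) Spill the lookup to disk in bounded chunks instead of one big HashMap.
        String[][] lookupRows = { {"A1", "foo"}, {"B2", "bar"}, {"C3", "baz"} };
        BufferedWriter out = null;
        int inChunk = 0;
        for (String[] row : lookupRows) {
            if (out == null) {
                File f = File.createTempFile("lookup_chunk_", ".tmp");
                f.deleteOnExit();                // avoid leaving temp files behind
                chunks.add(f);
                out = new BufferedWriter(new FileWriter(f));
            }
            out.write(row[0] + "\t" + row[1]);
            out.newLine();
            if (++inChunk == chunkSize) { out.close(); out = null; inChunk = 0; }
        }
        if (out != null) out.close();

        // 2) Join: load one chunk at a time and probe it with the main-flow keys.
        String[] mainKeys = { "B2", "C3" };
        for (File chunk : chunks) {
            Map<String, String> map = new HashMap<String, String>();
            BufferedReader in = new BufferedReader(new FileReader(chunk));
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);
                map.put(kv[0], kv[1]);
            }
            in.close();
            for (String key : mainKeys) {
                if (map.containsKey(key)) {
                    System.out.println(key + " -> " + map.get(key));
                }
            }
        }
    }
}
```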
Anonymous
Not applicable
Author

Hi amaumont
Thank you for the reply to my questions! I think the Talend team is on the right track!
Actually, our only trouble is getting hold of your sales staff.
Cheers
Benjamin
Anonymous
Not applicable
Author

Hi
I installed release 2.4.0 and tried the new "Store on disk" option.
(btw: this option is in the mapping editor, as a table option of the lookup table)
I got a compilation error when using the lookup table row in a Var expression: "rowX could not be found".
It runs, but only a bit longer than "memory only" if I just pass the row from the lookup to the output.
The row chunk size I use is 1'000'000, which is close to the memory-overflow row count. Normally the overflow occurs at about 1.4 million retrieved lookup rows; with the file save option it stops at 2.4 million rows.
btw: the temp files are not cleaned up, and they quickly add up to gigabytes.
Anonymous
Not applicable
Author

Continued test:
With the row chunk size reduced to 500'000 it worked, but later, in the middle of processing rows, the memory problems came back. I guess there might be some bigger leaks, or just inefficiency?
amaumont
Contributor III

The problem comes from a bug in the join algorithm: it reads a (very big) bad value as the length of an array and tries to allocate it.
This problem does not depend on the amount of data, but on a specific sequence of data to join; even a small amount of data can trigger it.
It remains for us to find out which one it is...
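To illustrate the kind of failure this is (a generic sketch, not the actual tMap code): if a corrupted length prefix is read back from the stored data, the very next allocation blows up even though the real record is tiny.

```java
import java.io.*;

public class CorruptLengthSketch {
    public static void main(String[] args) throws Exception {
        // Write one record whose length prefix is deliberately corrupted.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(Integer.MAX_VALUE);             // bogus length instead of the real 3
        out.write("abc".getBytes("UTF-8"));
        out.close();

        // Read it back the way a spill-file reader would.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        int length = in.readInt();                   // picks up the corrupted value
        byte[] record = new byte[length];            // huge allocation -> java.lang.OutOfMemoryError
        in.readFully(record);
    }
}
```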
Anonymous
Not applicable
Author

Hi amaumont
Would it be helpful if I sent you the test data? It is a bunch of CSV files as input, plus a big Oracle table which I can also send as a CSV file. Do you have an FTP server for that purpose?
Cheers, Benjamin