Not applicable

Join large tables

Hi all,

I'm trying to join two large tables from QVD files (each one has 40 million records) because I need to filter on two fields that sit in the two different tables. Every time I try, QlikView starts loading the data, works for a long time, and in the end simply reports that it failed to load the data. I'm running the software on a server with 4 cores and 10 GB of RAM, and during the load I can see that all the memory is used. When I run the same script in debug mode with a limited load of, for example, 10K records, everything works fine, so there are no errors in the script itself. How can I solve this issue? Thanks a lot for your help.

13 Replies
Not applicable
Author

Can you add more memory?

Not applicable
Author

Yes, I could, but I was hoping to find another solution first, maybe by changing some QlikView settings or using some "hidden" optimizations in the script, before buying new hardware. 10 GB of RAM is a lot, and I thought it would be enough for my data. What is your experience with QlikView's RAM consumption? Is there another solution I could use? Thanks again.

boorgura
Specialist

Is it because synthetic keys are being formed?

Try using a link table in that case.
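
Something like this, as a rough sketch - the names Key1, Key2, B_field, C_field and the QVD files are made up, assuming the synthetic key comes from two shared fields:

// Build one composite key per table and carry the real key fields
// in a separate link table, so the two big tables are never joined.
Table1:
LOAD Key1 & '|' & Key2 as %LinkKey, B_field FROM X_table.qvd (qvd);

Table2:
LOAD Key1 & '|' & Key2 as %LinkKey, C_field FROM Y_table.qvd (qvd);

LinkTable:
LOAD DISTINCT Key1 & '|' & Key2 as %LinkKey, Key1, Key2
FROM X_table.qvd (qvd);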

johnw
Champion III


b wrote: Yes, I could, but I was hoping to find another solution first, maybe by changing some QlikView settings or using some "hidden" optimizations in the script, before buying new hardware. 10 GB of RAM is a lot, and I thought it would be enough for my data. What is your experience with QlikView's RAM consumption? Is there another solution I could use? Thanks again.


First, how wide are the tables themselves? How compressible is the data? If each record held just 125 bytes of compressed information, that alone would require 10 GB of RAM (40 million rows × 125 bytes × 2 tables). I suspect your tables are smaller than that when compressed, but I don't know your data, so I can't say for sure. Load each one separately and check how much RAM each consumes on its own. Add the two together, and if the total is over 10 GB, that's your problem. If that IS your problem, and you need all that data, then adding memory is the only solution.
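
For example, a throwaway script like the following (table and field names are placeholders) loads one QVD in isolation so you can read QlikView's memory footprint off Task Manager, then repeat the test with the other file:

// Load a single table alone and note the process memory;
// do the same for the second QVD and add the two together.
X:
LOAD A_field, B_field FROM X_table.qvd (qvd);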

But if, for instance, the first table takes 2 GB and the second takes 1 GB, then that isn't your problem. In that case, I suspect something is going wrong with your join. For these two tables to take only 3 GB when joined, you would need to be doing a one-to-one join on a unique key shared by both tables. If you're joining, say, ten different rows from table two to each row in table one, then you're going to run out of RAM. If you didn't intend to do a join like that, then changing it back to a 1:1 join should solve the problem. If you DO need a join like that, then adding memory is again the only solution.
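
One way to check, as a sketch (assuming the shared key is called A_field): compare the total row count with the distinct key count in each QVD. If they differ, the key repeats and the join will fan out:

// If Rows is larger than DistinctKeys, the join cannot be 1:1
// and the joined result will be bigger than either input.
KeyCheck:
LOAD Count(A_field)          as Rows,
     Count(DISTINCT A_field) as DistinctKeys
FROM Y_table.qvd (qvd);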

Mind you, if it IS 1:1 with both tables keyed by the same unique ID, why are they separate tables in the first place? (Not saying it's necessarily wrong - one of our source systems often has multiple actual tables all keyed by the same ID for various performance and memory reasons. I'll typically join them when building the QVDs rather than when loading them, but that doesn't solve your problem, it just moves it to a different script.)
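
Roughly like this in the QVD-builder script (names are placeholders, not your actual files):

// Pay the cost of the join once, up front, then store the combined
// table so the final application only does a single optimized load.
Combined:
LOAD A_field, B_field FROM X_table.qvd (qvd);

join (Combined)
LOAD A_field, C_field FROM Y_table.qvd (qvd);

STORE Combined INTO XY_combined.qvd (qvd);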

Not applicable
Author

There are no synthetic keys.

Not applicable
Author

Thanks for your answer!

If I load the two tables without joining them, the RAM I have turns out to be enough. Here is the outline of my script:


qualify *;          // prefix every field with its table name...
unqualify A_field;  // ...except A_field, which stays the shared join key
LOAD A_field, B_field FROM X_table.qvd (qvd);
join LOAD A_field, C_field FROM Y_table.qvd (qvd);  // joins to the previous table on the common field A_field


So I expect the join to be made on A_field. Is that right, or am I making a mistake somewhere? The tables come from an existing database which can't be changed. Thanks again for all your help.

Not applicable
Author

Any other ideas are welcome! Thank you!

boorgura
Specialist

Do you have a lot of numeric fields?

I got a tip from one of the QlikTech consultants: purge the trailing zeros after the decimal point, which reduces at least some of the memory consumption.
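
Something along these lines, as a sketch (field names are placeholders): rounding to a fixed step cuts the number of distinct values, and fewer distinct values means smaller symbol tables in RAM:

// Round(C_field, 0.01) keeps two decimals; pick the step your data
// actually needs. Note the transformation makes the QVD load non-optimized.
Y:
LOAD A_field,
     Round(C_field, 0.01) as C_field
FROM Y_table.qvd (qvd);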

Not applicable
Author

Thanks for your answer, but unfortunately the original data can't be modified, so I have to find another solution.