Large Data; JOIN vs. LINK

Thilo · 2010-12-01

Hello,

I am working with a large dataset (600 million rows).
Now I am adding information ... and I have a general question:

I have additional data that applies to only a small part of these 600 million rows (1.2 million).
Would you say it is better to JOIN the data into the large table, or to link it as a separate table?

So far I would opt for the LINK because of memory consumption.
Do you agree?

Thanks for any information,
Thilo

pover · ‎2010-12-01

With a join, memory consumption will be greater: space is taken up by the roughly 599 million rows that end up with a null value in the joined fields. So if memory is a concern, do the link. However, the link might make the charts render a little slower, since QlikView seems to be faster when all the data is in one table, so both solutions merit testing.
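To put a number on the null rows, here is a minimal Python sketch using only the row counts from the question. It assumes each of the 1.2 million extra rows matches exactly one row of the large table, and it says nothing about how QlikView actually stores nulls internally; the function name is just for illustration.

```python
# Row counts quoted in the thread; this is plain arithmetic,
# not a model of QlikView's storage engine.
TOTAL_ROWS = 600_000_000     # rows in the large table
MATCHING_ROWS = 1_200_000    # rows with a match in the small table (assumed 1:1)


def null_rows_after_join(total_rows: int, matching_rows: int) -> int:
    """Rows of the large table that get a null in every joined field."""
    return total_rows - matching_rows


nulls = null_rows_after_join(TOTAL_ROWS, MATCHING_ROWS)
share = nulls / TOTAL_ROWS
print(f"{nulls:,} rows ({share:.1%}) would carry a null in each joined field")
```

Under that assumption, all but 0.2% of the large table gains nothing from the join except nulls, which is why the link looks attractive for memory.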

Regards.

Thilo · 2010-12-02

Hello Karl,

Thanks for the reply.
Yes, memory is an issue: the data model alone currently takes about 15 GB of RAM. Although the server machine is well equipped, we want to leave a few bytes for calculations and the like 😉

We have now linked the table, and yes, we are starting to see performance issues.
Our current model consists of that laaarge table and a number of smaller linked tables (some of which have linked tables of their own).
'Smaller' here still means more than 1 million rows each.
In other words, we have a kind of snowflake schema.

Any recommendations on what to look at in order to keep the user experience smooth?

Thank you,
Thilo

hector · ‎2010-12-02

Hi

I guess mapping/ApplyMap is out of the question? If not, maybe you can give it a shot.
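For anyone unfamiliar with the idea, here is a rough Python analogue of a mapping lookup: a key/value table is consulted once per row, with a default for keys that are missing. This is only an illustration of the concept, not QlikView script; the `apply_map` helper, the field names and the sample values are all made up for the example.

```python
from typing import Dict, Hashable, Optional


def apply_map(mapping: Dict[Hashable, str], key: Hashable,
              default: Optional[str] = None) -> Optional[str]:
    """Return the mapped value for key, or default when the key is absent."""
    return mapping.get(key, default)


# Hypothetical small table: order id -> status (stands in for the 1.2 million rows).
status_by_order = {101: "returned", 205: "disputed"}

# Hypothetical large table, cut down to four rows.
orders = [{"order_id": 100}, {"order_id": 101}, {"order_id": 205}, {"order_id": 300}]

# One lookup per row adds a single new field without a join.
for row in orders:
    row["status"] = apply_map(status_by_order, row["order_id"], default="n/a")

print([row["status"] for row in orders])
```

Keep in mind that a lookup like this still adds a field to every row of the large table, so it is worth measuring the memory effect the same way as for the join.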

rgds

pover · ‎2010-12-13

Thilo,

There's no easy answer to this. You should attack the problem from multiple sides. Here are your options, in no particular order.

First, check if your hardware is configured to perform at its best. You might want to check out this post:

http://community.qlik.com/forums/t/33509.aspx

Second, think about making your expressions more efficient. Overuse of set analysis can slow down expressions. A sum(if(match())) may be a better option when comparing string values, but there doesn't seem to be a universal solution to this, so you're going to have to experiment.

Third, eliminate all unused columns. There are applications floating around that can automatically detect unused columns in an application.

Fourth, try aggregating rows to decrease the number of rows in a table, although doing a GROUP BY over 600 million rows is going to take a long time and a lot of memory.
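As a small illustration of what aggregating buys you, here is a Python sketch that collapses detail rows to one row per (month, product). The field names and sample values are invented for the example, and the real work would of course happen in the load script rather than in Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate(rows: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Sum the amount per (month, product), giving one output row per group."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for month, product, amount in rows:
        totals[(month, product)] += amount
    return dict(totals)


# Hypothetical detail rows: (month, product, amount).
detail_rows = [
    ("2010-11", "A", 10.0),
    ("2010-11", "A", 5.0),
    ("2010-11", "B", 7.5),
    ("2010-12", "A", 2.5),
    ("2010-12", "A", 4.0),
]

summary = aggregate(detail_rows)
print(len(detail_rows), "detail rows ->", len(summary), "aggregated rows")
```

The saving depends entirely on how many detail rows share a group, so it only helps if the charts do not need the finest level of detail.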

Fifth, split the data by months or years to make the files smaller, and link the files with actions so that they feel like one document.

Regards.
