gmenoutis
Partner - Creator II

Set analysis vs precalculated column

I would like to know which of the following is more performant for calculating an expression on a certain data subset:

  • Using set analysis / a set modifier to select the desired subset
  • Creating a new, dedicated field in the data warehouse that holds the data we need for the subset records and zero/null for the others.

Dummy example: calculate the SUM of all [Units] sold that have [Color]=Green:

1) SUM({$<[Color]={Green}>} [Units])

2) On script load, create an additional field [Green Units] calculated as If([Color]='Green', [Units], 0)

then the expression is simply SUM([Green Units])
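
For reference, the load-script side of option 2 could look roughly like this (the table name and QVD path are only placeholders for illustration):

// Rough sketch of option 2; [Sales] and the file path are made-up placeholders
Sales:
LOAD
    [Color],
    [Units],
    If([Color] = 'Green', [Units], 0) AS [Green Units]    // zero for non-green rows
FROM [lib://DataFiles/Sales.qvd] (qvd);

// The chart expression then becomes just: Sum([Green Units])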

Which is faster and/or lighter on resources?

13 Replies
gmenoutis
Partner - Creator II
Author

I understand your points.

After all is said and done, it's a classic balancing act (and maybe a little gamble) between memory and CPU, right?

(With minimizing response time in mind, and not caring too much about load time):

If I am sure memory will not fail me, my best bet is to calculate additional columns in the script and reference them directly in my expressions. Going one step further, I could even join small tables into large ones and get rid of the small ones entirely (roughly like the sketch at the end of this post)...again, if memory is enough.

But if it is NOT enough, then I should stick to the more normalized data warehouse form and set analysis, and the CPU will have to make the calculations on the spot.

Right?
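
Roughly what I have in mind (table names, fields and paths are made up for illustration):

// Load the large fact table first; names and paths are placeholders
Sales:
LOAD [OrderID], [ProductID], [Units]
FROM [lib://DataFiles/Sales.qvd] (qvd);

// Join the small dimension table into it, so the separate table disappears
LEFT JOIN (Sales)
LOAD [ProductID], [Color]
FROM [lib://DataFiles/Products.qvd] (qvd);

// A precalculated column could then be added on top, e.g. in a resident load:
// If([Color] = 'Green', [Units], 0) AS [Green Units]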

Miguel_Angel_Baeyens

Right, calculating additional columns and joining when possible is recommended if resources permit. The most expensive part here is the time it takes to JOIN; once it is done, the footprint in memory will not be much higher. It is the JOIN operation that is resource-consuming, not the result.

The more tables are part of the calculation, the more CPU it will require. Another word of caution: some parts of the calculation are single-threaded, so having more CPUs or more powerful ones will not guarantee a better result.

This is a very nice overview: The Calculation Engine

gmenoutis
Partner - Creator II
Author

This is very interesting. I'd expect memory to be an issue - does this mean I should try to make one huge monolithic table, joining all the others into it?

Also, the provided link and its other linked topics are very good references - thanks.

Miguel_Angel_Baeyens

If possible, yes; the principle is that the fewer the tables, the faster the calculation. I don't think I have ever fully achieved it, though.

And yes, the Qlik Design Blog has very good insights into the inner workings of Qlik.