Re: Prerequisites for caching - Qlik Community

krumbein · 2021-09-22

Hello!

I have read many times (and believed until now) that calculations in QlikView are cached and thus not repeated unnecessarily.

Now I have been working on a dashboard that had unsatisfactory response times. We look at two different time periods and three different scopes for the points of sale (POS): the selected POSs themselves, the regions the selected POSs belong to, and the entire organization. Combine those two periods with the three scopes and you get six different values that everything boils down to.

Those six values are processed in different ways. There might be calculations like

NowScope1 - NowScope2
(NowScope1 / PriorScope1) - 1
RangeMin(NowScope1, NowScope2, NowScope3, PriorScope1, PriorScope2, PriorScope3)

You get the drift: in the end, everything boils down to those six values.

So far I have been using variables extensively to build the formulas that calculate those values. I am thus pretty confident that, after all text replacement had been done, every occurrence of those formulas was character-for-character identical, and so I expected the cache to help me out.

Now I have switched to precalculating those six values with variables (those with an equal sign in front) and thus basically forced caching. The app now takes less than a third of the time to update.

On another occasion I tested a table with one expression against the same table with that very same expression (copy & paste) in triplicate. The table with the three copies was still slower. If that can't be cached, then what can?
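To illustrate the idea of precalculating the six base values once and deriving everything else from them, here is a minimal Python sketch. It is not QlikView syntax and says nothing about how the engine works; `aggregate`, `base` and all the numbers are made-up placeholders.

```python
from itertools import product

aggregation_calls = 0  # counts how often the expensive step really runs


def aggregate(period: str, scope: int) -> float:
    """Hypothetical stand-in for an expensive aggregation over the data."""
    global aggregation_calls
    aggregation_calls += 1
    return {"Now": 100.0, "Prior": 80.0}[period] * scope  # placeholder numbers


# Precalculate the six base values exactly once (2 periods x 3 scopes).
base = {
    f"{period}Scope{scope}": aggregate(period, scope)
    for period, scope in product(["Now", "Prior"], [1, 2, 3])
}

# Every derived figure reuses the stored values instead of aggregating again.
difference = base["NowScope1"] - base["NowScope2"]
growth = base["NowScope1"] / base["PriorScope1"] - 1
smallest = min(base.values())

print(aggregation_calls)  # 6
```

However many derived figures are added, the number of expensive aggregations stays at six in this sketch.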

So my question is: what am I missing regarding the caching functionality? Am I expecting too much? Have I misunderstood something?

I have no doubt about caching when it comes to moving backwards and forwards in the selection history. That works!

Thanks!

Sandro

marcus_sommer · 2021-09-22

Maybe the contexts in which the calculations are performed differ from each other in some way. Apart from the fact that even the slightest difference in how the expression is written, like sum(F1) vs. Sum(F1) vs. =sum(F1) vs. = sum(F1), prevents caching, it is quite likely that the dimensions used and their order have an impact, and probably further settings like formatting, sorting and similar, too. So can you be sure that everything is identical between the two scenarios you compared?
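As a rough illustration of why spelling matters, here is a small Python sketch of a cache keyed on the exact expression text. This is an assumption about the general technique, not Qlik's actual implementation; `cache_key`, `evaluate` and the value 42 are hypothetical.

```python
import hashlib

cache = {}
evaluations = []  # every expression that really had to be calculated


def cache_key(expression: str) -> str:
    # The raw text is hashed as-is: no trimming, no case folding.
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()


def evaluate(expression: str) -> int:
    key = cache_key(expression)
    if key not in cache:
        evaluations.append(expression)
        cache[key] = 42  # stand-in for the expensive aggregation
    return cache[key]


for text in ["sum(F1)", "sum(F1)", "Sum(F1)", "=sum(F1)", "= sum(F1)"]:
    evaluate(text)

print(len(evaluations))  # 4: only the second "sum(F1)" was a cache hit
```

Under this assumption, four of the five calls are calculated from scratch even though all five mean the same thing.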

- Marcus

krumbein · 2022-02-16

Hello Marcus!

It took me a looooong time to get back to you, but I didn't want to leave this conversation completely open 🙂

Here it goes: I did a lot of testing on caching and came away with a few conclusions (and possibly even more new questions).

The caching as I described it above cannot really work, in my opinion. I was talking about the same formulas being used in different objects within one worksheet. Btw: I am using variables extensively and checked for any possible deviation; the formulas were identical.

But to reuse calculated values in another object, the first object (or at least those calculations) would have to be finished calculating first. So this could work in two scenarios: a) if I cycle through objects in sequence, b) if the same formula is used multiple times within the same object. It won't work if the calculation of several objects is started at the same time. It might be possible to coordinate this, but I don't have the impression that it is. When I forced a sequential calculation by spreading the objects out over several worksheets, there was a measurable speed-up. And that speed-up indeed vanished when I changed the formula even in the slightest (e.g. sum -> Sum).

Regarding the caching and calculations in general, I came away with the impression that the calculation engine is still very much a black box for me, despite hic's famous slides on the subject.
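The timing argument can be sketched in Python. This is only a model of the reasoning above, not of the QlikView engine: the two "objects", the `render` function and the barrier that forces both lookups to happen before either result is stored are all assumptions made for the illustration.

```python
import threading


def calculations_needed(parallel: bool) -> int:
    cache = {}
    calculated_by = []  # names of objects that had to calculate themselves
    barrier = threading.Barrier(2)

    def render(object_name: str, expression: str) -> None:
        hit = expression in cache
        if parallel:
            # Both objects look up the cache before either one stores a result.
            barrier.wait()
        if not hit:
            calculated_by.append(object_name)
            cache[expression] = 42  # stand-in for the expensive aggregation

    objects = ["chart", "table"]
    if parallel:
        threads = [
            threading.Thread(target=render, args=(name, "sum(F1)"))
            for name in objects
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    else:
        for name in objects:
            render(name, "sum(F1)")
    return len(calculated_by)


print(calculations_needed(parallel=False))  # 1: the second object hits the cache
print(calculations_needed(parallel=True))   # 2: both start before a result exists
```

In the sequential case the second object finds the stored result. In the simultaneous case both miss, which matches the observation that spreading objects over several worksheets helped.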

Examples:
Table with a certain formula on one sheet and the same on the next --> cache works, measurable speed up

Table with a certain formula on one sheet and the same table, but with that exact formula repeated a few more times, on the next --> it doesn't get slower than the first calculation, but the cache seems to be off

The same when referencing that formula via something like Column(1) --> not slower than before, but cache is off

I also tested a sum() formula against the same formula with avg() and with count(), and then all three together. The calculation time was almost identical each time. And the bulk of that calculation time might have come from updating the selection state rather than from the actual calculation. Which brings me to...

Then there is the influence of updating the selection state. The overall selection state is one thing, but combine that with any selection state of the object itself and it gets even more complicated. Is it faster to start with {1} and then add selections, or to start with {$} and then remove them? It seems to depend on a lot of things that are difficult to predict and make sense of.

Moving all the fields into one table is also supposed to improve calculation speed, because there is no need to build temporary tables. I could observe no such thing in my tests. Strangely enough, the calculation speed was almost identical again, despite the extent of the changes in the data model. That made me wonder whether some kind of optimization is going on (as in SQL) that smooths over this kind of data model decision. Or it might have been because the fact table I attached the satellite fields to was of the finest granularity anyway.

Overall, you might reach one conclusion in a more clinical test application and a completely different one in the app you are actually trying to optimize.

There is definitely no one size fits all, and it feels like there is not even a one size fits one! 😄

Regards,
Sandro

marcus_sommer · 2022-02-16

I think you are right: the caching feature is a complex matter, and more official background information about the technical implementation and its possible restrictions and dependencies would be really helpful.

I can't really explain the behaviour, but for some of your observations I have a personal deduction. From a performance point of view, the most expensive task is the creation of the virtual tables that provide the dimensional context of the displayed objects and the aggregations in them. AFAIK this task is still single-threaded today and therefore takes up most of the calculation time. My personal assumption is well over 90%, with the multi-threaded calculation and the UI rendering being nearly negligible.

This makes it quite difficult to estimate the effort of the various parts of the object calculation and whether any of them were cached. AFAIK there is nothing to measure this directly. It also suggests that the object's dimensions and further configuration may have an impact on caching: the order of dimensions and expressions, any calculated labels, comments, visibility conditions, the sorting and so on. They need some calculation time too, and I assume that each object has only a single virtual table in the background and not several, even if these settings extend the scope of the included data. Quite probably the selection state also has an impact on caching.

AFAIK the caching is based on hashing the underlying parameters, not necessarily of the entire object but probably also of parts of it. But I think (see above) that each hash includes more than just the expression and that the slightest difference anywhere will lead to a different hash.

Another point that may affect whether caching can be applied is the execution order of the multiple threads and handles spawned by each interaction. AFAIK the whole UI is multi-threaded, and there is probably some kind of supervisor process that manages all the handles together with the OS, which queues them with its own priorities. In short: some caching may not be applied because of an unfavorable execution order and/or timeouts. Maybe it's a bit far-fetched, but something like this may be the cause of your observation that multiple objects on one sheet behave differently from the same objects spread over multiple sheets.
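To make that assumption concrete, here is a small Python sketch of a hash built from the expression plus its context. Which parameters actually enter the hash in QlikView is not documented here; the fields used below (`dimensions`, `selections`, `sort_order`) and the function `context_hash` are purely hypothetical.

```python
import hashlib
import json


def context_hash(expression, dimensions, selections, sort_order):
    """Hash the expression together with its assumed calculation context."""
    payload = json.dumps(
        {
            "expression": expression,
            "dimensions": list(dimensions),  # order is kept on purpose
            "selections": {f: sorted(v) for f, v in selections.items()},
            "sort_order": sort_order,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = context_hash("sum(F1)", ["Region", "POS"], {"Year": ["2021"]}, "ascending")
same = context_hash("sum(F1)", ["Region", "POS"], {"Year": ["2021"]}, "ascending")
swapped = context_hash("sum(F1)", ["POS", "Region"], {"Year": ["2021"]}, "ascending")
reselected = context_hash("sum(F1)", ["Region", "POS"], {"Year": ["2022"]}, "ascending")

print(base == same, base == swapped, base == reselected)  # True False False
```

With a key like this, an identical expression still misses the cache as soon as the dimension order or the selection differs.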

- Marcus