Solved: Re: For the hardcore techies.. Qlik's memory refer... - Qlik Community

vkish16161 · ‎2017-06-12

All,

I have a question for you:

Imagine that there are 2 calculations namely X and Y which are very complex in nature. Also imagine that their Set Analysis flags come from the farthest table from the fact table and Qlik's logic has to pass through multiple 'hoops' before evaluating the filter.

Now..

Imagine that there are 2 charts with the same dimension but in 1 chart the measure is of the form (X-Y) / X and in another chart, the measure is (1-Y/X) . Mathematically both equations are the same but how will Qlik evaluate this? Will Qlik 'think' these are 2 different expressions and render them separately? Or does it happen as one?
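For reference, the identity the two measures rely on can be written out as below. It assumes X is non-zero for every dimension value, which is an assumption added here and not stated in the post:

```latex
\[
\frac{X - Y}{X} \;=\; \frac{X}{X} - \frac{Y}{X} \;=\; 1 - \frac{Y}{X}, \qquad X \neq 0 .
\]
```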

Also, am I correct in assuming that in the first chart X is evaluated twice and Y once, while in the 2nd chart X and Y are each evaluated once?

Which calculation will render faster or will they be the same at least theoretically?

Would variablizing the calculation help?

Also, remember that now the calculation engine has changed from QlikView Engine to QIX Engine (Column based). So will this make any difference?

Thanks!

rwunderlich hic robert_mika

marcus_sommer · ‎2017-06-21

As I mentioned before, it's not easy to measure the performance of different ways of doing the same thing in QlikView. It can be done with the mem-files as described in the link above, and smaller checks can be done even without them via the Sheets tab of the document properties: the time column there shows the calculation time from the first opening. For this it is important to disable the Qlik caching, or to close the application and reopen it each time.

Therefore it will be easier in your case simply to increase the amount of data until you see a real difference between the calculations. Use a pivot table with multiple dimensions for this, to give Qlik something to calculate.

Your second point isn't really clear to me. Maybe you could get the results you want with GetSelectedCount() or Count(DISTINCT FIELD) and one or two if() statements. Of course, if() statements should be avoided where possible, especially if they are calculated multiple times within charts, but for a single calculation within a text box you can be a bit pragmatic and just use the if() statement for simplicity instead of spending much effort implementing a different solution.

- Marcus


ahaahaaha · ‎2017-06-13

Hi,

Here is a suggestion. In such cases I generate a data set (several hundred thousand or even millions of rows) and run both calculations against it. For an accurate measurement the calculation should take 10-30 seconds; adjust this by increasing or decreasing the amount of generated data.

In this way you get a quantitative measurement in practice, without any theoretical studies or assumptions.

Regards,

Andrey

marcus_sommer · ‎2017-06-13

Both expressions are different from a Qlik point of view and won't share a common cache. AFAIK the expression statements are hashed, so even a single space or an equals sign leads to a different hash. Personally I would assume that (1-Y/X) is faster than (X-Y) / X because in the latter X is calculated twice, which probably needs more resources than the constant 1.
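To make the hashing argument concrete, here is a minimal Python sketch of a result cache keyed on the raw expression text. It is a toy model under stated assumptions, not Qlik's actual implementation: the class name ExpressionCache, the SHA-256 key, and the inclusion of a selection-state string in the key are all hypothetical choices made for illustration.

```python
import hashlib


class ExpressionCache:
    """Toy result cache keyed by a hash of the raw expression text.

    Hypothetical model only; it does not reproduce Qlik internals.
    """

    def __init__(self):
        self._results = {}
        self.misses = 0  # number of times a result had to be computed

    @staticmethod
    def key(expression_text, selection_state):
        # Assumption: the key covers the selection state plus the exact text,
        # so whitespace differences produce different keys.
        raw = f"{selection_state}|{expression_text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def evaluate(self, expression_text, selection_state, compute):
        k = self.key(expression_text, selection_state)
        if k not in self._results:
            self.misses += 1
            self._results[k] = compute()
        return self._results[k]


cache = ExpressionCache()
x, y = 200.0, 50.0

a = cache.evaluate("(X-Y)/X", "no selections", lambda: (x - y) / x)
b = cache.evaluate("(1-Y/X)", "no selections", lambda: 1 - y / x)
c = cache.evaluate("(X-Y) / X", "no selections", lambda: (x - y) / x)  # extra spaces
d = cache.evaluate("(X-Y)/X", "no selections", lambda: (x - y) / x)    # identical text

print(a, b, c, d, cache.misses)
```

In this model the first three calls each miss the cache even though they are mathematically equal, and only the fourth call, whose text is identical to the first, is served from the cache.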

I agree with Andrey's suggestion: it's seldom useful to theorize about the matter instead of just trying it out. Besides playing with different amounts of data, you could measure the calculation times with the mem-files: Recipe for a Memory Statistics analysis.

Further, I would consider changing the data model to include the flag within the fact table. This might go against the rules of dimensional data modelling and lead to other disadvantages such as longer script run times or more RAM consumption, but it would improve your GUI performance. In the end you need to optimize for your biggest bottleneck, whatever that might be in your environment.

- Marcus

vkish16161 · ‎2017-06-13

Thanks for the reply Andrey. You're right. It's better to try it out with actual data than to theorize. I actually did. The calculation times didn't differ much, but this could be due to several other factors too.

vkish16161 · ‎2017-06-13

Marcus,

I'm not sure if you're right about X being calculated twice in (X-Y)/X. For argument's sake, let's assume that I have variablized X and Y. Then why would X be calculated again? This doesn't make sense. Is there any mention of this in the Reference Manual?

(Even if X were to be hashed, since X is the exact same expression, both occurrences would yield the same hash value.)

Thanks for showing interest in answering this question. We had a Solutions Architect come from Qlik and I asked him this question. He wasn't able to help. Your knowledge will be very helpful to solve this issue.

I'll be tagging Rob and HIC to see what their point of view is.

rwunderlich‌ hic‌

rwunderlich · ‎2017-06-13

I agree with marcus_sommer's conclusion that cache equivalence is determined based on the entire expression. Therefore the two expressions would be considered different calculations, to be calculated independently.

Re putting X in a variable: this would only save calculation time if X were defined with a leading "=", effectively creating a constant, e.g.

SET X = '=Sum(Sales)';

X would therefore be calculated once over all the data and the resolved value of X would be substituted into the expression, e.g.

If X = 1234:

($(X)-Y)/$(X) = (1234-Y)/1234

On the other hand, X may be defined without the "=", as is the case when we need to make the calculation on a dimensional level:

SET X = 'Sum(Sales)';

The expression: ($(X)-Y)/$(X) would be expanded as:

(Sum(Sales)-Y)/Sum(Sales)

which is fundamentally the same as writing the expression directly without using the variable. AFAIK, QV is not clever enough to reuse the result of the first "Sum(Sales)" fragment for the second "Sum(Sales)" fragment. I may be wrong here, I'm not sure.
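The two cases above can be mimicked with a small Python sketch that treats dollar-sign expansion as plain text substitution. This is only an illustrative model: the function names expand and fake_global_eval are hypothetical, and the lookup table stands in for a real evaluation of Sum(Sales) over all the data, reusing the value 1234 from the example above.

```python
import re


def expand(expression, variables, evaluate_globally):
    """Replace each $(name) in the expression text.

    If the variable's definition starts with '=', substitute the globally
    evaluated value; otherwise substitute the definition text unchanged.
    """
    def substitute(match):
        definition = variables[match.group(1)]
        if definition.startswith("="):
            return str(evaluate_globally(definition[1:]))
        return definition

    return re.sub(r"\$\((\w+)\)", substitute, expression)


def fake_global_eval(expression_text):
    # Hypothetical stand-in for evaluating an expression once over all data.
    return {"Sum(Sales)": 1234}[expression_text]


with_equals = expand("($(X)-Y)/$(X)", {"X": "=Sum(Sales)"}, fake_global_eval)
without_equals = expand("($(X)-Y)/$(X)", {"X": "Sum(Sales)"}, fake_global_eval)

print(with_equals)
print(without_equals)
```

With the leading "=", the chart expression only ever sees a constant; without it, the chart expression contains the full aggregation text twice, exactly as if it had been typed by hand.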

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

vkish16161 · ‎2017-06-14

Rob,

Thanks for the elaborate answer. It's very helpful.

So as per your suggestion keeping a leading '=' will help improve the performance slightly? (During memory referencing)

So I could ask my team to simply put an '=' in every variable. How will this affect calculations? Would it yield any erroneous values? (Sorry for asking this dumb question)

I've asked a few more MVPs this question and they all told me that the calculation times wouldn't change, and that Qlik will pull both X and Y from memory. I guess I'll share this post with them so it'll be useful for them.

I'll also be tagging Henric to see what his viewpoint is before I close this post.

I'm hoping he could respond on this

Once again thanks for sharing your knowledge.

hic‌

marcus_sommer · ‎2017-06-14

The equals sign won't speed up an expression by itself, but in variables it makes a difference where, that is in which context, they are calculated; see here: The Little Equals Sign. This means you can't just add or remove it; it always depends on the concrete situation. And this is what Rob had in mind: if a calculation can be made globally outside the chart, you save performance by just pulling in this result instead of calculating it multiple times within the chart.

In your case X and Y won't be cached (it's also my understanding of how the caching works that only whole expressions are cached and not parts of expressions) unless you have X and Y calculated in further columns and refer to them by expression label or by column().

- Marcus

vkish16161 · ‎2017-06-15

Marcus, thanks for sharing your valuable input -

I think the equals sign before the variable is improving calculation time, because I have tested it with the following:

We have expressions with an if block like this, which is repeated multiple times in different expressions:

if(GetSelectedCount([Team Name])=0 or GetSelectedCount([Team Name])=2,
    if((GetSelectedCount([Region Name])=0 and GetSelectedCount([Area Name])=0 and GetSelectedCount([Decile Level Region])=0 and GetSelectedCount([Decile Level Area])=0),
        Expr1, Expr2))

In the load script, I have defined it as:

set vget0 = '=GetSelectedCount([Team Name])=0 or GetSelectedCount([Team Name])=2';

set vget1 = '=(GetSelectedCount([Region Name])=0 and GetSelectedCount([Area Name])=0 and GetSelectedCount([Decile Level Region])=0 and GetSelectedCount([Decile Level Area])=0)';

Now I have simply replaced it with If($(vget0), If($(vget1), Expr1, Expr2)). The calculation time seems to have improved a bit.

Also, I have put the variables in a text box and have seen that they return -1 (Boolean true) and 0 (Boolean false).

rwunderlich‌

Rob, on a side note, I can't seem to find your book on Amazon. Is there a portal to purchase a paperback copy?

Also, does your book cover boolean operations to use as flags in set analysis? Does it cover any shortcomings of this procedure? I know HIC wrote a blog post on using booleans as flags in set analysis, but that was 3 years ago, when we had the QlikView engine. On the newer QIX engine, will it yield any significant improvement?

hic (I will be leaving the post open to see if Henric could comment on this. I'm really hoping he does.)

rwunderlich · ‎2017-06-15

You can calculate those if() statements using the "=" in the variable because it's logical that they are calculated at the global level, they are not dimension dependent. So this is a great solution, glad you are seeing a performance improvement.

I didn't write a book. That was Stephen Redmond stephen-x.redmond who wrote "QlikView for Developers Cookbook" which is available on Amazon. I publish the QlikView Cookbook website, which is not actually a book.

Redmond's "Mastering QlikView" also has some good performance tips.

I don't know of any shortcomings with using boolean flags in Set Analysis. Are you asking in comparison to another technique?

-Rob

