17 Replies Latest reply: Jun 29, 2017 11:35 AM by Vishnu Chakrakishore

    For the hardcore techies.. Qlik's memory referencing

    Vishnu Chakrakishore

      All,

       

      I have a question for you:

       

      Imagine that there are 2 calculations namely X and Y which are very complex in nature. Also imagine that their Set Analysis flags come from the farthest table from the fact table and Qlik's logic has to pass through multiple 'hoops' before evaluating the filter.

       

      Now..

       

      Imagine that there are 2 charts with the same dimension but in 1 chart the measure is of the form (X-Y) / X  and  in another chart, the measure is (1-Y/X) . Mathematically both equations are the same but how will Qlik evaluate this? Will Qlik 'think' these are 2 different expressions and render them separately? Or does it happen as one?

       

      Also, in the first chart, am I correct in assuming that X is evaluated twice and Y once, while in the 2nd chart X and Y are each evaluated once?

       

      Which calculation will render faster, or will they be the same, at least theoretically?

       

      Would variablizing the calculation help?

       

      Also, remember that now the calculation engine has changed from QlikView Engine to QIX Engine (Column based). So will this make any difference?

       

      Thanks!

       

      rwunderlich hic robert_mika

        • Re: For the hardcore techies.. Qlik's memory referencing
          Andrey Khoronenko

          Hi,

           

          Here is a suggestion. In such cases I generate a data set (several hundred thousand or even millions of rows) and run both calculations against it. For an accurate measurement, the calculation should take 10-30 seconds; adjust this by increasing or decreasing the amount of data generated.

          This way you obtain a quantitative measurement in practice, without any theoretical studies or assumptions.
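The measuring approach can be sketched as follows. This is a minimal illustration in Python, not Qlik: it generates synthetic rows, runs both forms of the measure over them, and reports the best wall-clock time of a few repeats. The helper names (`make_rows`, `form_a`, `form_b`, `time_it`) and the row count are assumptions made for the example, and the timings say nothing about how the Qlik engine behaves; only the method carries over.

```python
import random
import time

def make_rows(n, seed=42):
    """Generate n synthetic (x, y) pairs; x stays >= 1 so division is always defined."""
    rng = random.Random(seed)
    return [(rng.uniform(1.0, 100.0), rng.uniform(0.0, 100.0)) for _ in range(n)]

def form_a(x, y):
    # First chart's measure: X appears twice
    return (x - y) / x

def form_b(x, y):
    # Second chart's measure: X appears once
    return 1 - y / x

def time_it(func, rows, repeats=3):
    """Return the best wall-clock time (seconds) over several full passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x, y in rows:
            func(x, y)
        best = min(best, time.perf_counter() - start)
    return best

rows = make_rows(200_000)  # increase until a run takes long enough to compare
t_a = time_it(form_a, rows)
t_b = time_it(form_b, rows)
print(f"(X-Y)/X: {t_a:.3f}s   1-Y/X: {t_b:.3f}s")
```

Taking the best of several repeats reduces noise from other processes; in Qlik the equivalent precaution is to make sure cached results are not being reused between runs.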

           

          Regards,

          Andrey

          • Re: For the hardcore techies.. Qlik's memory referencing
            Marcus Sommer

            From Qlik's point of view the two expressions are different and won't share a common cache entry. AFAIK the expression text is hashed, so even a single extra space or an equals sign leads to a different hash. Personally I would assume that (1-Y/X) is faster than (X-Y) / X, because in the latter X is calculated twice, which probably needs more resources than the constant 1.
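To illustrate why textually different expressions would not share a cache entry, here is a toy model in Python. It is a sketch under stated assumptions, not a description of the real engine: the class `ExpressionCache`, the use of SHA-256, and the idea that the key combines the selection state with the raw expression text are all hypothetical choices made for the example.

```python
import hashlib

class ExpressionCache:
    """Toy result cache keyed by a hash of the raw expression text (assumed model)."""

    def __init__(self):
        self._results = {}
        self.evaluations = 0  # counts how often a result had to be computed

    def _key(self, expression_text, selection_state):
        raw = f"{selection_state}|{expression_text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def evaluate(self, expression_text, selection_state, compute):
        key = self._key(expression_text, selection_state)
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = compute()
        return self._results[key]

cache = ExpressionCache()
x, y = 200.0, 50.0
cache.evaluate("(X-Y)/X", "no selections", lambda: (x - y) / x)    # computed
cache.evaluate("(X-Y)/X", "no selections", lambda: (x - y) / x)    # identical text: cache hit
cache.evaluate("(X-Y) / X", "no selections", lambda: (x - y) / x)  # extra spaces: computed again
cache.evaluate("1-Y/X", "no selections", lambda: 1 - y / x)        # same math, other text: computed again
print(cache.evaluations)  # 3
```

In this model the cache never looks inside an expression, so the mathematical equivalence of the two measures is invisible to it.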

             

            I agree with Andrey's suggestion: it's seldom useful to theorize about this instead of just trying it out. Besides experimenting with different amounts of data, you could measure the calculation times with the mem-files: Recipe for a Memory Statistics analysis.

             

            Further, I would consider changing the data model and including the flag in the fact table. This might go against the rules of dimensional data modelling and lead to other disadvantages such as longer script run times or more RAM consumption, but it would improve your GUI performance. In the end you will need to optimize for your biggest bottleneck, whatever that is in your environment.

             

            - Marcus

              • Re: For the hardcore techies.. Qlik's memory referencing
                Vishnu Chakrakishore

                Marcus,

                 

                I'm not sure if you're right about X being calculated twice in (X-Y)/X. For argument's sake, let's assume that I have variablized X and Y. Then why would X be calculated again? This doesn't make sense. Is there any mention of this in the Reference manual?

                 

                (Even if X were hashed, since X is the exact same expression, both hashes would yield the same value.)

                 

                Thanks for showing interest in answering this question. We had a Solutions Architect come from Qlik and I asked him this question. He wasn't able to help. Your knowledge will be very helpful to solve this issue.

                 

                I'll be tagging Rob and HIC to see what their point of view is.

                 

                rwunderlich hic

                  • Re: For the hardcore techies.. Qlik's memory referencing
                    Rob Wunderlich

                    I agree with marcus_sommer's conclusion that cache equivalence is determined based on the entire expression. Therefore the two expressions would be considered different calculations, to be calculated independently.

                     

                    Re putting X in a variable. This would only save calculation time if X were defined with a leading "=", effectively creating a constant. e.g.

                    SET X = '=Sum(Sales)';

                     

                    X would therefore be calculated once over all the data and the resolved value of X would be substituted in the expression. e.g

                    If X = 1234:

                     

                    ($(X)-Y)/$(X) = (1234-Y)/1234

                     

                     

                    On the other hand, X could be defined without the "=", as is the case when we need to make the calculation on a dimensional level:

                     

                    SET X = 'Sum(Sales)';

                     

                    The expression: ($(X)-Y)/$(X) would be expanded as:

                    (Sum(Sales)-Y)/Sum(Sales)

                     

                    which is fundamentally the same as writing the expression directly without using the variable. AFAIK, QV is not clever enough to reuse the result of the first "Sum(Sales)" fragment for the second "Sum(Sales)" fragment.  I may be wrong here, I'm not sure.
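The two cases above can be modelled as plain text substitution. The following Python sketch is an illustration only: `resolve_variables`, `expand`, and `fake_global_eval` are hypothetical names, and the value 1234 is the example figure from this reply, standing in for whatever the engine would compute globally.

```python
import re

def resolve_variables(definitions, evaluate_globally):
    """Variables starting with '=' are calculated once, globally; others stay as text."""
    resolved = {}
    for name, definition in definitions.items():
        if definition.startswith("="):
            resolved[name] = str(evaluate_globally(definition[1:]))
        else:
            resolved[name] = definition
    return resolved

def expand(expression, resolved):
    """Replace every $(name) in the expression with the variable's resolved text."""
    return re.sub(r"\$\((\w+)\)", lambda m: resolved[m.group(1)], expression)

global_calls = []

def fake_global_eval(expression_text):
    # Hypothetical stand-in for the engine computing a value over all the data
    global_calls.append(expression_text)
    return {"Sum(Sales)": 1234}[expression_text]

with_equals = expand("($(X)-Y)/$(X)", resolve_variables({"X": "=Sum(Sales)"}, fake_global_eval))
without_equals = expand("($(X)-Y)/$(X)", resolve_variables({"X": "Sum(Sales)"}, fake_global_eval))
print(with_equals)     # (1234-Y)/1234
print(without_equals)  # (Sum(Sales)-Y)/Sum(Sales)
```

With the leading "=", the chart only ever sees a constant; without it, the chart receives the aggregation text twice and has to deal with both occurrences itself.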

                     

                    -Rob

                    http://masterssummit.com

                    http://qlikviewcookbook.com

                      • Re: For the hardcore techies.. Qlik's memory referencing
                        Vishnu Chakrakishore

                        Rob,

                         

                        Thanks for the elaborate answer. It's very helpful.

                         

                        So as per your suggestion keeping a leading '=' will help improve the performance slightly? (During memory referencing)

                        So I could ask my team to simply put an '=' in every variable. How will this affect calculations? Would it yield any erroneous values? (Sorry for asking this dumb question)

                         

                        I've asked a few more MVPs this question, and they all told me that the calculation times wouldn't change and that Qlik will pull both X and Y from memory. I guess I'll share this post with them so it'll be useful for them.

                         

                        I'll also be tagging Henric to see what his viewpoint is before I close this post. I'm hoping he can respond to this.

                         

                        Once again thanks for sharing your knowledge.

                         

                        hic

                          • Re: For the hardcore techies.. Qlik's memory referencing
                            Marcus Sommer

                            The equals sign won't speed up an expression by itself, but with variables it makes a difference where, or rather in which context, they are calculated; see here: The Little Equals Sign. This means you can't just add or remove it; it always depends on the concrete situation. And this is what Rob had in mind: if a calculation can be made globally, outside the chart, you save performance by just pulling this result instead of calculating it multiple times within the chart.

                             

                            In your case X and Y won't be cached (my understanding of the caching is also that only whole expressions are cached, not expression parts) unless you have X and Y calculated in further columns and refer to them by expression label or by column().

                             

                            - Marcus

                              • Re: For the hardcore techies.. Qlik's memory referencing
                                Vishnu Chakrakishore

                                Marcus, thanks for sharing your valuable input -

                                 

                                I think the equals sign in the variable is improving calculation time, because I have tested it with the following:

                                 

                                We have expressions with an if block like this, which is repeated multiple times in different expressions:

                                 

                                if(GetSelectedCount([Team Name])=0 or GetSelectedCount([Team Name])=2

                                ,

                                if((GetSelectedCount([Region Name])=0 and GetSelectedCount([Area Name])=0 and GetSelectedCount([Decile Level Region])=0 and GetSelectedCount([Decile Level Area])=0),

                                Expr1, Expr2))

                                 

                                 

                                In the load script, I have defined it as:

                                set vget0 =  '=GetSelectedCount([Team Name])=0 or GetSelectedCount([Team Name])=2';

                                 

                                set vget1 =  '=(GetSelectedCount([Region Name])=0 and GetSelectedCount([Area Name])=0 and GetSelectedCount([Decile Level Region])=0 and GetSelectedCount([Decile Level Area])=0)';

                                 

                                 

                                Now I have simply replaced it with If($(vget0), If($(vget1), Expr1, Expr2)). The calculation time seems to have improved a bit.

                                 

                                Also, I have put the variables in a text box and have seen that they give -1 (Boolean true) and 0 (Boolean false).


                                rwunderlich    

                                Rob, on a side note, I can't seem to find your book on Amazon. Is there a portal to purchase a paperback copy?


                                Also, does your book cover boolean operations to use as flags in set analysis? Does it cover any shortcomings of this procedure? I know HIC made a blog post on using booleans as flags in Set Analysis, but that was 3 years ago, and back then we had the QlikView Engine. Will it yield any significant improvement on the newer QIX engine?


                                hic (I will be leaving the post open to see if Henric could comment on this. I'm really hoping he does.)

                                  • Re: For the hardcore techies.. Qlik's memory referencing
                                    Rob Wunderlich

                                    You can calculate those if() statements using the "=" in the variable because it's logical that they are calculated at the global level; they are not dimension dependent. So this is a great solution, glad you are seeing a performance improvement.

                                     

                                    I didn't write a book. That was Stephen Redmond  stephen-x.redmond who wrote "QlikView for Developers Cookbook" which is available on Amazon.  I publish the QlikView Cookbook website, which is not actually a book.

                                     

                                    Redmond's "Mastering QlikView" also has some good performance tips.

                                     

                                    I don't know of any shortcomings with using boolean flags in Set Analysis.  Are you asking in comparison to another technique?

                                     

                                    -Rob

                                    http://masterssummit.com

                                    http://qlikviewcookbook.com

                                      • Re: For the hardcore techies.. Qlik's memory referencing
                                        Vishnu Chakrakishore

                                        Rob,

                                         

                                        Yeah that is what I'm currently doing. But there's only a slight improvement in performance.

                                         

                                        Sorry about thinking that you wrote a book, Rob. (However, I feel that you should. A lot of developers would benefit from your knowledge.)

                                         

                                        Yes, for Boolean flags: how would the performance compare between using 0 and 1 vs. true() and false()?

                                        Also, if I put 0 and -1 as strings/numbers in the flag, will Qlik interpret them as true/false or just as strings?

                                         

                                         

                                        Also, in all of our charts, for dimensions, we're using variables. (Dynamic Dimensions to be precise) Here's one:

                                         

                                         

                                        set vcomparison        = if(GetSelectedCount(Comparison)>1,'prim_Sanofi_Novo_access_Reg',

                                                                        if(getfieldselections(Comparison)='Sanofi vs. Novo','prim_Sanofi_Novo_access_Reg',

                                                                            if(getfieldselections(Comparison)='Lantus vs. Bassaglar','prim_Lan_Bslg_access_Reg',

                                                                                if(getfieldselections(Comparison)='Toujeo vs.Tresiba','prim_Tjo_Trs_access_Reg','prim_Sanofi_Novo_access_Reg'))));

                                         

                                         

                                        My solution:

                                         

                                        Pick(Alt(Match(Comparison,'Sanofi vs. Novo','Lantus vs. Bassaglar','Toujeo vs. Tresiba'),4)

                                        ,[prim_Sanofi_Novo_access_Reg],[prim_Lan_Bslg_access_Reg],[prim_Tjo_Trs_access_Reg],[prim_Sanofi_Novo_access_Reg])

                                         

                                         

                                        I'm using 'Alt' to handle multiple selections and, importantly, to provide a default selection.


                                        Is this comparable to an else clause in an if? If so, I can replace our nested ifs with this.

                                        Also, are there any shortcomings with this approach? Is it any more efficient performance-wise? If you can confirm that, I'll get this implemented across the board.

                                         

                                        Thanks in advance!

                                        marcus_sommer rwunderlich

                                          • Re: For the hardcore techies.. Qlik's memory referencing
                                            Marcus Sommer

                                            Strings will always need more resources than numerics. I have never checked the performance differences between 0/1 and false()/true(). Usually I use 0/1 directly and would assume that it is more performant than the boolean functions because the booleans are dual values. Just put false() and num(false()) in a textbox and you will see what is meant. Using 0/1 has the further advantage of being usable as a multiplier, like: sum(value) * FLAG.

                                             

                                            To your second query: (nested) if-statements should be avoided whenever possible. Your approach with pick(match()) is easier to develop and to maintain, and more performant than the nested if-statements.

                                             

                                            - Marcus

                                              • Re: For the hardcore techies.. Qlik's memory referencing
                                                Vishnu Chakrakishore

                                                Marcus,

                                                 

                                                Thanks for your reply on this. Maybe you're referring to this blog post by Henric: On Boolean Fields and Functions. But contrary to your suggestion, Henric said Boolean functions are faster in flags. What you said about multipliers is true, though they won't work for AVG etc. (This is also covered in the article.)
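The point about multipliers and averages can be checked with a few numbers. The sketch below is plain Python with made-up sample rows, and it assumes the flag is applied row by row (as in Sum(value * FLAG)); it is an arithmetic illustration, not a statement about engine performance.

```python
# Made-up sample data: two flagged rows, two unflagged rows
rows = [
    {"value": 10.0, "flag": 1},
    {"value": 20.0, "flag": 0},
    {"value": 30.0, "flag": 1},
    {"value": 40.0, "flag": 0},
]

def avg(values):
    values = list(values)
    return sum(values) / len(values)

# Sum: filtering and multiplying by the flag agree
sum_filtered = sum(r["value"] for r in rows if r["flag"] == 1)
sum_multiplied = sum(r["value"] * r["flag"] for r in rows)

# Average: the multiplied version still counts the unflagged rows (as zeros)
avg_filtered = avg(r["value"] for r in rows if r["flag"] == 1)
avg_multiplied = avg(r["value"] * r["flag"] for r in rows)

print(sum_filtered, sum_multiplied)  # 40.0 40.0
print(avg_filtered, avg_multiplied)  # 20.0 10.0
```

Multiplying by 0 removes a row's contribution to the numerator but not to the row count, which is why the trick is safe for sums and misleading for averages.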

                                                 

                                                For my second query:

                                                 

                                                Just to clarify, I'm using Alt function for default selection when no selection is made or multiple selections are made.

                                                Is this comparable to the else clause?

                                                 

                                                Just curious.

                                                 

                                                Once again, thanks for your input on this

                                                  • Re: For the hardcore techies.. Qlik's memory referencing
                                                    Marcus Sommer

                                                    Of course I know this posting, but I didn't refer to it in my answer, and I can't see where my suggestions contradict those from HIC. If the performance of applying conditions to expressions without flags and with different kinds of flags is really important to you, you should try it out by creating one or several dummy datasets with at least 100 million records. If you really do it, please do it systematically and post your results with all scripts and applications here in the community.

                                                     

                                                    The alt() function shouldn't really be regarded as the else part of an if-statement, but rather as a default value for when all other parameters or branches fail, i.e. don't return a numeric value.

                                                     

                                                    - Marcus

                                                      • Re: For the hardcore techies.. Qlik's memory referencing
                                                        Vishnu Chakrakishore

                                                        Marcus,

                                                         

                                                        Sorry for the delayed response.

                                                         

                                                        I did create a dataset. I tried with and without boolean flags but there was no way to accurately quantify performance. We're trying to investigate other ways of quantifying performance.

                                                         

                                                         

                                                        For the 2nd question: I have used Alt as an else clause.

                                                        (Only useful in some scenarios but our entire app is using field functions - getfieldselections etc. )

                                                        My goal was to replace IF with Pick, Alt, Match - Here's proof:


                                                        Field_Name:

                                                        [Channel Description]

                                                        Commercial
                                                        Health Exchange
                                                        Ltc
                                                        Managed Medicaid
                                                        Medicaid
                                                        Medicare

                                                         

                                                         

                                                        In a textbox if you use this expression:


                                                        alt(Match([Channel Description],'Commercial','Health Exchange','Ltc')+1,1)


                                                        The output for the above expression is {1,2,3,4} - 1 being default (No selections, Multiple Selections, Wrong Selection)


                                                        This output i.e. {1,2,3,4} can be used by Pick function.
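The behaviour described above can be emulated outside Qlik. The Python sketch below models Match, Alt and Pick under stated assumptions: Match returns the 1-based position or 0 when nothing matches, Alt returns the first argument that is a number and otherwise the last argument, Pick returns nothing for an out-of-range index, and a field without a single possible value is represented as None. The function names and the None convention are choices made for this example.

```python
def match(value, *candidates):
    """1-based position of value among candidates, 0 if absent (case-sensitive)."""
    for position, candidate in enumerate(candidates, start=1):
        if value == candidate:
            return position
    return 0

def alt(*args):
    """First argument that is a number; otherwise the last argument."""
    for arg in args:
        if isinstance(arg, (int, float)) and not isinstance(arg, bool):
            return arg
    return args[-1]

def pick(n, *options):
    """n-th option (1-based), or None when n is out of range."""
    if isinstance(n, int) and 1 <= n <= len(options):
        return options[n - 1]
    return None

def channel_index(channel):
    """Emulates alt(Match([Channel Description], ...)+1, 1)."""
    if channel is None:
        shifted = None  # assumption: no single value propagates as null
    else:
        shifted = match(channel, 'Commercial', 'Health Exchange', 'Ltc') + 1
    return alt(shifted, 1)

for channel in ['Commercial', 'Health Exchange', 'Ltc', 'Medicaid', None]:
    print(channel, channel_index(channel))

# The index can drive pick(), with the default in the first position
print(pick(channel_index('Medicaid'), 'default', 'A', 'B', 'C'))  # default

# Without the +1, a non-matching single value yields 0, which alt() keeps
print(alt(match('Medicaid', 'Commercial', 'Health Exchange', 'Ltc'), 4))  # 0
```

Under these assumptions the +1 shift is what makes the default reachable: a plain 0 from a failed match is still a number, so Alt would pass it through and Pick would then return nothing.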



                                                        Please give your thoughts on this..


                                                        marcus_sommer rwunderlich

                                                          • Re: For the hardcore techies.. Qlik's memory referencing
                                                            Marcus Sommer

                                                            As I mentioned before, it's not very easy to measure the performance of different ways of doing something in QlikView. But it can be done by using the mem-files as described in the link above, and some smaller checks can even be done without them, just by using the Sheets tab of the document properties: the calculation-time column there shows the calculation time from the first opening. For this it's important to disable the Qlik caching, or to close the application each time and open it again.

                                                             

                                                            Therefore it will be easier in your case just to increase the amount of data until you see a real difference between the different calculations - and you should use a pivot for it with multiple dimensions to give qlik something to calculate.

                                                             

                                                            Your second point isn't really clear to me. Maybe you could get the results you want with a getselectedcount() or a count(distinct FIELD) and one or two if-statements instead. Of course, if-statements should be avoided if possible, especially if they are calculated multiple times within charts, but for a single calculation within a textbox you could be a bit pragmatic and just use the if-statement for simplicity instead of spending a lot of effort on implementing a different solution.

                                                             

                                                            - Marcus

                                                            • Re: For the hardcore techies.. Qlik's memory referencing
                                                              Rob Wunderlich

                                                              If you can't quantify performance, then I would say it's "good".

                                                               

                                                              -Rob