20 Replies. Latest reply: Oct 23, 2014 7:03 AM by Frederico Mensurado

    help - performance improvements

    Clive Spindley

      If I have a line of code like this:

       

      if(a=b,1,

      if(c=f,2,

      if(c=g,3,4)))

       

      My question: if the first condition is true (i.e. a=b), does the engine (parser) still waste its time and slow things down by checking the remaining conditions?

        • Re: help - performance improvements
          Rob Wunderlich

          What you are describing is commonly called "short circuiting". I believe QlikView does short-circuit: when the condition is true, the remaining Else conditions are not evaluated. I'm not sure of this, though. Perhaps someone from QT development can weigh in on this question. Maybe hic?

           

          -Rob

            • Re: help - performance improvements
              Henric Cronström

              The else part of the expression is indeed evaluated, even when it is not needed. You can test this yourself by using an input box with a variable, and a pivot table with a (heavy) expression that uses the variable, e.g.

                   If( vTestVariable = 1, 'simple...', Count( distinct FieldWithManyRecords ) )

               

              Note how long it takes to calculate this when vTestVariable=0, then change the value of the variable to 1. It takes the same amount of time to calculate.

               

              So, there is still room for optimization...
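A rough analogy, sketched in Python rather than QlikView: an If() that behaves like an ordinary function receives both branches already computed, whereas a short-circuiting conditional skips the unused one. The names `heavy_count` and `eager_if` are made up for this illustration and say nothing about how the engine is actually implemented.

```python
calls = []

def heavy_count():
    """Stand-in for an expensive aggregation such as Count(distinct ...)."""
    calls.append("heavy")
    return 123456

def eager_if(condition, then_value, else_value):
    # Both arguments were already computed by the caller before we get here.
    return then_value if condition else else_value

v_test_variable = 1

# Eager: heavy_count() runs even though its result is thrown away.
eager_result = eager_if(v_test_variable == 1, "simple...", heavy_count())
eager_calls = len(calls)

# Short-circuit: Python's conditional expression never touches the unused branch.
calls.clear()
lazy_result = "simple..." if v_test_variable == 1 else heavy_count()
lazy_calls = len(calls)

print(eager_result, eager_calls)
print(lazy_result, lazy_calls)
```

Both forms return the same value; only the number of heavy calls differs, which matches the observation that the calculation time does not change with the variable.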

               

              HIC

                • Re: Re: help - performance improvements
                  Rob Wunderlich

                  This is very interesting. Would a pick(match()) evaluate any less? For example:

                   

                  pick(match('1', '1','2'),'simple',Count( distinct FieldWithManyRecords ) )

                   

                  In this expression, would the count(...) be evaluated as well?

                   

                  -Rob

                    • Re: Re: help - performance improvements
                      Clive Spindley

                      thanks Rob, will sleep on it

                      • Re: Re: help - performance improvements
                        Henric Cronström

                        Yes, the Count(...) is evaluated as well, also in your expression. Just tested it...

                         

                        HIC

                          • Re: help - performance improvements
                            Rob Wunderlich

                            Thanks for your input Henric. As always, I'm grateful for your insight and transparency.

                             

                            -Rob

                              • Re: help - performance improvements
                                Magnus ÅVITSLAND

                                Hi Rob and Henric.

                                 

                                I did not believe you at first when my colleague showed me your post.

                                My first question was "Could QT really have broken the standard of conditional evaluation?"

                                Yes they can!

                                 

                                I suspect it has to do with UI responsiveness: caching all expressions for a better user experience.

                                But alas, I hate it when application vendors try to help bad programmers and trade common, standardised practice for ease of use...

                                 

                                It is not the behaviour a programmer would expect.

                                Normally, conditions exit after a match, but not in this case.

                                It seems that it evaluates the expressions for all conditions, no matter what.

                                 

                                I tried to replicate the same behaviour with conditional expressions, and happily it works as one would expect:

                                The expression is not evaluated if the pre-condition fails.

                                Do not confuse this with conditions inside expressions.

                                 

                                Condition in expression:

                                if (vVar = 0, sum(iCounter),
                                if (vVar = 1, count(distinct SSN),
                                count(distinct SSN) + count(distinct %KEY_SSN_YearMonth)
                                )
                                )

                                 

                                //Regards

                                Magnus Åvitsland

                                Framsteg.com

                                Stockholm, Sweden

                                  • Re: help - performance improvements
                                    Henric Cronström

                                    I agree that at first glance one would think that the optimal behaviour must be not to evaluate remaining conditions. But the question is a lot more complicated than that...

                                     

                                    The algorithm to calculate an expression is extremely complex. Say, for instance that you have a chart with multiple dimensions: Then the expression should be evaluated for each combination of the dimensional field values, i.e. the Cartesian product of the constituent fields. And this in an arbitrary data model.

                                     

                                    Further, the argument of the aggregation function could involve fields from different tables, e.g. Sum(A*B) where A and B sit in two tables far from each other. The aggregation then needs to take place in a virtual n-tuple created from the Cartesian product of A and B, where the argument of the aggregation function is to be evaluated once per row. But the expression is not parsed for every row. Instead (for performance reasons) the expression is converted to assembler code and executed for each row.

                                     

                                    So, in the general case, a chart involves a double Cartesian product using an arbitrary number of fields in both levels. It is like having a SELECT statement with an arbitrary number of fields in the argument of the aggregation function and an arbitrary number of GROUP BY fields, but without having direct information about the JOINs...

                                     

                                    And then we need to add the possibility of any number of scalar functions at any level of the expression; e.g. any number of nested if()-functions. Needless to say, the algorithm is quite complex, and when it was implemented, we just didn't manage to short circuit the evaluation. And I am still today not sure that it would be possible to combine short-circuiting with the assembler code.

                                     

                                    HIC

                                      • Re: help - performance improvements
                                        Clive Spindley

                                        The poorish performance when rendering the timeline for a specific chip may be due to the lack of "short circuiting" (apologies for any confusion, but I speak a language which is a mix of Qlik technology and Big Health data, specifically NHS, e.g. pathways, journeys, indicators, targets, periods of care, care activities etc.). Consider the list of periods for a Lung cancer pathway:

                                        sc1.PNG.png

                                        This picture renders immediately.

                                        The pathway/time line for this is:

                                        sc2.PNG.png

                                        This takes just over 1 s to render (not good enough), and I think it's because it's running through code for ALL chips (consumer health integrated pathways), not just Lung cancer. I will try to find the time to test some of the suggestions made, but I don't have the resource (I only need a bit), which, when you consider what the NHS budget is, is utter madness ...

                            • Re: help - performance improvements
                              Henric Cronström

                              @ Rob, clivespindley, Magnus Åvitsland, rbecher

                               

                              I have discussed the "short circuiting" further with Håkan (the Inventor) now, and the reason why QlikView doesn't short circuit is quite simple. Say that you have an expression

                                   If( <Condition>, <Expression1>, <Expression2> )

                              that is to be evaluated once per dimensional value.

                               

                              If QlikView had evaluated the chart like a For-Next loop (one loop per dimensional value), then the Condition would first be evaluated and QlikView could choose which Expression then to evaluate. In other words, "short circuiting" would have been the obvious strategy.

                               

                              But this would mean many aggregations - many passes over a large data set - and it would be very inefficient. Hence, it is not how QlikView evaluates a chart.

                               

                              Instead, QlikView makes one pass over the data, calculating all three aggregations. These are then binned into the different dimensional values, whereupon the Condition is checked and the right Expression is used. Hence, the Condition is evaluated after the aggregation is made, and short circuiting is not possible.
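The two strategies can be contrasted with a small Python sketch. The rows, the threshold of 15, and the three aggregations (a sum for the Condition, a count for Expression1, a max for Expression2) are assumptions chosen only to make the example concrete; this is not QlikView's actual algorithm.

```python
from collections import defaultdict

# Hypothetical data: (dimension value, amount)
rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 20)]

def single_pass(rows):
    """One pass over the data; all three aggregations are accumulated per dimension."""
    cond_agg = defaultdict(int)   # Sum(amount), used by the Condition
    expr1_agg = defaultdict(int)  # Count(), Expression1
    expr2_agg = defaultdict(int)  # Max(amount), Expression2 (amounts assumed positive)
    for dim, amount in rows:
        cond_agg[dim] += amount
        expr1_agg[dim] += 1
        expr2_agg[dim] = max(expr2_agg[dim], amount)
    # Only after binning is the Condition checked, so nothing could be skipped.
    return {d: expr1_agg[d] if cond_agg[d] > 15 else expr2_agg[d] for d in cond_agg}

def loop_per_dimension(rows):
    """For-Next style: short circuiting is possible, but the data is rescanned per value."""
    result = {}
    for dim in sorted({d for d, _ in rows}):
        amounts = [a for d, a in rows if d == dim]  # another scan of all rows
        if sum(amounts) > 15:
            result[dim] = len(amounts)   # only Expression1 is computed
        else:
            result[dim] = max(amounts)   # only Expression2 is computed
    return result

print(single_pass(rows))
print(loop_per_dimension(rows))
```

Both functions give the same answer. The trade-off is between computing aggregations that turn out to be unused and scanning the data once per dimensional value.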

                               

                              The only improvement that could be made is to check whether the Condition is an aggregation or just a check of a variable value, and use short circuiting for the latter case.

                               

                              HIC

                                • Re: help - performance improvements
                                  Clive Spindley

                                  That makes sense. One of the advantages of BI (in addition to being able to handle very large volumes of data) is the power to aggregate (and eliminate this boring activity from the "to worry about" list).

                                  It does of course mean working with the data at the appropriate level, the bottom level or the most complex, depending on your point of view and what excites you.

                                   

                                  Thanks for the clarification,

                                   

                                  Clive

                                  • Re: help - performance improvements

                                    Hi Henric -

                                     

                                    I've noticed you've put up a couple of blog posts on this topic. I'm glad you've shared it, as I've found it an extremely useful insight and I think other users would too. Can I just ask, does the same apply to the script if() in load statements as to chart expression if statements?

                                     

                                    And when it comes to the script control statement if..then..else..end if, does it do something similar? I've noticed that if I put an "exit script" in an if statement, the loader sometimes picks up later items in the script.

                                     

                                    Erica

                                      • Re: help - performance improvements
                                        Henric Cronström

                                        If you use an If() function in the script, you have the same problem - in principle. An If() will take some time to evaluate, and if used in a large table it will prolong the script execution. But it is not as bad, since a longer script execution often is acceptable. Further, you don't have any options for the script - you can't use Set Analysis.

                                         

                                        The control statement if..then..else is different though: It is executed once, and that's it. As opposed to the If() function that may be executed millions of times - once per record.
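The difference in cost can be illustrated with a Python analogy that simply counts how often a condition is checked. The 1000 records and the `flag` field are invented for this sketch.

```python
counter = {"checks": 0}

def condition(flag):
    # Count every evaluation of the condition.
    counter["checks"] += 1
    return flag == 1

# Hypothetical table of 1000 records.
records = [{"flag": i % 2} for i in range(1000)]

# Analogue of an If() function inside a LOAD: checked once per record.
labels = ["yes" if condition(r["flag"]) else "no" for r in records]
per_record_checks = counter["checks"]

# Analogue of the if..then..else control statement: checked once per script run.
counter["checks"] = 0
should_load_table = condition(1)
control_checks = counter["checks"]

print(per_record_checks, control_checks)
```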

                                         

                                        HIC

                                  • Re: help - performance improvements

                                    Hi Clive, I had a similar issue, but in that my if statement was huge, and I wanted to simplify it (rather than optimise the load)

                                     

                                    If you use a combination of a lookup table and the alt() function, which takes the first non-null value of a list of values, you can get a similar result.

                                     

                                    Lookup:

                                    Load * Inline [
                                    a, c, aRes, cRes
                                    b, f, 1, 2
                                    b, g, 1, 3
                                    ];

                                     

                                    left join (Main_Data) Load * resident Lookup;

                                    drop table Lookup;

                                     

                                    left join (Main_Data)

                                    Load

                                    SingleKeyField,

                                    alt(aRes,cRes,4) as resultField

                                    resident Main_Data

                                    ;

                                     

                                     

                                    Here I've assumed that there are no alternatives for a and c except null. This would yield the same result as your if statement. I have no idea about the impact on performance, though.
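A Python sketch of the idea, under assumptions: the lookup is keyed on the pair (a, c) and carries two result columns, and `alt` is a simplified stand-in that returns the first argument that is a number, falling back to a default. The keys and values below are illustrative only.

```python
def alt(*candidates, default):
    """Return the first candidate that is a number, otherwise the default."""
    for value in candidates:
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    return default

# Hypothetical lookup: (a, c) -> (aRes, cRes)
lookup = {
    ("b", "f"): (1, 2),
    ("b", "g"): (1, 3),
}

def result_for(a, c):
    # A missing key plays the role of the nulls a left join leaves behind.
    a_res, c_res = lookup.get((a, c), (None, None))
    return alt(a_res, c_res, default=4)

print(result_for("b", "f"))
print(result_for("x", "y"))
print(alt(None, 3, default=4))
```

Rows without a match fall through to the default, which is what makes the last argument of alt() behave like the final else branch.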

                                     

                                    Erica

                                    • Re: help - performance improvements
                                      Clive Spindley

                                      selsingchip.PNG.png

                                      Like BMW, I am aggressive in my chase for better performance and have high ambitions in this regard. My (and all) unique ehSIGNature now renders in 0 secs, so ..., on to the next thing (in IT there's no rest for the wicked!). I am determined that every "app screen" will render in 0 secs. If the user (health consumer) selects Lung cancer then the timeline uses icons (circles and pyramids) to give an indication of breach (e.g. target wait times exceeded) and how the team performed:

                                      lungcancer.PNG.png

                                      This picture (and the one below) takes >1 sec to render.

                                       

                                      lungcancer2.PNG.png

                                        • Re: help - performance improvements

                                          Hi Clive, I understand the complexities, I also work in the NHS! Do you work for a trust?

                                           

                                          What expressions do you have in the chart?

                                          Erica

                                            • Re: help - performance improvements
                                              Clive Spindley

                                              hi Erica,

                                              I have worked for Barts and WLMHT in the past, but am currently unwaged and looking for paid work.

                                              Which trust are you working for?

                                              see below trail ...

                                              I have dipped into the community to try to get some help, and have discovered that, with regard to expression code, the general view seems to be that Qlik does not "short circuit".

                                              (I have had some helpful feedback from some very bright people, however in depth understanding of the code is not my “specialty”, my specialty is big data.)

                                               

                                              1. Example of no short circuit:

                                               

                                              if(a and b and c,d)

                                               

                                              If a is false it still checks b and c …

                                               

                                              This is unfortunate.

                                               

                                              The requirement:

                                               

                                              • (see picture below; this is my world: turning data into pictures :-) )

                                               

                                              Look at the highlighted row at the top. What is happening:

                                               

                                              An ambulance journey has come to an end. That ends (QUAntum particle u) the ambulance journey period (which can then be measured, and I think has a target of 8 mins), but it also starts (QUAntum particle 2d) a wait period in A&E (target 4 hours).

                                               

                                              I am searching for the end of an ambulance journey and use the following code:

                                               

                                              if a=1 and b=2 and substringcount(signActQUArkPeriodExpandedType,ehSIGN_Part02StartQk)>0

                                               

                                              where the first substringcount parameter = 2d/CON20130622/A&E#u/CON20130622/AJ

                                              and the second substringcount parameter = u/CON20130622/AJ

                                               

                                              This is the only complex bit of code left now (it is repeated multiple times; often a<>1 or b<>2, but the code still gets executed, see above).

                                               

                                              Can you suggest how to make it more efficient and help to get the render time<1s ?

                                               

                                              I am keeping the pictures (colours and shapes) simple; what's behind them is complicated. You can't get away from this, health is the most complex domain of all!

                                               

                                              Put the health consumer first, don't cause unnecessary confusion. Often people don't want to know the detail, so keep the pictures (the apps) simple (even if what's behind them is complex)!

                                               

                                              see the following picture:

                                               

                                              em1.png

                                              In 80% of the cases the first parameter contains only 1 part, so in this case I could use an = comparison, which would be much quicker than using the substringcount function ...
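The ordering idea (cheap checks first, plain equality for the single-part case, substring counting only as a last resort) looks like this in Python, a language that does short-circuit. The function name is made up, the sample strings are the ones quoted above, and whether the same ordering pays off inside a QlikView expression depends on the evaluation behaviour discussed earlier in the thread.

```python
def is_ambulance_journey_end(a, b, expanded_type, part):
    """Return True when a=1, b=2 and `part` occurs in `expanded_type`."""
    # Cheapest checks first: if they fail, no string work is done at all.
    if a != 1 or b != 2:
        return False
    # Single-part case: plain equality is enough.
    if expanded_type == part:
        return True
    # Multi-part case: fall back to counting occurrences of the substring.
    return expanded_type.count(part) > 0

multi = "2d/CON20130622/A&E#u/CON20130622/AJ"
single = "u/CON20130622/AJ"

print(is_ambulance_journey_end(1, 2, multi, single))
print(is_ambulance_journey_end(1, 2, single, single))
print(is_ambulance_journey_end(0, 2, multi, single))
```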

                                               

                                              Are you working with a data warehouse or a more focused BI data mart?

                                          • Re: help - performance improvements
                                            Frederico Mensurado

                                            Hi there,

                                             

                                            I've had that problem as well. Using strings and then evaluating the result works fine as a workaround for CPU-costly expressions. Example:

                                            If(a=1,

                                                 <expression 1>,

                                            If(a=2,

                                                 <expression 2>,

                                            If(a=3,

                                                 <expression 3>,

                                            //else

                                                 <expression 4>

                                            )))

                                             

                                            You may convert every single expression to its string equivalent and wrap the whole thing in a dollar-sign expansion, $(= ... ):

                                             

                                            $(=

                                            If(a=1,

                                                 <stringified expression 1>,

                                            If(a=2,

                                                 <stringified expression 2>,

                                            If(a=3,

                                                 <stringified expression 3>,

                                            //else

                                                 <stringified expression 4>

                                            )))

                                            )

                                             

                                            What you gain: only four strings are evaluated (instead of four costly expressions), and then only one costly expression is calculated instead of four.
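The same two-step idea, sketched in Python as an analogy: first choose between cheap strings, then evaluate only the chosen text. Here `costly` is a hypothetical stand-in for an expensive expression, and `eval` plays the role of the dollar-sign expansion.

```python
evaluated = []

def costly(n):
    """Stand-in for a CPU-costly expression; records that it actually ran."""
    evaluated.append(n)
    return n * 100

def pick_expression(a):
    # Step 1: the branches are plain strings, so choosing between them is cheap.
    if a == 1:
        return "costly(1)"
    if a == 2:
        return "costly(2)"
    if a == 3:
        return "costly(3)"
    return "costly(4)"

def expand_and_evaluate(a):
    expression_text = pick_expression(a)
    # Step 2: evaluate only the selected text.
    return eval(expression_text, {"costly": costly})

result = expand_and_evaluate(2)
print(result, evaluated)
```

Only one of the four costly calls is ever made, because the other three never exist as anything more than text.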