10 Replies Latest reply: Oct 17, 2012 11:01 AM by John Witherspoon

    Is there a faster way to calculate a dimension?

      Hi,

       

      We use QlikView for a dataset of 1.5 billion rows.  Due to its size, normal methods of calculating fields can take too long and I have to look for another approach.

       

      I am really struggling with the speed of one calculation that uses the aggr function.  We want to roll up item-level data to calculate the number of people who have bought 1, 2, 3, etc. of a particular item in one transaction (a basket).  The outcome looks as follows...

       

      Target Units          Basket Count

           1                         8,000,000

           2                         6,000,000

           3                         3,000,000

           4                         1,000,000

       

      The calculated dimension and expression I have used for this are below.  Does anyone know a different method that might get this result faster?  The approach of using a calculated dimension with the aggr function takes 5 minutes to calculate!

       

      Dimension =aggr(distinct sum(SALES_UNITS), BasketNumber)
      Expression = count(distinct BasketNumber)
      

       

      Many thanks for any help.

        • Re: Is there a faster way to calculate a dimension?
          Jason Michaelides

          Hi Tom,

           

          Firstly, using COUNT(DISTINCT xxxx) across 1.5bn rows is going to hit your performance significantly, I would think (depending on the number of distinct BasketNumbers).  Try this first:

           

          In your script, add a field to your Basket table:

           

          ,1     AS     %Counter_Basket

           

          I am assuming the Basket table contains only DISTINCT BasketNumbers. If not, extract them to a separate table:

           

          BasketNumbers:

          LOAD DISTINCT

               BasketNumber

               ,1     AS     %Counter_Basket

          RESIDENT TableWithBasketNumberInIt;

           

          (I have prefixed the field with % to hide it - you will need to add SET HidePrefix = '%'; to the top of your script)

           

          Then, as your expression, use SUM(%Counter_Basket) instead of COUNT(DISTINCT BasketNumber).
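
          In other words, the chart ends up something like this (just a sketch - the Aggr() dimension stays exactly as you have it for now, only the expression changes):

          Dimension  = aggr(distinct sum(SALES_UNITS), BasketNumber)
          Expression = sum(%Counter_Basket)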

           

          Try that first and it should improve. If you still need more improvement then we can attack the Aggr() too.

           

          Hope this helps,

           

          Jason

            • Re: Is there a faster way to calculate a dimension?

              Thanks Jason,

               

              I have implemented your suggestion to remove the COUNT(DISTINCT) and it has improved the timings, but the chart still takes just over 4 minutes to run.  Any further thoughts you may have on the aggr function would be appreciated.

               

              John

                • Re: Is there a faster way to calculate a dimension?
                  Jason Michaelides

                  Can you post a screenshot of your data model?

                    • Re: Is there a faster way to calculate a dimension?

                      table_layout.png

                      Hi Jason, here is the layout of the tables.  The BasketNumber in my earlier formulas equates to BTC_TILL_TXN_KEY in these tables.

                       

                      Rob - thank you for your input.  Sadly, the selection criteria can be any possible combination of SKUs, going right down to the lowest level, so there is no way to roll the table up to a summarised level first.

                        • Re: Is there a faster way to calculate a dimension?
                          Rob Wunderlich

                          Can you check -- using Windows Task Manager -- that multiple CPUs are being used in the calculation?

                           

                          -Rob

                          • Re: Is there a faster way to calculate a dimension?
                            Jason Michaelides

                            Hi Tom,

                             

                             Well I'm glad we have achieved a 20% performance improvement already!  You might want to search this forum for "Best Practices" to get more tips on things like this.

                             

                             Regarding the Aggr() - try removing the DISTINCT.  I'm not sure you need it.  Other than that I can't really think of another way.  The reality is that unless you have some really serious hardware, calculating over 1.5bn records is going to be slow.

                             Have you considered breaking up the qvw based on a dimension (gov_office_region, for example)?  Or restricting the date range per application?  You can use document chaining to move to archived data if necessary - is it really important to have all data available in the same application?  I think you may need to start thinking along these lines.
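
                             To illustrate the date-range idea, something like this in the load script would do it (just a sketch - I'm guessing at the QVD name and the date field, so substitute your own; the WHERE clause will force an unoptimised QVD load, but that only costs reload time):

                             MainFact:
                             LOAD *
                             FROM MainFact.qvd (qvd)
                             WHERE TXN_DATE >= AddMonths(Today(), -12);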

                             

                            On your data model, you might want to consider bringing the timedata table into the same table as Basket, and branddata into MainFact to reduce the joins.
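
                             For example, something like this at the end of the script (untested - I'm going from the table names in your screenshot, so adjust as needed):

                             LEFT JOIN (Basket)
                             LOAD * RESIDENT timedata;
                             DROP TABLE timedata;

                             LEFT JOIN (MainFact)
                             LOAD * RESIDENT branddata;
                             DROP TABLE branddata;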

                             

                            Hope this helps,

                             

                            Jason

                              • Re: Is there a faster way to calculate a dimension?
                                John Witherspoon

                                Jason Michaelides wrote:

                                 

                                ...Have you considered breaking up the qvw based on a dimension (gov_office_region, for example)? Or restricting the date range per application?  You can use document chaining to move to archived data if necessary - is it really important to have all data available in the same application?  I think you may need to start thinking along these lines...

                                As a simple example of the above, our largest application has five years of manufacturing data and can be quite slow.  However, perhaps 95% of the time, the users are only interested in the current year.  So we use Publisher to build a YTD version with only the current year in it, and THAT version seems to have acceptable performance.

                                 

                                In a related case, I was building a chart summarizing all the data in that application, and despite trying multiple approaches and spending days on tuning, it still took over a minute to render.  I ended up building a separate application with a data model designed explicitly to support that one chart, and it renders just fine.  It's not ideal to split things up like this, but sometimes it may be the only practical option.

                      • Re: Is there a faster way to calculate a dimension?
                        Rob Wunderlich

                        If possible, some pre-aggregation in the script would be helpful.  Is the reporting dimension always a single item#?  And can there be additional filters applied to a basket/item combo?

                         

                        What I'm getting at is doing a script calc of the unit count by BasketNumber, Item#.  Then you could use the UnitCount as a dimension field and avoid the aggr.  Possible?
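
                        Something like this in the script (just a sketch - substitute your real table and field names; I've used SALES_UNITS and MainFact from earlier in the thread):

                        BasketItemUnits:
                        LOAD
                             BasketNumber,
                             ItemNumber,
                             sum(SALES_UNITS)     AS     UnitCount
                        RESIDENT MainFact
                        GROUP BY BasketNumber, ItemNumber;

                        Then UnitCount becomes an ordinary chart dimension and your basket-count expression runs against it, with no Aggr() at all.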

                         

                        -Rob