11 Replies Latest reply: Nov 11, 2010 5:04 AM by Hamish Donald RSS

    How can I calculate weighted / corrected averages with set analysis / aggr function?

    michel.barca

      Hi, building on the below post I have tried to create a simplified model in Qlikview to calculate weighted / corrected averages:

      http://community.qlik.com/forums/t/35791.aspx

      PROBLEM:

      How can I achieve graph/table CORRECTED AVG FOR SELECTION (APO=A) in Qlikview as calculated in Excel?

      CONTEXT:

      The goal I'm after is outlined in the attached Excel. Essentially, instead of calculating "naive" averages such as in table/graph "NORMAL AVG", I want to correct this "naive" average by correcting the average for the customer population of the specific store we are evaluating as calculated in "CORRECTED AVG FOR SELECTION (APO=A)".

      The input table contains the fields:

      - apo = store

      - rel_vrd = customer

      - gebj_bucket = age group of customer

      - ddd_bucket = turnover group of customer

      - atc5 = product code (not relevant for now)

      The count that I'm doing is to calculate the number of customer contact interactions that are taking place in the different categories.

      How can I achieve graph/table CORRECTED AVG FOR SELECTION (APO=A) in Qlikview as calculated in the Excel?

      The primary problem that I have identified is that the vector VECTOR PER GEBJ_BUCKET still contains the dimension bbb_bucket (i.e. it is still functioning as a matrix instead of a vector).

      Any help is greatly appreciated!

      I couldn't attach the files due to a SERVER error, so I put them on box.net:

      http://www.box.net/shared/5igr68p1l0

      http://www.box.net/shared/55mlyrnhs9

        • How can I calculate weighted / corrected averages with set analysis / aggr function?
          Hamish Donald

          Hi,

          I expect you've found an answer already, but if not you could try the following;

          Load in the vector table VECTOR PER GEBJ_BUCKET in the script e.g.

           

          WEIGHT:

          LOAD * INLINE [

          gebj_bucket, vector

          j, 3

          m, 1

          o, 0

          ];



          Then for the %Reg expression in your CORRECTED AVG FOR SELECTION (APO = A) table use

          =sum( aggr(count(ddd_bucket)*sum(vector),ddd_bucket,gebj_bucket))/sum(TOTAL aggr(count(ddd_bucket)*sum(vector),ddd_bucket,gebj_bucket) )

          That seemed to get the desired 29%, 50%, 14%, 7% result.

          I wasn't able to download the qvw, only the excel, so I can't be sure you didn't already try something similar,

          Regards,

          HD

            • How can I calculate weighted / corrected averages with set analysis / aggr function?
              michel.barca

              HD,

               

              Thanks for your response! Unfortunately, the vector which you propose to load in the load script is a dynamic vector, which changes with the selection. So I have to find a way to dynamically calculate it. Any ideas would be great!

              As to the qvw: does http://www.box.net/shared/5igr68p1l0 not work for you? I can send it by mail to you if it's useful. You can reach me at michel.postvak{at}gmail{dot}com.

               

              Cheers,

              Michel

               

                • How can I calculate weighted / corrected averages with set analysis / aggr function?
                  Hamish Donald

                  Michel,

                  qvw access through other routes is a problem - can you list out some sample rules for deriving the 'Vector' field dynamically ?

                  Regards,

                  HD

                    • How can I calculate weighted / corrected averages with set analysis / aggr function?
                      michel.barca

                      HD,

                      Sure, I realize now I hardcoded them in the Excel. Sorry for the confusion.

                      The rule for the calculation of the Vector should be:

                      WORDS: apo = A -> Count per gebj_bucket of how many rel_vrd fall in that gebj_bucket for apo = A

                      ALGORITHM:apo = A -> j=SUM(IF(apo = A, IF(ddd_bucket = "j",1,0), 0) , m=SUM(IF(apo = A, IF(ddd_bucket = "m",1,0), 0) , o=SUM(IF(apo = A, IF(ddd_bucket = "o",1,0), 0)

                       

                      EXAMPLE outcomes:

                      apo = A -> j=3, m=1 , o=0 (for all ddd_bucket)

                      apo = B -> j=0, m=3 , o=0 (for all ddd_bucket)

                      apo = C -> j=0, m=1 , o=2 (for all ddd_bucket)

                       

                      Is that clear? If it isn't I can send you a new excel with the formula.

                       

                      Thanks for your help!

                      Michel

                       

                        • How can I calculate weighted / corrected averages with set analysis / aggr function?
                          Hamish Donald

                          Michel,

                          Try replacing the 'sum(Vector)' bit of the expression above with the following;

                           

                          count({$<apo={"A"}>} TOTAL <gebj_bucket> rel_vrd)

                          That will always pick up apo ' A'.

                          For changing selections you'd have to (one possibilty ) load the apo field out into a 'Master_apo' field and set as follows

                          count({$<apo={'$(=Master_apo)'}>} TOTAL <gebj_bucket> rel_vrd)

                          If a single selection is made that expression should work,

                          Regards,

                          HD



                            • How can I calculate weighted / corrected averages with set analysis / aggr function?
                              michel.barca

                              HD,

                              Thanks for your tip. Your expression does something similar to what I had with count( {<ddd_bucket=>} rel_vrd). When disregarding ddd_bucket, this gives the correct result for the vector. However, when adding ddd_bucket as a dimension you will see that your (or my) expression does not give the desired matrix with 3,1,0 on every row, but instead:

                              j m o

                              i) 3 0 0

                              ii) 3 0 0

                              iii) 0 1 0

                              iiii) 0 0 0

                              Any idea on how to fix this?

                              Cheers,

                              Michel

                                • How can I calculate weighted / corrected averages with set analysis / aggr function?
                                  Hamish Donald

                                  Hi Michel,

                                  Sorry, I can't see a fix for this as I'm not sure I get the requirement - it looks like the vector calculation is working just as you specified since for Apo A some 'ddd_bucket's are not used - e.g. bucket iii only appears for 'm' and iiii not at all.

                                  By adding the ddd_bucket as a dimension it will drill down to show which bucket made up the total for j, m or o.

                                  You could try changing the show all values/populate missing cells on the pivot table or even adding a disconnected field for ddd_buckets but if the rows are always all the same I'm not sure what value that is ?.....why not just leave off the dimension and title the table 'Vector for all ddd_buckets' ?,

                                  Regards,

                                  Hamish

                                   

                                    • How can I calculate weighted / corrected averages with set analysis / aggr function?
                                      michel.barca

                                      Hamish,

                                      Sorry it took me a while to answer, but I have just programmed what I wanted in SQL and it works great. Now if I can only get the performance of qlikview that would be amazing.

                                      As to your suggestion, I tried the option you mention (show all values - not sure what you mean by populate missing). The reason why the rows need to be the same is because otherwise I cannot calculate the product of that matrix (actually being a vector repeated on each row) with another matrix. Qlikview always remembers what the original buckets are (and I want qlikview to forget, otherwise I keep loosing values in my matrix ).

                                      The core problem is I want to multiply the sum of a set (i.e. the matrix with all rows being equal) with parts of that same set (the matrix with different values). Even if I use set operators, I cannot get Qlikview to forget the original dimensions. There must be a way to strip out the dimension (or to fill the same result for each value of the dimension). I can't seem to find it though.

                                      Any ideas?

                                       

                                       

                                       

                        • How can I calculate weighted / corrected averages with set analysis / aggr function?
                          Hamish Donald

                          Hi,

                          I'm out of ideas on the set analysis / aggregation side but I think the problem for Qlikview may be that there is not a complete matrix to work with.

                          When you add 'ddd_bucket' and 'gebj_bucket' to a pivot chart there are some combinations not present in the 'in memory' table you've created from your input file.

                          Perhaps if you were to add in rows for the missing combinations - with no value in rel_vrd - you'd get the result requested ,

                          e.g. if you've already loaded your input file called, say, 'INPUT'

                          then repeat a number of join loads to combine each element used with every other element to create a complete matrix.

                           

                          INPUT_PLUS:

                          LOAD DISTINCT ddd_bucket as Xddd_bucket

                          RESIDENT INPUT;

                           

                          OUTER JOIN

                          LOAD DISTINCT gebj_bucket as Xgebj_bucket

                          RESIDENT INPUT;

                           

                          OUTER JOIN

                          LOAD DISTINCT apo as Xapo

                          RESIDENT INPUT;

                           

                          OUTER JOIN

                          LOAD atc5 as Xatc5

                          RESIDENT INPUT;

                           

                          Then use the following to fill in the rel_vrd values -

                           

                          LEFT JOIN

                          LOAD apo AS Xapo,

                          ddd_bucket AS Xddd_bucket,

                          gebj_bucket AS Xgebj_bucket,

                          rel_vrd AS Xrel_vrd,

                          atc5 AS Xatc5

                          RESIDENT INPUT;

                           

                          Now try working with INPUT_PLUS as the in memory table for your pivot, using the code from the previous posts to ignore ddd_bucket and 'TOTAL' by gebj_bucket, you should at least now be working with a complete matrix of dimension combinations (only some of which have rel_vrd values)

                          Regards,

                          HD