10 Replies Latest reply: Jul 20, 2010 7:46 AM by amosmoss RSS

    Calculating a set aggregation for each row of data

    amosmoss

      Hi all,

      I've just downloaded the trial version of Qlikview in order to see if it fits our needs. What I need to do is to calculate aggregative value, for each row of data. Here's an example that could clearify things:

      I have sales data for every product in every day. I want to calculate the standard score (the difference from the mean, in standard deviations) of each day's sales. This can be used to detect days in which sales were exceptionally high or low (where the standard score > 3, for example) for each of the products. So the table I need has the following fields: product, day, sales, standard score.

      Oracle supports this kind of calculation with the sql expression: (sales - avg (sales) over (parition by product)) / stdev(sales) over (parition by product)

      Is it possible to do that in QlikView, and if so, what's the simplest way?

      Thanks a lot,

      Amos.

        • Calculating a set aggregation for each row of data
          bglbi

          This certainly can be done in QlikView. It requires the use of set analysis to calculate the mean (denominator) independent of selections:

          Dimensions:

          Product, Sales Date

          Expression:

          Sum(Sales) / Sum( {1} Sales)

          By using the {1} you are calculating the sum independent of the product selected.

           

          /emb

            • Calculating a set aggregation for each row of data
              amosmoss

              Thanks for the quick response, bglbi.

              I tried that in a straight table chart, and what I got is a straight 1 for every date and product. It didn't matter whether I selected something or not.

              This expression does not include any reference to the fact that I want the sum to be over all of the days for each product, and not over the products for each day (althoug that makes sense too). From what I understand of the documentation, sum({1} sales) will be the same for every day and product in the dataset, while I need the sum of sales over all the days, of each product, on each row.

              I hope the next table makes what I need clear:

              Day Product TotalProductSales Day'sSales Day'sRelativeSales

              1/4/2010 ProductA 1000 300 30%

              2/4/2010 ProductA 1000 100 10%

              3/4/2010 ProductA 1000 600 60%

              1/4/2010 ProductB 200 90 45%

              2/4/2010 ProductB 200 90 45%

              3/4/2010 ProductB 200 10 5%

              4/4/2010 ProductB 200 10 5%

              Any ideas?

              Amos.

                • Calculating a set aggregation for each row of data
                  John Witherspoon

                  I believe you want this?

                  TotalProductSales = sum(total <Product> Sales)
                  Day'sSales = sum(Sales)
                  Day'sRelativeSales = "Day'sSales"/"TotalProductSales"

                    • Calculating a set aggregation for each row of data
                      amosmoss

                      Thank you very much, John. That seemed to be just what I needed.

                      Now, just to understand more, I have a few questions:

                      1. From what I understand, the "total" keyword indicates that the sum is taken over a different context than the current selection. But what's the <product> indicating? Does it mean that the context is all records with the same product as the current record? Did I get that right?
                      2. I read something about the aggr function. Is sum(total <Product> Sales) the same as aggr(sum(sales), product)? If it is, than what's the difference between the two?
                      3. When I played with the data, I noticed that if I set Day'sSales to be = Sales, there wasn't any data displayed for this column. Why is that? If I only have one record for each day and product, why sould I sum it?
                      4. Can I display only rows that satisfy a given critrion? For example, I want to display only the exceptional days for each product (more than 20% of the total sales).

                      And once again, thanks for all the help. I think I'm gradually getting the intuition behind qlikview Smile.

                      Amos.

                        • Calculating a set aggregation for each row of data
                          John Witherspoon

                          1. The "total" keyword indicates that the sum is to be taken across the entire table, not just a row on the table. So it changes the context from the row to the entire table. The <product> part says that no, we don't actually want the ENTIRE table, just all of the rows on the table with the same product as the current row.

                          2. You know, I hadn't really thought about it before, but they ARE very similar. Aggr() does seem to take the whole table as the context, so it's kind of like using the "total" keyword by default. And specifying aggr(...,product) will also make it so that you're only looking at the matching product. It's a little funny, though, in that the aggr() will only assign the resulting sum to ONE of your rows. Still, you can use the "nodistinct" keyword to make it repeat the values. Then the only difference would seem to be in totals, where the aggr() ends up returning multiple values, and so evaluates to null.

                          3. I'm not sure what you're saying.

                          4. You could probably handle it with a calculated dimension, maybe like this:

                          aggr(if(sum(Sales)/sum(total <Product> Sales)>0.2,Product),Product)