Calculating a set aggregation for each row of data - Qlik Community

Report Inappropriate Content · ‎2010-07-08

Hi all,

I've just downloaded the trial version of Qlikview in order to see if it fits our needs. What I need to do is to calculate aggregative value, for each row of data. Here's an example that could clearify things:

I have sales data for every product in every day. I want to calculate the standard score (the difference from the mean, in standard deviations) of each day's sales. This can be used to detect days in which sales were exceptionally high or low (where the standard score > 3, for example) for each of the products. So the table I need has the following fields: product, day, sales, standard score.

Oracle supports this kind of calculation with the sql expression: (sales - avg (sales) over (parition by product)) / stdev(sales) over (parition by product)

Is it possible to do that in QlikView, and if so, what's the simplest way?

Thanks a lot,

Amos.

Report Inappropriate Content · ‎2010-07-08

This certainly can be done in QlikView. It requires the use of set analysis to calculate the mean (denominator) independent of selections:

Dimensions:

Product, Sales Date

Expression:

Sum(Sales) / Sum( {1} Sales)

By using the {1} you are calculating the sum independent of the product selected.

/emb

Report Inappropriate Content · ‎2010-07-08

Thanks for the quick response, bglbi.

I tried that in a straight table chart, and what I got is a straight 1 for every date and product. It didn't matter whether I selected something or not.

This expression does not include any reference to the fact that I want the sum to be over all of the days for each product, and not over the products for each day (althoug that makes sense too). From what I understand of the documentation, sum({1} sales) will be the same for every day and product in the dataset, while I need the sum of sales over all the days, of each product, on each row.

I hope the next table makes what I need clear:

Day Product TotalProductSales Day'sSales Day'sRelativeSales

1/4/2010 ProductA 1000 300 30%

2/4/2010 ProductA 1000 100 10%

3/4/2010 ProductA 1000 600 60%

1/4/2010 ProductB 200 90 45%

2/4/2010 ProductB 200 90 45%

3/4/2010 ProductB 200 10 5%

4/4/2010 ProductB 200 10 5%

Any ideas?

Amos.

johnw · ‎2010-07-08

I believe you want this?

TotalProductSales = sum(total <Product> Sales)
Day'sSales = sum(Sales)
Day'sRelativeSales = "Day'sSales"/"TotalProductSales"

Report Inappropriate Content · ‎2010-07-11

Thank you very much, John. That seemed to be just what I needed.

Now, just to understand more, I have a few questions:

From what I understand, the "total" keyword indicates that the sum is taken over a different context than the current selection. But what's the <product> indicating? Does it mean that the context is all records with the same product as the current record? Did I get that right?
I read something about the aggr function. Is sum(total <Product> Sales) the same as aggr(sum(sales), product)? If it is, than what's the difference between the two?
When I played with the data, I noticed that if I set Day'sSales to be = Sales, there wasn't any data displayed for this column. Why is that? If I only have one record for each day and product, why sould I sum it?
Can I display only rows that satisfy a given critrion? For example, I want to display only the exceptional days for each product (more than 20% of the total sales).

And once again, thanks for all the help. I think I'm gradually getting the intuition behind qlikview .

Amos.

johnw · ‎2010-07-12

1. The "total" keyword indicates that the sum is to be taken across the entire table, not just a row on the table. So it changes the context from the row to the entire table. The <product> part says that no, we don't actually want the ENTIRE table, just all of the rows on the table with the same product as the current row.

2. You know, I hadn't really thought about it before, but they ARE very similar. Aggr() does seem to take the whole table as the context, so it's kind of like using the "total" keyword by default. And specifying aggr(...,product) will also make it so that you're only looking at the matching product. It's a little funny, though, in that the aggr() will only assign the resulting sum to ONE of your rows. Still, you can use the "nodistinct" keyword to make it repeat the values. Then the only difference would seem to be in totals, where the aggr() ends up returning multiple values, and so evaluates to null.

3. I'm not sure what you're saying.

4. You could probably handle it with a calculated dimension, maybe like this:

aggr(if(sum(Sales)/sum(total <Product> Sales)>0.2,Product),Product)

Report Inappropriate Content · ‎2010-07-12

John,

Wouldn't you expect some performance issues from using a calculated dimension like the one you supplied?

johnw · ‎2010-07-12

Yes, you could easily have performance issues with a calculated dimension like that, at least if you have much data. The trouble is - what's our alternative? The only dynamic alternative I'm thinking of is applying a very similar condition to EVERY expression. It might perform better given how bad calculated dimensions can be, but it'd be a lot more trouble, and I wouldn't expect it to perform well either, just better.

Now, if we don't need our definition of "top 20%" to be sensitive to selections, if it could be static, then we could do something in the script, and then our performance would be just fine. So I suppose it depends on our definition of the top 20%. Do we ALWAYS mean for our ENTIRE data set? Or should it be sensitive to selections?

Or maybe someone knows a dynamic solution better than a calculated dimension or expression?

Report Inappropriate Content · ‎2010-07-14

Hi John,

3. My mistake, sorry 🙂

4. I'm sorry, but I didn't quite get that. Aren't the dimensions calculated before the expressions? If so, how could I create a calculated dimension based on an expression (this record's relative sales)?

Thanks,

Amos.

johnw · ‎2010-07-14

The aggr() basically makes an internal chart by Product, from which it extracts the "over 20%" products as your dimensions for the real chart, all before calculating the expressions in the real chart. Assuming I don't have a syntax error or some other problem.