Set Analysis with Aggr() function

johnhorneramllp · ‎2009-10-27

Chaps, courtesy of Johannes last night i have the following function that enables me to show profitable customers within my data model. Unfortunately however i have quite a large dataset and the expression is proving too slow. I feel that i should be able to use Set Analysis but still struggle with the syntax so am currently failing in my attempts...any thoughts?

Original Expression - great but too slow

Count(DISTINCT aggr(If(Sum(Margin) > 0,Customer), Date, Customer))

Set Analysis Effort - doesn't work as required

Count

({$<"Aggr(Sum(Margin), Date, Customer)" = {">0"}>} DISTINCT Customer)

Regards

John H

johnw · ‎2009-10-27

So you want a count of the customers that had a positive margin on at least one date? I believe set analysis can only be used on actual fields, not on expressions. However, since the double quotes in the set indicate a search, you can put expressions in the search. So this appears to do the trick:

count({<Customer={"=max(aggr(sum(Margin),Date,Customer))>0"}>} distinct Customer)

johnhorneramllp · ‎2009-10-27

John, thanks for your response...you may have confirmed what i was beginning to think, that set analysis only can work on fields and not expressions, especially aggregated expressions. However, although your test does work in your document, if you add the date as a dimension to your shart / straight table it fails as it is not showing the correct number of profitable customers per day.

Regards

John H

johnw · ‎2009-10-27

So far as I know, set analysis expressions always occur outside of the dimensions of a table. So no, you can't add any dimensions to the table and still have the right results if the expression needs to depend on the dimensions. For that, you would need an IF statement, and you're back to your original performance problem.

johnhorneramllp · ‎2009-10-27

ok thanks John, looks like i am getting a similar answer from one of the Qliktech consultants also. It looks like the answer is for me to create an aggregated table for specific chart that i wish to produce and disallow other selections at different levels of granularity.

John H

johnw · ‎2009-10-27

Ugh. I'm just not a fan of aggregated tables, even if it might be necessary. I think one of the strengths of QlikView is that you put in all of your data at the lowest level of detail, and let it roll everything up for you. I hate messing with that basic idea. Maybe I'm afraid of aggregated tables for no reason, but I've avoided them in all of my production applications so far.

Assuming you're actually looking for the count of customers with positive margin by date, here's an expression to do that which avoids both the IF and the DISTINCT, so it might be faster. Maybe if we're lucky it would be fast enough.

-sum(aggr(sum(Margin)>0,Customer,Date))

johnhorneramllp · ‎2009-10-27

John i totally agree i am desperately trying to avoid using aggregates as they really have no place in QlikView and as soon as you have one you seem to need many to cope with all the possible permutations you may need...hence my preference to get some kind of set analysis working. Your example intrigues me as it certainly appears to work but i have no idea how...how does it know to count the customer. I assume that there is some magic (unknown to me) relating to the preceding - sign. It is still a little slow but is faster than previous tests i think (although i have not been scientific about this yet.

Regards

John H

johnw · ‎2009-10-27

Yeah, it's an evil little hack that works by magic. It's taking advantage of the way that true and false are stored, which is -1 for true and 0 for false.

I've added a table to better show what's happening. In the table, I have Date as the first dimension, Customer as the second, and sum(Margin)>0 as the expression. So for each Date and Customer combination, it is checking if the total Margin is positive. If so, it returns true (-1), else it returns false (0). So that's basically what the aggr() is doing, producing that table.

Then, since the actual table only has Date as a dimension, and does a sum() of the results of the aggregation, it will sum up all of the trues (-1s) for each customer for that date. Since the trues represent positive margins, it is thus essentially doing a count of customers with positive margins on that day without actually using count(). The only problem is that the "count" is negative, so we have to fix it by putting a minus sign in front of the whole thing.

I wouldn't typically write an expression this way, because it IS pretty opaque. I'd definitely put comments on it to explain what it is doing if I put something like this in a live application.

And again, I don't know that it will be faster, and I'm kind of doubting that it will be fast enough either way. It's just something I'd try before resorting to aggregation tables.

Related Topics