Help with a Ranking Formula - Expert Advice Needed - Qlik Community

Report Inappropriate Content · ‎2010-06-10

Hi Folks

I have the following formula that provides me with a Top X, Bottom X and Others split in a pivot table, the formula is a ranking formula using an aggregation over two cyclic groups.

In Version 8.5, whilst the calculation is not so quick, it does work perfectly.

In Version 9.0 SR4, this formula sends the CPU on the server to 100%, it dramatically increases the RAM usage (this is when only 1 user is logged on and using QlikView), and it can take 1 or 2 minutes to calculate and produce a result. On my local machine the chart (pivot table) fails to calculate at all.

I'm wondering if there might be a different way to achieve the same thing as this problem is going to stop Version 9 being released to production:

Here's the wonderful formula for your enjoyment:

=if(
 aggr(
 rank(
 Sum ( _rFMV )
 -
 Sum ( { $ < DT_ReportDate={"$(vCompDate)"},DT_ComparisonDate={"*"} > } _rFMV )
 ,2
 )
 ,XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2) <= YYY_LPCharts_TopBottomN

 ,'A Top Increase ' &
 Num (
 Aggr (
 Rank (
 Sum ( _rFMV )
 -
 Sum ( { $ < DT_ReportDate={"$(vCompDate)"},DT_ComparisonDate={"*"} > } _rFMV ) ,2)
 ,XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2
 )
 ,000,
 ),


 if(
 aggr(
 rank(
 -Sum ( _rFMV )
 +
 Sum ( { $ < DT_ReportDate={"$(vCompDate)"},DT_ComparisonDate={"*"} > } _rFMV )
 ,2
 )
 ,XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2) <= YYY_LPCharts_TopBottomN

 ,'C Top Decrease ' &
 Num(
 YYY_LPCharts_TopBottomN
 + 1
 - Aggr (
 Rank (
 - Sum ( _rFMV )
 +
 Sum ( { $ < DT_ReportDate={"$(vCompDate)"},DT_ComparisonDate={"*"} > } _rFMV )
 ,2)
 ,XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2)
 ,000),

 'B Other'
))

Thanks for any help,

Oleg_Troyansky · ‎2010-06-10

Nigel,

the formula is wonderful, indeed, and I can see how it might be throwing QlikView for a loop... I'm even questioning why it works...

Your formula begins with IF( AGGR(... )) . My understanding of how AGGR works is that it creates a virtual array of aggregated values associated with the aggregation dimensions. Being an Array, the result of AGGR can be used as a Calculated Dimension, or it can be used inside another aggregation functions (sum, min, max, etc...) in order to produce a single result.

In your expression, your are testing the value of an array (AGGR), and I can't understand how it should work...

Anyway, my misunderstanding aside... What you can try to do is to avoid calculating the same results multiple times by extracting pieces of your expression into another hidden expression. This way, you can at least apply rank() and aggr() once and then reuse the result multiple times.

You are right, however, - there must be a better way altogether. I just cant think of one now...

cheers,

Ask me about Qlik Sense Expert Class!

johnw · ‎2010-06-10

Oleg, I'm guessing the if() test of the aggr() works because he's using XXX_CyclicGroup_LPCharts1 and XXX_CyclicGroup_LPCharts2 not just as his aggr() dimensions, but also as his chart dimensions. So I suspect he's building an internal table to extract the rank, and then the chart dimensions pick the correct rank from this internal table. But if that's the case, I wouldn't think we'd need the aggr at all, so that would be one simplification.

Nigel, despite the length of the thing, what you're attempting seems fairly straightforward:

if(your increase is ranked high, 'top increase ' & rank
,if(your decrease is ranked high, 'top decrease ' & rank
, 'other'))

I can see how this could be slow, but not how it could be massively slower in version 9 without version 9 having some bugs. I do think it could be simplified, though. One thing to recognize is that a set analysis set is evaluated once for your entire chart, ignoring the dimension values. Therefore, this part of your expression:

Sum ( { $ < DT_ReportDate={"$(vCompDate)"},DT_ComparisonDate={"*"} > } _rFMV )

I THINK will always return the same value, regardless of which dimensions you've chosen and what their value is for the current row of the chart or for the current "row" of the aggr(). Is that true?

If so, then I don't think it can have any effect on your ranking. All you do is subtract or add this. Subtracting or adding a "constant" (constant across your chart and aggr() dimensions, anyway) will not affect the ranking.

If I'm right about all that, then all you really need to rank is sum(_rFMV) and -sum(_rFMV), NOT the increase or decrease, because the increase and decrease are measured from a fixed value across your dimensions. With any luck, the performance problem is being caused by some interaction with the set analysis, and perhaps this would fix it.

Also, are you sure you want to reverse the bottom rankings like that? Wouldn't you want to call the worst decrease 'C Top Decrease 1' instead of 'C Top Decrease 10'? Maybe I'm misunderstanding what that part is doing, though.

I also agree with Oleg that you should extract the repetition to hidden dimensions.

Put it all together, and I'd try something like this:

Increase Rank = aggr(rank( sum(_rFMV),2),XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2) // hidden
Decrease Rank = aggr(rank(-sum(_rFMV),2),XXX_CyclicGroup_LPCharts1,XXX_CyclicGroup_LPCharts2) // hidden
Final Rank = if("Increase Rank" <= YYY_LPCharts_TopBottomN
,'A Top Increase ' & num("Increase Rank",'000')
,if("Decrease Rank" <= YYY_LPCharts_TopBottomN
,'C Top Decrease ' & num("Decrease Rank",'000')
,'B Other'))

And maybe even this:

Increase Rank = rank( sum(_rFMV),2) // hidden
Decrease Rank = rank(-sum(_rFMV),2) // hidden
Final Rank = if("Increase Rank" <= YYY_LPCharts_TopBottomN
,'A Top Increase ' & num("Increase Rank",'000')
,if("Decrease Rank" <= YYY_LPCharts_TopBottomN
,'C Top Decrease ' & num("Decrease Rank",'000')
,'B Other'))

I could be wrong in some of my assumptions, though. And even if I got them right, it doesn't mean this will help.

EDIT: Bah, I'm being stupid about the set analysis part. Yes, it will only generate one set. But you're still only going to get the sum for the current dimension values, because it's going to split the set up by your dimensions. In the set analysis itself, I assume vCompDate is just a single date, in which case you could use a literal instead of a search expression. You can also simplify the syntax for "ignore this field". So while I can't see why it would execute much more quickly, maybe this?

Increase Rank = rank( sum(_rFMV)-sum({<DT_ReportDate={'$(vCompDate)'},DT_ComparisonDate=>} _rFMV),2) // hidden
Decrease Rank = rank(-sum(_rFMV)+sum({<DT_ReportDate={'$(vCompDate)'},DT_ComparisonDate=>} _rFMV),2) // hidden
Final Rank = if("Increase Rank" <= YYY_LPCharts_TopBottomN
,'A Top Increase ' & num("Increase Rank",'000')
,if("Decrease Rank" <= YYY_LPCharts_TopBottomN
,'C Top Decrease ' & num("Decrease Rank",'000')
,'B Other'))

Report Inappropriate Content · ‎2010-06-11

Hi Oleg & John

Not wishing to blow too much smoke up your backsides, when I wrote this one you are two of the three that I was hoping would jump on it, so thanks for that, appreciate the input.

I have now put together a very simple model that contains just 7 rows of information, it uses two cyclic groups as per my live example and I've implemented John's suggestion at the bottom of his response. In a model with only 7 rows of data I am getting an "allocated memory exceeded" error which is a little worrying, so I must be barking up the wrong tree here, or I'm just barking mad!

What I need is indeed pretty simple, I need to show the top 5, the bottom 5, and the others.

They need to be in a pivot table because the end-users would wish to be able to expand/collapse those groups.

The set analysis part (I've removed that from my simpkified model) is required because the actual requirement is to rank on movement of values, so the formula of Sum(_rFMV) results in the calculation of FMV (Fair Market Value) for the currently selected period, then the set analysis part calculates the sum of FMV for the Comparison Date which will be an earlier quarter end. So, first minus second gives me the movement of FMV between the two dates.

The attached shows several problems:

1. The individual formulas for Rank show an error and I can't figure out why (and this is why historically I think I introcued the AGGR)

2. I can't figure out how to hide a dimension in a pivot table (and it must be a pivot table)

3. The table will not calculate at all in V9 (allocated memory exceeded)

4. The table does calculate in V8.5 (but with formula errors)

If either/both of you can be of any more help I would appreciate it.

Cheers,

Nigel.

Report Inappropriate Content · ‎2010-06-11

Hi Again

This new example re-introduces the AGGR function into the Rank formulas, and the formula now calculates, but doesn't produce the correct results (all 1), also, the calculated dimension that is trying to assess the value of "Top" and/or "Bottom" is not working at all.

?????

Report Inappropriate Content · ‎2010-06-11

An update for you both.......

I sent the specific object in a document to QlikTech and they have come back and confirmed that they tested it in V8.5 and had no problems, they tested it in V9 and it crashed their machine!

So, it would certainly appear that there are some major differences between 8.5 and 9.0 in this area, I'll wait for some feedback from them but I'm still trying to come up with a better solution to the problem (and that includes going back to the business to tell them they can't have the pivot table).

Cheers & have a good weekend,