Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

need help on a nested aggregation

hi all,

I need some help for an expression involving nested aggregation.

I have some sale data with sales for different countries. Fields are COUNTRY and SALE

I am building a straight table graph  with average sales per country. No Pb. Dimension is COUNTRY and expressions are :

* AVG_SALE = avg(sale)

* STDEV_SALE = stdev(sale)

output then looks like

country    avg_sale       stdev_sale

FR         12             3

UK         15             4

...

Now I need to add a column CAP_AVG_SALE that would display the average of sales capped to (avg_sale+stdev_sale) per country, i.e if sale value exceeds the average+standard deviation for the country, we retain (avg_sale+stdev_sale).

The output should then look like

country    avg_sale       stdev_sale   cap_avg_sale

FR         12             3            11.7

UK         15             4            14.6

...

My first try was to use an expression like

avg( RangeMin (SALE, avg(TOTAL SALE) + Stdev(TOTAL SALE)))

The problem with this is that the TOTAL is applied on the COUNTRY dimension so the average avg(TOTAL SALE) is the global average, but what I need in the nested avg() is the average per country

So I would need something more like

avg( RangeMin (SALE, avg({<COUNTRY={current country in the table row}>} TOTAL SALE) + Stdev({<COUNTRY={current country in the table row}>} TOTAL SALE)))

right now I cannot fin a formula to get the country from the current table row so I'm a bit stuck

so my question is :

1) Is there maybe a staistical function that I could use and does exactly what I need so I don't need to do these nested aggregations ?

2) if not, which expression could you advise ?

3) In my actual graph, COUNTRY is actually a dimension group. Hopefully the solution should also work with that

thanks a lot

Philippe

8 Replies
Gysbert_Wassenaar

For nested aggregations you need to use the aggr function:

avg(aggr( RangeMin (SALE, avg(SALE) + Stdev(SALE))),country))


But since country is a group and not a field you need to use the getcurrentfield function:


avg(aggr( RangeMin (SALE, avg(SALE) + Stdev(SALE))), $(=getcurrentfield[country])) ))


talk is cheap, supply exceeds demand
Not applicable
Author

Hi Gysbert,

thanks for the prompt answer, unfortunately it doesn't behave as expected : the expression output is the addition of avg_sale+stdev_sale. I think it's because the SALE that is compared to avg(SALE) in the RangeMin() is inside the aggr() ?

So it seems we would like to use the aggr() only for the inner avg() and stdev() like in

avg(RangeMin (sale, aggr(avg(sale) + Stdev(sale), country)))

which doesn't work because I think the aggr is an internal table of several lines when we need one line for the current country, hence my first idea of a set analysis ....

I enclose a basic test projet

best regards

Phlippe

Gysbert_Wassenaar

Just to check, can you take a look at the second (lower) straight table in the attached qvw. Are the cap_stdev_sale values in that chart correct?


talk is cheap, supply exceeds demand
Not applicable
Author

here is the file with the correct values. I have modified slightly the data samples as there were duplicates, and have modified your join which was a full cartesian

thanks

Gysbert_Wassenaar

Ok, in case you want to do this without precalculating averages and stdevs in the script you can try this expression: avg(aggr(rangemin(only(sale),avg(total <country> sale) + stdev(total <country> sale)), country,sale))


talk is cheap, supply exceeds demand
Not applicable
Author

that's correct !!

if you have time, could you give a quick explanation for the formula, it's still not cristal clear to me

also, how to use the dimension group instead of <country>

thanks !

Gysbert_Wassenaar

The rangemin function should compare the sales values at the lowest aggregation level, i.e. country,sale. That's why the the outer avg is aggregating over the dimensions sale and country: avg(aggr(something),country,sale)). The rangemin should compare the lowest level sale with the avg and stdev of sale at the country level. That's why those get the total <country> modifier.


talk is cheap, supply exceeds demand
Not applicable
Author

actually after adding a new dimension "product" and doing more tests, I had some discrepencies and I finally found out the the aggr () had to be nested inside the rangemin() :

avg(rangemin(sale, aggr(avg(total <country> sale)

   + stdev(total <country> sale), country, sale)))

since my dimension is in a group, the final formula is :

avg(rangemin(sale, aggr(avg(total <$(=getcurrentfield("group1"))> sale)

   + stdev(total <$(=getcurrentfield("group1"))> sale), $(=getcurrentfield("group1")), sale)))