Skip to main content
hic
Former Employee
Former Employee

The Aggr() functions is one of the most advanced functions in the Qlik engine, and it is not always easy to use. This blog post is about its most common pitfalls.

This is a function that enables the app developer to create nested aggregations, i.e. aggregations in two steps. Essentially the Aggr() is a For-Next loop where each loop contains a measure calculation. When all loops have been performed, a final aggregation based on the array of calculated measures can be made.

If we, for instance, want to calculate the monthly product sales, we could do it by using

Aggr( Sum(Sales), ProductName, OrderMonth )

This will produce a virtual table that looks like

Image1.png

Note that there is exactly one row per distinct combination of the two dimensions.

So, the Aggr() function creates a virtual table; an array of values that need to be aggregated in a second step. Hence you should wrap the Aggr() in an outer aggregation function. For instance, if you want to use the above table to calculate the average monthly sales, you should use

Avg( Aggr( Sum(Sales), ProductName, OrderMonth ))

 

Image2.png

 

With this mental picture of how the function works, we can start looking at the pitfalls.

 

1. Missing inner aggregation function

The first parameter of the Aggr() is a measure, and as such it is always an aggregation. Hence, you should use an aggregation function. If you don’t, the calculation will use Only() and the virtual table may inadvertently get NULLs in the Measure column. See more on Use Aggregation Functions!

 

2. Missing outer aggregation function

The outer aggregation function is needed whenever Aggr() returns more than one value. If no aggregation function is specified, Only() will be used, which may cause your chart measure to contain NULLs.

Note the totals line in the chart above: It shows the average over both products and months. Generally, totals will always be NULL if the outer aggregation is omitted.

 

3. Missing Set Analysis expression

So, there are both inner and outer aggregation functions. Where do I put my Set Analysis expression?

The answer is often “In both”. It may not be enough to have it in only one of the levels.

The current selection (or the relevant selection state) will always affect the aggregation scope of an aggregation function, unless you specify otherwise using Set Analysis. So, if you need Set Analysis you should in most cases have similar or identical Set Analysis expressions in both the inner and outer aggregation function.

 

4. Grain mismatch

When an Aggr() is used in a chart, you have inner dimensions in the Aggr() and outer dimensions in the chart. Then it is important that these match.

The grain of the Aggr() dimensions must be identical or finer than that of the chart dimensions.

For example, an Aggr() with Month as dimension can be put in a chart that has Year as dimension. In the following table, the average monthly sales numbers are correctly calculated, and displayed as an average per year and product.

 

Aggr correct grain.png

 

But the opposite isn’t necessarily possible. If you put an Aggr() with Year as dimension in a chart with Month as dimension, you will get problems. You may get a table that looks like this:

 

Aggr incorrect grain.png

 

Here, the Aggr() function has produced one number per product and year, and this number has been assigned to one single month while the other months contain NULL. This is a result of the grain mismatch and most likely not what you want.

With this, I hope that you have a better understanding of Aggr().

HIC

 

Further reading related to this topic:

AGGR...

When should the Aggr() function NOT be used?

Use Aggregation Functions!

23 Comments
sohailansari201
Creator
Creator

Very informative post HIC. It would be very helpful if you could also talk about using nested Aggr() functions and its implications. For e.g. something like

Count(distinct{$<Active={1}, isBT={0}>} aggr(nodistinct If(

(date(

  max(Aggr(nodistinct max({$<Active={1}>}  if(AddMonths(CalMon...

Thanks.

0 Likes
6,804 Views
christian77
Partner - Specialist
Partner - Specialist

Thanks HIC.

0 Likes
6,804 Views
Not applicable

Thanks Henric, this couldn't have come at a better time for me. I'm trying to build a measure to show the expected number of diseases in a local population with known age distribution. To do this, I multiply the population at each age group by the rates of disease at national level, i.e.

Sum( Aggr( Sum( Population ) * Sum( DiseaseCases_National ) / Sum( Population_National ), AgeGroup )

To make this work on a chart with drilldowns for both location and disease type dimensions, I've had to add both the Location grain and the DiseaseType grain to the AgeGroup, which slowed things down significantly.

Any workaround or optimization tips you can provide would be much appreciated.

Dannie

0 Likes
6,694 Views
hic
Former Employee
Former Employee

Do you really need Aggr() for this? I would just use

      Sum( Population ) * Sum( DiseaseCases_National ) / Sum( Population_National )

and then add the appropriate dimensions in the chart, e.g:

      Location, DiseaseType, AgeGroup


If this suggestion isn't the answer, I suggest you open a separate thread for your question.


HIC

0 Likes
6,694 Views
Not applicable

Yes I think I'll open a thread for this. The minimum requirement is to adjust (or weight) the expected cases by age, since disease rates are higher in the elderly and some locations have higher proportions of elderly. However I'd like to provide location and disease type as sliceable (and collapsible) dimensions, so the same measure (expected cases) can be used on multiple charts with different location/disease type selections.

D

0 Likes
6,694 Views
mgranillo
Specialist
Specialist

Henric,

Can you explain the processing steps of this aggr example:

I have the result I want, I just don't fully understand how Qlik is working.

It's my goal to count the new distinct names per month. So month 1 should return a count of 5 (all names are new), month 2 should return a count of 2 (because F and G are the newly added names), and month 3 should be 1.

Sample Data:

LOAD * INLINE [

Month, Name

1,A

1,B

1,C

1,D

1,E

2,F

2,G

2,A

2,B

2,C

2,D

3,H

3,A

3,B

3,C

];

I've applied the following dimension and expression to a table:

Dimension: Month

Expression:

sum(

        aggr(

            count(distinct  "Name")

             ,"Name")

           )

Result:

2016-06-23 15_15_18-Qlik Sense Desktop.jpg

It's my understanding from your explanation, that the aggr function will return a table like this:

                 

Namecount(distinct  "Name")
A1
B1
C1
D1
E1
F1
G1
H1

But how does the Month dimension get applied to this table to return a distinct count in the final visual?

0 Likes
6,694 Views
hic
Former Employee
Former Employee

This is an example of a Grain Mismatch. You display the result of the Aggr() function in a chart that has a dimension that is missing inside the Aggr().

I would be very careful here - Grain Mismatches are unpredictable. Basically, what happens is that the first Month found inside the Aggr bin is used for the entire bin. So it is highly dependent on the load order. I would try to create an expression that does not depend on the initial load order.

HIC

0 Likes
6,694 Views
hic
Former Employee
Former Employee

Or to illuminate this further: If you create an app with just one table:

Load * Inline
[Month,Product,OrderID
1,A,4
2,A,1
2,B,2
3,A,3
3,B,5
1,B,6
]
;

Aggr Test 0.png

and then analyse this information using a similar expression

Sum(Aggr(Count(OrderID),Product))

you will find that the result depends on the order of the source records: Since 'A' is first found in Month 1, all 'A' orders are attributed to Month 1. Similarly all 'B' orders are attributed to Month 2. Change the order of the records, and you'll see.

Aggr Test.png

HIC

6,694 Views
mgranillo
Specialist
Specialist

Thank you Henric for the explanation. This is very helpful.

Mike

0 Likes
6,489 Views
daniel_wennstro
Partner - Contributor II
Partner - Contributor II

Hi @hic ,

I have a case where I get issues with the "4 - Grain Missmatch" but i partly want to have it:

The customer has a lot of agreements, and some agreements applies to more than one product group. Now they want to analyze how many of their Agreements that applies to multiple Product Groups and also see how much of the Agreement amount that applies to each Product Group for those agreements.

I tried with an expression like:

sum(aggr(if(count(Distinct ProductGroup)>1,1),Agreement))

That works fine on top level:

daniel_wennstro_0-1587978613301.png

It also works fine when i expand to see which agreements this is:

daniel_wennstro_1-1587979651696.png

But, when I expand to see the product Groups that are associated to each agreement i get the error you write about with mismatching Grain. 

However I would like to see that, and i would also like to see the Agreement amount for each Product Group included in the Agreements that has more than one Product Group.

I have also thought about doing this in the script, but then too much flexibility will be lost for the end users.

Do you have any idea how to solve this?

0 Likes
1,282 Views