Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

Pitfalls of the Aggr function

The Aggr() functions is one of the most advanced functions in the QIX engine, and it is not always easy to use. This blog post is about its most common pitfalls.

This is a function that enables the app developer to create nested aggregations, i.e. aggregations in two steps. Essentially the Aggr() is a For-Next loop where each loop contains a measure calculation. When all loops have been performed, a final aggregation based on the array of calculated measures can be made.

If we, for instance, want to calculate the monthly product sales, we could do it by using

Aggr( Sum(Sales), ProductName, OrderMonth )

This will produce a virtual table that looks like

Image1.png

Note that there is exactly one row per distinct combination of the two dimensions.

So, the Aggr() function creates a virtual table; an array of values that need to be aggregated in a second step. Hence you should wrap the Aggr() in an outer aggregation function. For instance, if you want to use the above table to calculate the average monthly sales, you should use

Avg( Aggr( Sum(Sales), ProductName, OrderMonth ))

Image2.png

With this mental picture of how the function works, we can start looking at the pitfalls.

1. Missing inner aggregation function

The first parameter of the Aggr() is a measure, and as such it is always an aggregation. Hence, you should use an aggregation function. If you don’t, the calculation will use Only() and the virtual table may inadvertently get NULLs in the Measure column. See more on Use Aggregation Functions!

2. Missing outer aggregation function

The outer aggregation function is needed whenever Aggr() returns more than one value. If no aggregation function is specified, Only() will be used, which may cause your chart measure to contain NULLs.


Note the totals line in the chart above: It shows the average over both products and months. Generally, totals will always be NULL if the outer aggregation is omitted.

3. Missing Set Analysis expression

So, there are both inner and outer aggregation functions. Where do I put my Set Analysis expression?

The answer is often “In both”. It may not be enough to have it in only one of the levels.

The current selection (or the relevant selection state) will always affect the aggregation scope of an aggregation function, unless you specify otherwise using Set Analysis. So, if you need Set Analysis you should in most cases have similar or identical Set Analysis expressions in both the inner and outer aggregation function.

4. Grain mismatch

When an Aggr() is used in a chart, you have inner dimensions in the Aggr() and outer dimensions in the chart. Then it is important that these match.

The grain of the Aggr() dimensions must be identical or finer than that of the chart dimensions.

For example, an Aggr() with Month as dimension can be put in a chart that has Year as dimension. In the following table, the average monthly sales numbers are correctly calculated, and displayed as an average per year and product.

Aggr correct grain.png

But the opposite isn’t necessarily possible. If you put an Aggr() with Year as dimension in a chart with Month as dimension, you will get problems. You may get a table that looks like this:

Aggr incorrect grain.png

Here, the Aggr() function has produced one number per product and year, and this number has been assigned to one single month while the other months contain NULL. This is a result of the grain mismatch and most likely not what you want.

With this, I hope that you have a better understanding of Aggr().

HIC

Further reading related to this topic:

AGGR...

When should the Aggr() function NOT be used?

Use Aggregation Functions!

19 Comments

Thank you HIC.

0 Likes
318 Views
christian77
Valued Contributor

What about the NODISTINCT modifier?

Thanks.

0 Likes
318 Views
Employee
Employee

In most cases you should not use the NoDistinct qualifier.

However, if you have a grain mismatch and you know that you really want to have a grain mismatch, you should turn on NoDistinct. It usually produces the correct result. In the above case, you would get the yearly number assigned to all months, not just to one month.

HIC

0 Likes
318 Views
martyn43
New Contributor II

Another clearly presented informative article

It would be useful to see the equivalent set analysis syntax, perhaps ogether with a couple of more complex set analysis examples.

Thanks again.

MK

0 Likes
318 Views
Employee
Employee

The Aggr() function and Set Analysis are fundamentally different. You cannot substitute one with the other, so there is no "equivalent" Set Analysis syntax.

  • Aggr() (without Set analysis) is a way to make a nested aggregation using the record set defined by the current selection.
  • Set analysis (without Aggr()) is a way to make a simple aggregation using a different record set than what is defined by the current selection.

But a blog post with more advanced Set Analysis could still be a good idea...

HIC

0 Likes
318 Views
ecolomer
Honored Contributor II

Thank's HIC

0 Likes
318 Views
nareshguntur
Valued Contributor

Thank You HIC.

A clear insight as usual.

Naresh

0 Likes
318 Views
rva_heldendaten
Contributor III

Hi Henric!

Any chance to sort the values in an aggr-Dimension? When I do something like

concat(distinct Year& 'Performance: '&

   aggr(nodistinct

     sum(Sales)/above(sum(Sales)),

  Year)

,chr(10))

I would like to sort the Field "Year" ASC (or DESC).

Best regards,

Roland

0 Likes
318 Views
jason_michaelid
Honored Contributor II

The third parameter of Concat() deals with sort orders, so try:

concat(distinct Year& 'Performance: '&

aggr(nodistinct

sum(Sales)/above(sum(Sales)),

Year)

,chr(10)

, aggr(nodistinct

sum(Sales)/above(sum(Sales)),

Year)

Put a “-“ before the aggr to sort descending instead of ascending.

Hope this helps,

Jason

0 Likes
318 Views
Employee
Employee

Unfortunately not. However, we have internally recognized the need to sort the output of the Aggr(), so it is on my wishlist, as well as on yours...

HIC

0 Likes
318 Views
sohailansari201
Contributor

Very informative post HIC. It would be very helpful if you could also talk about using nested Aggr() functions and its implications. For e.g. something like

Count(distinct{$<Active={1}, isBT={0}>} aggr(nodistinct If(

(date(

  max(Aggr(nodistinct max({$<Active={1}>}  if(AddMonths(CalMon...

Thanks.

0 Likes
318 Views
christian77
Valued Contributor

Thanks HIC.

0 Likes
318 Views
Not applicable

Thanks Henric, this couldn't have come at a better time for me. I'm trying to build a measure to show the expected number of diseases in a local population with known age distribution. To do this, I multiply the population at each age group by the rates of disease at national level, i.e.

Sum( Aggr( Sum( Population ) * Sum( DiseaseCases_National ) / Sum( Population_National ), AgeGroup )

To make this work on a chart with drilldowns for both location and disease type dimensions, I've had to add both the Location grain and the DiseaseType grain to the AgeGroup, which slowed things down significantly.

Any workaround or optimization tips you can provide would be much appreciated.

Dannie

0 Likes
318 Views
Employee
Employee

Do you really need Aggr() for this? I would just use

      Sum( Population ) * Sum( DiseaseCases_National ) / Sum( Population_National )

and then add the appropriate dimensions in the chart, e.g:

      Location, DiseaseType, AgeGroup


If this suggestion isn't the answer, I suggest you open a separate thread for your question.


HIC

0 Likes
318 Views
Not applicable

Yes I think I'll open a thread for this. The minimum requirement is to adjust (or weight) the expected cases by age, since disease rates are higher in the elderly and some locations have higher proportions of elderly. However I'd like to provide location and disease type as sliceable (and collapsible) dimensions, so the same measure (expected cases) can be used on multiple charts with different location/disease type selections.

D

0 Likes
318 Views
kidbank3
Contributor III

Henric,

Can you explain the processing steps of this aggr example:

I have the result I want, I just don't fully understand how Qlik is working.

It's my goal to count the new distinct names per month. So month 1 should return a count of 5 (all names are new), month 2 should return a count of 2 (because F and G are the newly added names), and month 3 should be 1.

Sample Data:

LOAD * INLINE [

Month, Name

1,A

1,B

1,C

1,D

1,E

2,F

2,G

2,A

2,B

2,C

2,D

3,H

3,A

3,B

3,C

];

I've applied the following dimension and expression to a table:

Dimension: Month

Expression:

sum(

        aggr(

            count(distinct  "Name")

             ,"Name")

           )

Result:

2016-06-23 15_15_18-Qlik Sense Desktop.jpg

It's my understanding from your explanation, that the aggr function will return a table like this:

                 

Namecount(distinct  "Name")
A1
B1
C1
D1
E1
F1
G1
H1

But how does the Month dimension get applied to this table to return a distinct count in the final visual?

0 Likes
318 Views
Employee
Employee

This is an example of a Grain Mismatch. You display the result of the Aggr() function in a chart that has a dimension that is missing inside the Aggr().

I would be very careful here - Grain Mismatches are unpredictable. Basically, what happens is that the first Month found inside the Aggr bin is used for the entire bin. So it is highly dependent on the load order. I would try to create an expression that does not depend on the initial load order.

HIC

0 Likes
318 Views
Employee
Employee

Or to illuminate this further: If you create an app with just one table:

Load * Inline
[Month,Product,OrderID
1,A,4
2,A,1
2,B,2
3,A,3
3,B,5
1,B,6
]
;

Aggr Test 0.png

and then analyse this information using a similar expression

Sum(Aggr(Count(OrderID),Product))

you will find that the result depends on the order of the source records: Since 'A' is first found in Month 1, all 'A' orders are attributed to Month 1. Similarly all 'B' orders are attributed to Month 2. Change the order of the records, and you'll see.

Aggr Test.png

HIC

318 Views
kidbank3
Contributor III

Thank you Henric for the explanation. This is very helpful.

Mike

0 Likes
318 Views