Aggregations and Function Classes - Qlik Community

hic · ‎2014-05-20

A typical QlikView application may have one million records in the data, one hundred rows in a pivot table and a single number, a KPI, in a gauge or text box. Although different in magnitudes, all three numbers may still represent all data. The numbers are just different aggregation levels.

There are many functions in QlikView that can help you write the necessary formulae to calculate aggregated KPI:s. Some will collapse many records into one value, others will not. Today I will write about the different function classes, and how you can combine them.

The Scalar Functions constitute the first class. Typical for these is that they are one-to-one functions, i.e. they take a single value as parameter and return a single value (of the dual data type). Examples: Left(), If(), Num(), Date(), Year(), Subfield(), etc.
The Aggregation Functions constitute the second class. These are many-to-one functions, i.e. they use the values from many records as input and collapse these into one single value that summarizes all records. Examples: Sum(), Count(), Avg(), Min(), Only(), Concat(), etc.

Aggregation functions are special: You must use one to collapse several records into one number – which means that you need them in pretty much any formula in QlikView: In Chart expressions, in Text boxes, in Labels, etc. If you don’t write an aggregation function in your expression, QlikView will assign one for you: It will use the Only() function.

Scalar functions can be used both inside and outside the aggregation function:

Date( Min( Date )

Money( Sum( If( Group='A', Amount ) ) )

There is one restriction: You can normally not use an aggregation function inside another aggregation function. Hence, you usually need every field reference to be wrapped in exactly one aggregation function.

The next function class has only one member: The Aggr Function. It is – in spite of its name – not an aggregation function. It is a many-to-many function, rather like a tensor or a matrix in mathematics. It converts a table with N records to a table with M records. In other words: It returns an array of values. Regard it as a virtual straight table with one measure and one or several dimensions.

Most places in QlikView demand that you write your expression so that it returns one single value. This means that you must wrap the Aggr function in an aggregation function to get a meaningful result. The only exception is if you use the Aggr function to define a calculated dimension or field. This means that you have two aggregation steps; one nested in the other:

Avg( Aggr( Sum( Amount ), Month ) )

Charts complicate the matters slightly: A chart is like a For-Next loop where the number of distinct dimension values determines the number of loops. In each loop, the expression must return one value only, and this is the value used for the bar/slice/pivot table row.

However, sometimes you need values from other rows in the chart, and it could even be that you need values from several rows. To solve this, there are two additional classes of functions that should be used together:

The Chart Inter-record Functions return values fetched from other rows in the chart. Some of these can return several values, i.e. an array of values. These functions are only meaningful inside a chart or Aggr() function. Examples: Above(), Below(), Top(), etc.
The Range Functions are functions that can collapse a chart inter-record array into one single value. Examples: RangeSum(), RangeMin(), RangeMax(), etc.

Example:

Legend 3XS.png

RangeSum( Above( Sum( Amount ), 0, 12 ) )

Bottom line: Know your functions. It will help you write correct expressions.

HIC

See also

It’s all Aggregations

Dimensions and Measures

The Above Function

Report Inappropriate Content · ‎2014-05-20

Excellent & clear.

Thanks : you made things clear that were not really in my mind.

Fabrice

luciancotea · ‎2014-05-20

Great post, Henric!

Could you explain more about aggr() function, how it works when we use:

Avg( Aggr( Sum(DISTINCT Amount ), Month ) )

or

Avg( Aggr(DISTINCT Sum( Amount ), Month ) )

or

Avg(DISTINCT Aggr( Sum( Amount ), Month ) )

Also between:

Avg( Aggr( Sum({<Year={2013}>} Amount ), Month ) )

and

Avg( Aggr({<Year={2013}>} Sum( Amount ), Month ) )

and

Avg({<Year={2013}>} Aggr( Sum( Amount ), Month ) )

or different combinations?

Report Inappropriate Content · ‎2014-05-20

Your bottom line says it all. The moment the workings of the aggr function became clear to me, a whole new world opened up. Good article and clear description!

Joaquin_Lazaro · ‎2014-05-20

Hi Henric:

It's a good post.

I think I've understood the basic of the AGGR function, but the Inter-record functions and Range Funcionts are completely new for me.

Do you have a blog, technical brief, white paper or something that explains this functions?

Thank you,

Joaquín

antoniotiman · ‎2014-05-20

Thanks Henric,

can We simply assume that Aggr() is "Group By" ?

Regards

hic · ‎2014-05-20

@ Joaquin Lazaro No, I have no good white paper about Inter-record functions. I should probably write one...

hic · ‎2014-05-20

@ ANTONIO MANCINI

Yes, you can compare it to a Group by. In principle, the following two return exactly the same ntuple:

Aggr( Sum(Amount), Month )

SELECT Sum(Amount), Month FROM ... GROUP BY Month

The difference is that the field "Month" is visible in the SELECT, whereas it is hidden in the Aggr().

HIC

Joaquin_Lazaro · ‎2014-05-20

Thanks Henric Cronström

I will wait for it

Report Inappropriate Content · ‎2014-05-20

I'll try to answer this one.

Avg( Aggr( Sum(DISTINCT Amount ), Month ) )

Will sum only distinct Amounts per Month and then average those per-Month. So if in January you have 100, 100 and 200 as Amounts, Sum(Distinct Amount) gives 100+200=300 rather than 100+100+200=400. The list of such sums for each month are averaged.

Avg( Aggr(DISTINCT Sum( Amount ), Month ) )

The aggr() function uses DISTINCT by default. So this is the same as the expression that HIC mentioned. If you say NODISTINCT, then for each row that contains the Month field, you will get a unique value in the list of values returned by the Aggr() function.

Avg(DISTINCT Aggr( Sum( Amount ), Month ) )

This will get the list of values per Month and then ignores duplicates in that list.

Avg( Aggr( Sum({<Year={2013}>} Amount ), Month ) )

For each Month that is "white" using the current selections, find the sum of Amount as if you had also selected Year=2013 instead of whatever is actually selected. Then average.

Avg( Aggr({<Year={2013}>} Sum( Amount ), Month ) )

I don't think this works. Set Analysis only makes sense in aggregation functions.

Avg({<Year={2013}>} Aggr( Sum( Amount ), Month ) )

Take the current selections and make Year=2013 instead of whatever it is. If you have to change selections on something else to do it, then we get an empty data set maybe. Then, for each Month that is white in that result, calculate the Sum(Amount) as if Year wasn't set to 2013.

Hope that helps!

Michael

luciancotea · ‎2014-05-20

Thanks Michael, but I wanted to know how qualifiers and set expressions (and combinations) affect in general the aggr() function. Maybe some optimization tips... Sorry if I wasn't clear enough.

I know that DISTINCT doesn't make sense in aggr() because aggregation returns distinct results by definition. The given examples were just to illustrate the idea.