Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

Use Aggregation Functions!

On the discussion forum, I often see people posting questions around expressions that don’t work. When looking at the descriptions, I usually find that the reason is that the expressions lack aggregation functions. So, here is a suggestion...

Always use an aggregation function in your expression.

The reason is that a field reference in an expression always means an array of values. Which in turn means that you must enclose it in an aggregation function to make it collapse into one value:

      OrderDate             An array of values

      Max(OrderDate)     A single value

If you don't use an aggregation function, QlikView will use the Only() function. Hence, if the field reference returns several values, QlikView will interpret it as NULL, and the expression will not be evaluated the way you want it to.

Example 1: Use of the If() function:

If() functions are often used for conditional aggregations:

      If( OrderDate >= vReferenceDate, Sum(Amount) )

At first glance, this expression may look correct: For dates after a reference date, the field Amount should be summed. Right?

Wrong.

OrderDate is a naked field reference: It does not have an aggregation function. Hence, it is an array, possibly with several values, and if so, evaluates to NULL. If you are lucky, there is only one date per dimensional value in your chart, and the expression will calculate fine. However, QlikView will probably not be able to calculate the expression for the subtotals in the chart, since there for those exists several dates.

A correct expression that always works should use a Min() or some other aggregation function in the first parameter of the If() function:

      If( Min(OrderDate) >= vReferenceDate, Sum(Amount) )

Or, alternatively, the If() function should be put inside the Sum() function:

      Sum( If(OrderDate >= vReferenceDate, Amount) )

In the first of the two expressions, the If() function will be evaluated once per dimensional value; in the second once per row in the raw data. The results are slightly different, but both return an answer, as opposed to the original expression. The picture below shows the difference between the expressions, using 2013-02-01 as reference date.

If function.png

Example 2: Sort by expression:

The expression used to sort the dimensional values in a chart is also an aggregation. Often you don’t think about this since you choose an expression that returns just one value per dimensional value, and then a naked field reference works fine.

But sometimes this still doesn't work…

For example, say that you want to show support cases in a CRM system. You create a chart with the support case as dimension and some measure as expression. Of course you want to sort the support cases chronologically, so you use "Sort by Expression" and as expression you choose

      [Opening Date]

This will work in most cases. However, some CRM systems allow you to re-open a support case, hence assigning two opening dates to one single support case.  For these cases, the above expression will not work.

Instead, you should always ask yourself which function to use, should there be two values. The answer is usually Sum(), Avg(), Min() or Max(). In the above case, you should use

      Min([Opening Date]) , or

      Max([Opening Date])

depending on whether you want to use the first or last date.

Bottom line: Use aggregation functions, not just in your chart measures, but also in sort expressions, labels, show conditions, calculation conditions, text boxes, sheet names and searches.

HIC

Further reading related to this topic:

It’s all Aggregations

Aggregations and Function Classes

40 Comments
Not applicable

Thank you very much! Made some things clearer to me!

0 Likes
247 Views
satyadev_j
Valued Contributor

Well articulated. Thanks for sharing!

0 Likes
247 Views
eruditio
Contributor III

School of Qlik thanks!!

0 Likes
247 Views
datanibbler
Esteemed Contributor

Hi Henric,

that is a very valuable post indeed. I haven't read it in full yet, but I will copy it to Word, make into an epub and read it on the train - for plain reading, I prefer my eReader which means epub is the best format available. I cannot count the times we have already come across a display not showing something just because there were several values in that field acc. to the dimensions we had and thus QlikView would not show it.

I have made quite good experiences (in straight-table charts I use to show details, that is) with using a line_id (rowno()) as additional dimension and then hiding it (on the "presentation" tab). That way, there is only one line per line (logical, no?) and no details are lost.

Best regards,

DataNibbler

0 Likes
247 Views
Not applicable

Henric,

I hear the sound of a thousand palms slapping into a thousand foreheads, as people read the first section, and the realisation dawns.  I quickly checked a couple of my documents, where I had some "issues" and there it was the lesser spotted NFR.

Thanks

0 Likes
247 Views
fdelacal
Valued Contributor

Thanks for share!!

0 Likes
247 Views
MCampestrini
Valued Contributor

HIC

Thanks for clarifying that. There really are some points that are not usually taken into account in developing applications

0 Likes
247 Views
giakoum
Honored Contributor II

Good article indeed!

But should we not point out to people, especially beginners, that expressions like

If( Min(OrderDate) >= vReferenceDate, Sum(Amount) )

Sum( If(OrderDate >= vReferenceDate, Amount) )

are not a good idea to use, and set analysis should be used instead?

Thank you, Ioannis.

247 Views
Not applicable

Ioanis

I think beginners would understand

If( Min(OrderDate) >= vReferenceDate, Sum(Amount) )

Rather than:-

Sum ({$<OrderDate = {'>=$(vReferenceDate)}'} Amount))

(I suspect I also got that Set analysis wrong, as I did it from memory)

I would suspect beginners stay away from Set Analysis until they are past the beginners stage, the syntax is far too  obtuse!

Just my opinion though

Richard

247 Views
giakoum
Honored Contributor II

@Richard :

Yes Set Analysis is tricky, I am having a hard time with it myself. But you have to do it the right way, tricky or not. I have seen many developers, not necessarily beginners, use anything else but set analysis, and I really don't like the results. Very badly coded apps performing terribly... You cannot act as a professional QV developer without it I think.

0 Likes
247 Views
michaelk
Contributor

Henric, very interesting post. I didn't realise, that there could be so many problems. Seems to be, that I where lucky until now, to always have an expression without an aggregation that works fine. I will have an eye on it.

0 Likes
247 Views
Not applicable

Ioanis,

I agree, its tricky, and who says its the right way to do this?  I think it is one way, not always the right way.  I always prefer my developers to use code thats easy to understand, and efficient.  I have seen bad code done both ways. (I have myself written bad code, both ways )

I also think the example Henric Cronström was using was to demonstrate the Naked Field Reference

0 Likes
247 Views
giakoum
Honored Contributor II

I need to insist on that Richard. Set analysis is definitely the right way. Check out all best practices documents from QlikTech, it is not only me saying that....

0 Likes
247 Views
martin_dideriks
New Contributor III

If not both fields (OrderDate and Amount) are to be found in the same table and you use the expression sum(if(...)), your chances to get a wrong result is pretty big caused by multiple values.

loannis Giakournakis is right..Learn the Set Analysis as this will give you the correct result.

I am not sure, though, that the point with Henrics post was to teach everyone best practices.

As I see it, it is how QV in general works.

247 Views
anantmaxx
Contributor III

HIC Thnx !!

understood what you want to say or suggest

If( Min(OrderDate) >= vReferenceDate, Sum(Amount) )

Or, alternatively, the If() function should be put inside the Sum() function:

      Sum( If(OrderDate >= vReferenceDate, Amount) )

not understood the this line :

"In the first of the two expressions, the If() function will be evaluated once per dimensional value; in the second once per row in the raw data"      

If() function will be evaluated once per dimensional value ?? What does that meann

Anant

0 Likes
247 Views
Employee
Employee

If you put the If() outside the aggregation function (the first example), then the If() will be evaluated once per dimensional value, i.e. once per row in the pivot table or once per slice of the pie chart.

But if you instead put the If() inside the aggregation function (the second example), then the If() will be evaluated once per row in the raw data, which could mean a considerable performance problem.

HIC

247 Views
dvqlikview
Honored Contributor II

Another good foundation & must know lesson for all the Qlikers. Many thanks HIC.

0 Likes
247 Views
anantmaxx
Contributor III

Make sense , thanks for your Prompt Answer !!

0 Likes
247 Views
whiteline
Honored Contributor II

Hi Henric.

I doubt that this:

Sum(If(OrderDate >= vReferenceDate, Amount))

implicitly implies this:

Sum(aggr(If(OrderDate >= vReferenceDate, Amount), OrderDate, Amount))

So that it can lead to the unexpected results in case of orders for the same date and amount.

I guess it's the second most popular mistake after the lack of aggregation functions.

In such a case (if under the aggregation) I usually suggest (upd: not anymore) to use explicit aggr() function to be sure the granularity that is used for the condition calculation.

In this case it could be the field that truly represents the row:

Sum(aggr(If(OrderDate >= vReferenceDate, Amount), OrderID))

0 Likes
247 Views
Employee
Employee

No, the two expressions are not the same. The Sum(...) function is a sum over all records in the source data, whereas the Sum(Aggr(..., Dim)) is a sum over the records created by the Aggr(), i.e. the number of records is determined by the number of distinct values of Dim.

Further, you need to have an aggregation function as first parameter in the Aggr() function. It needs to be a single value. The Aggr() function is just like a for-next loop. The following are correct constructions:

OrderDate             An array of values

Max(OrderDate)     A single value

Aggr(Count(OrderDate), … )       An array of values

Max(Aggr(Count(OrderDate), … ))           A single value

HIC

247 Views
whiteline
Honored Contributor II

Yeah, you're right, the loop is performed for each record without 'distinct' whether the fields are in the same table or not.

Probably, the join that is performed in case the fields are from different tables, confused me sometime ago in an incorrectly designed datamodel.

Furthermore, the expression with simple if() has really much better performance, especially if the fields are not in the same table.

Thanks for clarification.

0 Likes
247 Views
robert99
Valued Contributor II

Agree. Set analysis is a bit of a nightmare when it comes to this sort of formula

Sum ({$<OrderDate = {'>=$(vReferenceDate)}'} Amount))

Much easier is

OrderDate >= vReferenceDate

And I think OrderDate > SalesDate does not work anyway in set analysis (at least i can not get it to work)

247 Views
whiteline
Honored Contributor II

Hi, RJ.

If you mean the condition OrderDate > SalesDate to select the orders, you can do it with set analysis like this:

Sum({<OrderId={'=OrderDate>SalesDate'}>} Amount)

247 Views
Employee
Employee

@ whiteline

Your search expression will work in most cases, but only because there is only one OrderDate and one SalesDate per OrderID. Both OrderDate and SalesDate are naked field references, so the search will not work if there are several SalesDate:s per OrderID.

HIC

247 Views
whiteline
Honored Contributor II

Henric Cronström

I understand that. There is some lack of context about the data model.

Anyway, returning to the theme topic, in most cases we have to use aggregation within set analysis too )

0 Likes
247 Views
Not applicable

Thanks for this share....

0 Likes
247 Views
robert99
Valued Contributor II

Sum({<OrderId={'=OrderDate>SalesDate'}>} Amount)

Thanks for this. It works in my case. I actually was using it for a service business. To measure the number of calls completed within the required time

So Count  ({<Call_Num={'=Call_ResponseDate >= Call_StartDate '}>} distinct Call_Num)

rather than count (if ( Call_ResponseDate >= Call_StartDate , distinct Call_Num))

0 Likes
247 Views
shawn-qv
Contributor

A useful blog highlighting on the difference between if(sum) and sum(if) which I think a lot of people don't think much about.

Hopefully this is now more clear, as it's quite fundamental to understanding how QV reads and performs aggregation.

0 Likes
247 Views
barryharmsen
Contributor II

Henric Cronström Would it also be correct to say that placing single dimensions (with or without the Only() aggregation function) in the expressions tab instead of the dimensions tab would cause them to be cached differently? (or, cause them to be included in the cache) I recently did a health check on a QlikView application where developers had put all but one of the dimensions into the expressions tab and the chart was using lots of RAM and had a really poor response time. Just moving all the dimensions to the proper tab caused the chart to calculate 5 times faster and also reduced RAM consumption.

0 Likes
247 Views
Employee
Employee

Yes, they will be cached differently. Where to put a field - as dimension or expression - is a question that has different answers in different cases. Adding a dimension is often not a good idea since it may increase the size of the cube. Then it is better to have the field as dimension expression instead. In your case it was obviously the other way around, probably because the added dimensions didn't increase the size of the cube significantly.

HIC

0 Likes
247 Views