Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Average – Which average?

If you want to display an average number of something in QlikView, you should use the Avg() function, right?

 

Wrong.

 

Yes, there is an Avg() function that returns the average value, but – this is usually not the value that you are looking for. The Avg() function returns the average transactional value, whereas you probably are looking for a larger amount.

 

For example, let’s say that you have an orders database where the grain - the most atomic level – of the data is Order Line. Each order can have several order lines and each order line has an amount. Then, the Avg() function will return the average order line amount, which is utterly uninteresting. No, you are most likely more interested in questions like “How much do we sell per month, on the average?”

 

In other words – a calculation of an average has an implicit internal grouping entity; the average per month, per day, per order, per delivery or something else. You can look at it as a two-step aggregation:

 

  1. Sum all the amounts – per each value of the internal grouping entity (e.g. month, day, order or delivery)
  2. Calculate the average of the sums from previous bullet.

 

In QlikView, you would calculate the average monthly sales value in one of the two following ways:

 

     Sum( Amount ) / Count( distinct MonthID )
     Avg( Aggr( Sum(Amount), MonthID ) )

 

… and similarly for orders, days or deliveries. Use of the Aggr() function will work, but it is not as fast as the first option, and should therefore be avoided.

 

Sometimes there are several internal grouping entities. You may for instance want to show the average monthly sales value per customer, i.e. you want to use both month and customer as grouping entity. Then you should use one of the following expressions:

 

     Sum( Amount ) / Count( distinct MonthID & '|' & CustomerID )
     Avg( Aggr( Sum(Amount), MonthID, CustomerID ) )

 

The Count() aggregation with a string concatenation will find every unique combination of month and customer.

 

Note that the internal grouping entity has nothing to do with the dimension you choose to use when you display it. It would make perfect sense to show the above number using Product as the only dimension, as shown in the graph below. Hence, the internal grouping entity is not necessarily visible in the end result.

 

Average graph.png

 

So, you need to figure out which internal grouping entity you want and then use this in a smart way in your expression.

 

And by the way – an internal grouping entity can be used also for other aggregation functions: Smallest, largest, most common: Min(), Max(), Mode(). But for these, you will need to use the Aggr() function.

 

HIC

26 Comments
henster36
New Contributor III

Hi Henric

It's very interesting how you use the combination of two dimensions in the Count function to avoid using the Aggr and I'll definitely remember it. I usually find that I want to calculate the average only for dimensions which have values as the others don't show up on the chart if you've got Suppres Zero Values ticked and then have to use set analysis to calculate it. But that can also easily be incorporated in your Count function.

Thanks for the post!

2,304 Views
cesaraccardi
Valued Contributor

Hi Henric,

What's the difference between the 2 expressions when it comes to performance? I've used aggr many times in my projects and I am not sure about the performance impact when you are working with huge amounts of data... Great explanation by the way.

Cheers

2,304 Views

My experience is that Aggr() always is slower. The distinct in the Count() also has a penalty, but it is usually not as bad.

HIC

2,304 Views
borisman
Contributor III

very good post, thanks HIC!

talking about performance and your recent post on Distinct count testing, do you mind to write a post on how to do performance testing in QV?

I asked a question How to do performance testing in QV but no one replied yet.

0 Likes
2,304 Views
Not applicable

Very Helpful again Henric. However for zero value months, these are being omitted from my calculation. How would I include these?

0 Likes
2,304 Views

If a month has a transaction with zero value, it will be included. But the more likely scenario is that the month does not have any transaction, and then it will not be included.

What you can do to force the month to be included, is to use Set Analysis. The following could serve as example:

   Sum( Amount ) / Count( {1} distinct MonthID )


However, this expression will include all months, whereas you may want QlikView to include some but not all of them. And this can be tricky: If a month is excluded by a selection, it's Excluded. Full stop. There is no way for QlikView to know whether this was intended or not.


A work-around could be to include a zero-value transaction for each combination of month and product, already when loading data. Then we are back to the first sentence of this answer.


HIC

2,304 Views
anantmaxx
Valued Contributor

Thanks !! , nice and beautifully explained!!

0 Likes
2,304 Views
marksmunich
Contributor III

Hello Henric,

Can i do it so,

Avg(AGGR(Max(enddate)-Min(Smartdate),Fieldname) as AvgValuename in Scripting

or simple = Avg(AGGR(Max(enddate)-Min(Smartdate),Fieldname) in text object.

Is it correct.


0 Likes
2,304 Views
Not applicable

Hello all,

can you see following this link my question regarding the AVG function :

Average 30 last days for each date selected

Thanks in advance for your help

regards

0 Likes
2,304 Views
Not applicable

Great post! I read through it but apparently still need help. The below expression works as expected within a straight table. I have set the Total mode within the expression to be "Avg of Rows" and on the Number tab for the value to show as Percent.

I now want to also use this same expression in a Text Object. When I drop this expression into a Text Object I am not getting the correct result (getting a higher percentage). I have tried different permutations of using AVG, AGGR and others....so I am clearly missing something. What do I need to adjust in the Text Object to make this render the same result as within in a straight table??

Straight Table --> =$(vBillableCount) / ($(vBillableCount) + sum([NonBillable]))

Text Object --> =num($(vBillableCount) / ($(vBillableCount) + sum([NonBillable])), '##.00%')

0 Likes
2,304 Views

Since I don't know what you have in vBillableCount, I cannot tell for sure. But I strongly suspect that the problem is in the Num() function. I suggest you try

   Num( ... , '0%')

and see if this works.

HIC

0 Likes
2,304 Views
Not applicable

Thanks Henric. If you don't mind (or anyone else viewing this post), i'd love your feedback of what I am doing wrong. The below straight table results is a small snippet of results -- all expressions working as expected. B2NB Ratio formula: =$(vBillableCount) / ($(vBillableCount) + sum([NonBillable]))

The above expression breaks down to vBillableCount = FTE Billable Hrs / 40 which yields FTE Billable Count. So for the ratio, I am simply taking FTE Billable Count / (FTE Billable Count + NonBillableCount) or

40.99 / (40.99 + 13) = 75.94. I then set the Total Mode of this expression to be Average of Rows.

But when  I drop this same expression into a text box, I am not getting the same result (in this specific data set, I get 85.41%). I tried appending TOTAL before the [NonBillable], but errored my expression. I am sure it is something minor I need to correct. Thanks Qlik braintrust!

       

Week EndFTE Billable HrsFTE Billable CountNonBillable CountB2NB Ratio
1,639.6040.991375.94%
11/20/15826.2520.66677.49%
11/13/15813.3520.33774.39%
0 Likes
2,304 Views

The "Average of Rows" is a setting that only exists for Straight tables. It does not apply to text boxes. So I wouldn't use it...

About the formula... I assume that you have used aggregation functions like Sum() to define your columns, e.g.

* [FTE Billable Count] = Sum(Billable) / 40

* [NonBillable Count] = Sum(NonBillable) / 40

Then you can define your ratio as

Ratio = Sum(Billable) / (Sum(Billable) + Sum(NonBillable))

and if you want the total to show average over weeks, you should just change this to

Ratio = Sum(Billable) / (Sum(Billable) + Sum(NonBillable)) / Count(distinct [Week End])

This last formula you can use in a text box also.

HIC

0 Likes
2,304 Views
datanibbler
Esteemed Contributor

Hi Henric,

I think the thing with the average is quite logical, but nevertheless this is a very good explanation.

Maybe you can help me with a related problem I have - the underlying reason is probably easy, but I just cannot think of it:

- I have an Excel_database of shelves. Every shelf has a nr. of storage_lots (to keep things simple, I

   assume that every box is the same size, so I can put one box into one lot)

- From the company_database I get a nr. of boxes that are booked on any of the lots assigned to one

   shelf

=> From that results a usage_rate of the shelf_space. OK?

- Now, in that app I have two objects: A straight_table with the detail data (every single shelf)

   and a gauge just showing the overall usage_rate.

- One column in the table displays the usage_rate of every single shelf.

- For the top row I selected to display an average

=> For some time I had the problem that the overall usage_rate displayed by the gauge always differed

      from the figure in the top row of the table whereas every single shelf was displayed correctly and

      matched (between the two objects)

- To have a closer look at this issue, I finally reduced the nr. of shelves I use down to just a handful - still

   the figures are different:

    - When I take the overall count_of_boxes and divide it by the overall_nr_of_lots, I get one figure;

    - When I take the usage_rate of every single shelf and draw the average of those, I get another.

=> I am pretty sure that the latter method is just not possible, so I just take that figure out of the table and

     I have no problem anymore <-> but I would still like to know why.

Can you say a word to that?

Thanks a lot!

Best regards,

DataNibbler

0 Likes
2,304 Views

The usage rate is a non-additive number, so you can not just average this. If all shelves had identical sizes, you could in theory average them. But as soon as there is one shelf with a different size, or you make a selection in some other property (thereby changing the "effective" size), the numbers need to be weighted. I.e. you can no longer just average them. 

No, just as you say, you need to (on all levels) take the overall count_of_boxes and divide it by the overall_nr_of_lots. See more on Additive and Non-Additive Numbers

HIC

2,304 Views
datanibbler
Esteemed Contributor

Thanks a lot! I will have a look at that. I still don't understand exactly why these figures cannot be averaged without weighting, but having confirmed from a pro like you that they cannot is something - it really takes a load off my chest. I was beginning to doubt my own capability because I just couldn't figure out why I had these differences ...

Best regards,

DataNibbler

0 Likes
2,304 Views

An example: You have the following source data:

Assigned.png

Then you can calculate the usage:

Usage.png

However, this usage - the percentage - is a non-additive number. If you calculate the average of 50% and 100% you may (incorrectly) think that it is 75%. But look at source data, and you will easily see that it is not. Hence, you cannot use it for further calculations. You need to go back to the source to calculate a global average from the count of the [Assigned?] field..

HIC

0 Likes
2,304 Views
Not applicable

Hello Henric,

I’m new to Qlikview and SQL programming, I’m using the Qlikview personal edition and until now I have been able to obtain most of the data points I need, but I’m struggling while trying to calculate the average
between two different set of date fields (startdate & assigndate), and the formula seems to work but only when I select 1 record, but if I keep it open (all records – “thousands”) them the formula does not display anything.

The formula I’m using today is:

=if(Match(Team, 'A'), avg(Num( AssignedDate - StartDate, '#.#')),Null())

And I would like to be able to obtain the average number of days between assignment and stardate, and will them apply filters as needed for years, quarters, months, etc… for this calculation I'm using a textbox as KPI measurement.

Thanks for all the help you can provide me !!!!

Carlos

0 Likes
2,304 Views

The problem here is that "Team" is a naked field reference - it is not wrapped in an aggregation function - it will return NULL if there are several values. See more on Use Aggregation Functions!.

The following should do the trick:

Avg({$<Team={'A'}>} AssignedDate - StartDate)

HIC

2,304 Views
Not applicable

Wow, THANKS Henric, the result is exactly what I needed - nice and simple !!! Thank you for your prompt support, Carlos

0 Likes
2,304 Views
tdiamond
New Contributor II

Hello,

I'm trying to understand how the Aggr() function can be used to produce the same result across different straight tables when the same dimensions are not used.

For example, I want to know the average dollar amount of the sales orders entered by salesperson by month (the red dollar amount in table2 needs to match table1)

Straight table1 - shows the monthly averages for the selected salesperson and provides a total average over the rows

  • Dimension = Month
  • Expression = Avg(Aggr(Sum(Sales),Month,OrderNumber,Salesperson))

Straight table2 - provides a summary of the total averages by salesperson but the average does not match the straight table1's total average of the rows

  • Dimension = Salesperson
  • Expression = same calculation as straight table1 but what ends up happening is that the total sales for the entire period are divided by the total number of distinct orders.

table1.png

table2.png

Looking forward to your insight,

Teresa

0 Likes
2,304 Views

I cannot understand why you have different numbers without looking into the app. It should work, as I see it. What happens if you instead use

   Sum(Sales) / Count( distinct OrderNumber )

which should result in the same number?

HIC

0 Likes
2,304 Views
tdiamond
New Contributor II

Thank you for the reply.  The equation you suggested worked.

0 Likes
2,304 Views
PuriVelasco
New Contributor III

@Henric_Cronström I am trying to use the Median for a moment but I am obtaining wrong values.

I use:

Time(Median([begin_date])) where begin_date is a field in date format.

What do I have to use to obtain a correct median in the same way you use the average?

Thank you very much.

 

0 Likes
1,068 Views

@PuriVelasco 

Why this doesn't work, I cannot say. It should work. I would do the following

0 Likes
1,052 Views
PuriVelasco
New Contributor III

@Henric_Cronström Thank you very much for your answer.

I have to use Time() because I need hours, minutes and seconds from date.

 

0 Likes
1,048 Views