Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Average of a field on weekdays

Hi,

I have a calendar module which gives me month, weekdays, etc etc..

I have another table which connects to the calendar module by date.

Now in the table, I have a field called [enrollment count], which has some data filled for every day.

I have a chart which gives me a sum of the [enrollment count] day wise: i.e. Monday, Tuesday, Wednesday, etc.. I used the expression sum([expression count])

Now instead of sum, I want an average of the [enrollment count] day wise. So if there are five sundays in a month it should take the sum of all the five values and then divide it by 5. I used the expression avg([expression count]), but the result is not correct.

Please suggest where I am going wrong.

25 Replies
Not applicable
Author

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

swuehl
MVP
MVP

Hi Rajiv,

IMHO, the expressions  1) and 2),  (i.e. avg([Field]) and sum([Field]) / count([Field]) )  are identical (so 2) is just how the function avg is calculated with more granular functions).

3) is also like an average, but it uses a different count number to divide by.

While 1) and 2) iterate over the records restricted by the charts dimensions and do a sum and count of all these records (which would give you an average too low in your case, give that you could have multiple records per weekday in any given week), 3) will divide only by the number of distinct Dates.

For example, if you have multiple records for today 2011-09-05, like

recNo, Value, Date

1, 10, 2011-09-05

2, 10, 2011-09-05

3, 10, 2011-09-05

1) and 2) will calculate:

sum(Value) / count(Value) = 30 / 3 = 10

while 3) calculates:

sum(Value) / count (distinct Date) = 30 /1 = 30

It's always helpful for learning to put the expressions in parts and add the parts as separate expressions to your table, so you will learn how QV does the calculations taking the dimensions / selections into account.

Hope this helps,

Stefan

Not applicable
Author

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

Not applicable
Author

Thanks for the post Stefan,

I still have a doubt..

My data has only one record per weekday. Meaning for a particular date I have only one record.

Yet when I used expressions 1 and 2, it gave me a very low value. Expression 3 ofcourse gave me the correct value.

Your example makes sense, when there are multiple records for a particular date. But when there is a single record per date, and yet you get a lower value, it means that qlikview is doing something else which I am not comprehending..

Your inputs...

Cheers,

Rajiv

Not applicable
Author

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

swuehl
MVP
MVP

Hi Rajiv,

count([Field]) must be different from count(distinct Date) then. Have you looked at the numbers of

count([Field])

count(Date)

count(distinct Date)

(put as expression in your table)?

You could add Date as dimension to your table after your Weekday.

I would recommend selecting the date and trying to look at the record level, which records are selected (by looking at a unique ID, maybe?). And keep in mind, that e.g. multiple identical records are not shown multiple e.g. in table box.

Regards,

Stefan

Not applicable
Author

Thanks Stefan,

Really helpful... There indeed was multiple records for a particular date, which I had overlooked!! And hence your earlier example stands as the explanation for the lower value that I was getting.

Cheers,

Rajiv

Not applicable
Author

Next requirement:

Now that I have the average of the enrollment count day wise, I want to remove the outliers so that the average value is not skewed.

For example in the 5 sundays of a month: the enrollment count could be 25,24,23,22,01.

Now I don't want my average to consider 01.

The average that I am getting now(with current expression) is 19((25+24+23+22+01)/5)

I should ideally get 23.5((25+24+23+22)/4)

Any suggestions on how to go about this.

Regards,

Rajiv Maskara

swuehl
MVP
MVP

IMHO, removing outliers is kind of dangerous, especially with low number of data points (like sundays in a month, max 5).

But anyway, what do you consider an outlier? Is there any rule (e.g. distance from mean)?

Or do you just wanna skip the min and max values?

Not applicable
Author

I have the enrollment count data, which is a daily event. But then on national holidays, there are no or very little enrollments.

So, for example, if a state has a daily average enrollment count of around 50000, on national holidays this count would fall to say around 10000...

The numbers can vary from state to state.. So for a different state, the daily enrollment count could be only around 7000 and on national holidays, this could fall to say 1000...

This is the situation that I have with me... I understand that removing outliers with so few data points could be a little risky, but it tends to skew the average quite a lot, precisely because there are so few data points.

Any suggestions now??