Solved: Re: Average of a field on weekdays - Page 2 - Qlik Community

Report Inappropriate Content · ‎2011-09-05

Hi,

I have a calendar module which gives me month, weekdays, etc etc..

I have another table which connects to the calendar module by date.

Now in the table, I have a field called [enrollment count], which has some data filled for every day.

I have a chart which gives me a sum of the [enrollment count] day wise: i.e. Monday, Tuesday, Wednesday, etc.. I used the expression sum([expression count])

Now instead of sum, I want an average of the [enrollment count] day wise. So if there are five sundays in a month it should take the sum of all the five values and then divide it by 5. I used the expression avg([expression count]), but the result is not correct.

Please suggest where I am going wrong.

Report Inappropriate Content · ‎2011-09-05

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

swuehl · ‎2011-09-05

Hi Rajiv,

IMHO, the expressions 1) and 2), (i.e. avg([Field]) and sum([Field]) / count([Field]) ) are identical (so 2) is just how the function avg is calculated with more granular functions).

3) is also like an average, but it uses a different count number to divide by.

While 1) and 2) iterate over the records restricted by the charts dimensions and do a sum and count of all these records (which would give you an average too low in your case, give that you could have multiple records per weekday in any given week), 3) will divide only by the number of distinct Dates.

For example, if you have multiple records for today 2011-09-05, like

recNo, Value, Date

1, 10, 2011-09-05

2, 10, 2011-09-05

3, 10, 2011-09-05

1) and 2) will calculate:

sum(Value) / count(Value) = 30 / 3 = 10

while 3) calculates:

sum(Value) / count (distinct Date) = 30 /1 = 30

It's always helpful for learning to put the expressions in parts and add the parts as separate expressions to your table, so you will learn how QV does the calculations taking the dimensions / selections into account.

Hope this helps,

Stefan

Report Inappropriate Content · ‎2011-09-05

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

Report Inappropriate Content · ‎2011-09-06

Thanks for the post Stefan,

I still have a doubt..

My data has only one record per weekday. Meaning for a particular date I have only one record.

Yet when I used expressions 1 and 2, it gave me a very low value. Expression 3 ofcourse gave me the correct value.

Your example makes sense, when there are multiple records for a particular date. But when there is a single record per date, and yet you get a lower value, it means that qlikview is doing something else which I am not comprehending..

Your inputs...

Cheers,

Rajiv

Report Inappropriate Content · ‎2011-09-06

Ich werde ab 05.09.2011 nicht im Büro sein. Ich kehre zurück am

23.09.2011.

Ich werde Ihre Nachrichten nach meiner Rückkehr beantworten.

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Verwertung des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail ausdrücklich untersagt.

Bitte benachrichtigen Sie uns und vernichten Sie die empfangene E-Mail. Vielen Dank!

swuehl · ‎2011-09-06

Hi Rajiv,

count([Field]) must be different from count(distinct Date) then. Have you looked at the numbers of

count([Field])

count(Date)

count(distinct Date)

(put as expression in your table)?

You could add Date as dimension to your table after your Weekday.

I would recommend selecting the date and trying to look at the record level, which records are selected (by looking at a unique ID, maybe?). And keep in mind, that e.g. multiple identical records are not shown multiple e.g. in table box.

Regards,

Stefan

Report Inappropriate Content · ‎2011-09-06

Thanks Stefan,

Really helpful... There indeed was multiple records for a particular date, which I had overlooked!! And hence your earlier example stands as the explanation for the lower value that I was getting.

Cheers,

Rajiv

Report Inappropriate Content · ‎2011-09-06

Next requirement:

Now that I have the average of the enrollment count day wise, I want to remove the outliers so that the average value is not skewed.

For example in the 5 sundays of a month: the enrollment count could be 25,24,23,22,01.

Now I don't want my average to consider 01.

The average that I am getting now(with current expression) is 19((25+24+23+22+01)/5)

I should ideally get 23.5((25+24+23+22)/4)

Any suggestions on how to go about this.

Regards,

Rajiv Maskara

swuehl · ‎2011-09-06

IMHO, removing outliers is kind of dangerous, especially with low number of data points (like sundays in a month, max 5).

But anyway, what do you consider an outlier? Is there any rule (e.g. distance from mean)?

Or do you just wanna skip the min and max values?

Report Inappropriate Content · ‎2011-09-06

I have the enrollment count data, which is a daily event. But then on national holidays, there are no or very little enrollments.

So, for example, if a state has a daily average enrollment count of around 50000, on national holidays this count would fall to say around 10000...

The numbers can vary from state to state.. So for a different state, the daily enrollment count could be only around 7000 and on national holidays, this could fall to say 1000...

This is the situation that I have with me... I understand that removing outliers with so few data points could be a little risky, but it tends to skew the average quite a lot, precisely because there are so few data points.

Any suggestions now??