Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
ALERT: QlikView server communication interruptions following Microsoft Windows Domain Controller security updates
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Standard deviation and weighted average

Hi,

I am analysing aggregated survey results using Qlikview. The results  of the survey are on a scale 1-5, so for example I have:

Score 1 - 12 respondents

Score 2 - 23 respondents

Score 3 - 45 respondents

Score 4 - 120 respondents

Score 5 - 35 respondents

Each rating score is a separate dimension in the data source and is aggregated data.

I am looking to calculate two statistics:

1) the weighted average score

2) the standard deviation of the mean

I know that to calculate the weighted average in excel it would be (1 x 12) + (2 x 23) + (3 x 45) +(4 x 120) + (5 x 35) /   235  (number of respondents) = 3.6

From this I need to calculate the standard deviation of the mean.

But I am having trouble expressing this in qlikview.

Any help would be appreciated.

Labels (1)
1 Reply
johnw
Champion III
Champion III

The simplest approach might be to unaggregate the scores again.  For instance:

Raw:
LOAD Score as Score2
RESIDENT Aggregated
WHILE iterno()<=Respondents
;

Then you can just use avg(Score2) and stdev(Score2).  Even if you have a huge sample size, it's only a single small integer getting duplicated (Edit: well, plus a key field to connect it to your data), so the memory requirements might not be bad.  If that is impractical for you, here's a way to calculated the weighted average and the standard deviation without unaggregating:

weighted average   = sum(Score*Respondents)/sum(Respondents)
standard deviation = sqrt(sum(Respondents*sqr(Score-sum(total Score*Respondents)/sum(total Respondents)))/(sum(Respondents)-1))

You may or may not want the final -1 in the standard deviation.  It's required to duplicate what QlikView does, and Wikipedia indicates that this is used when you have a sample from a population.  If you have ALL responses, then it may be appropriate to remove it.


See attached.