1 Reply Latest reply: Mar 8, 2012 12:02 PM by John Witherspoon RSS

    Standard deviation and weighted average

      Hi,

       

      I am analysing aggregated survey results using Qlikview. The results  of the survey are on a scale 1-5, so for example I have:

       

      Score 1 - 12 respondents

      Score 2 - 23 respondents

      Score 3 - 45 respondents

      Score 4 - 120 respondents

      Score 5 - 35 respondents

       

      Each rating score is a separate dimension in the data source and is aggregated data.

       

      I am looking to calculate two statistics:

      1) the weighted average score

      2) the standard deviation of the mean

       

      I know that to calculate the weighted average in excel it would be (1 x 12) + (2 x 23) + (3 x 45) +(4 x 120) + (5 x 35) /   235  (number of respondents) = 3.6

       

      From this I need to calculate the standard deviation of the mean.

       

      But I am having trouble expressing this in qlikview.

       

      Any help would be appreciated.

        • Re: Standard deviation and weighted average
          John Witherspoon

          The simplest approach might be to unaggregate the scores again.  For instance:

           

          Raw:
          LOAD Score as Score2
          RESIDENT Aggregated
          WHILE iterno()<=Respondents
          ;

           

          Then you can just use avg(Score2) and stdev(Score2).  Even if you have a huge sample size, it's only a single small integer getting duplicated (Edit: well, plus a key field to connect it to your data), so the memory requirements might not be bad.  If that is impractical for you, here's a way to calculated the weighted average and the standard deviation without unaggregating:

           

          weighted average   = sum(Score*Respondents)/sum(Respondents)
          standard deviation = sqrt(sum(Respondents*sqr(Score-sum(total Score*Respondents)/sum(total Respondents)))/(sum(Respondents)-1))

           

          You may or may not want the final -1 in the standard deviation.  It's required to duplicate what QlikView does, and Wikipedia indicates that this is used when you have a sample from a population.  If you have ALL responses, then it may be appropriate to remove it.


          See attached.