evan_kurowski
Specialist

Do we have a Variance function? Can we specify statistical functions as Population vs. Sample?

Hello Qlik & Qlik-ers ,

Recently I've begun translating functionality from data science educational curricula into QlikView.

I'm doing this partly to learn the data science concepts, and partly to explore the coding possibilities when migrating syntax across Qlik, Python, & R. (Could Qlik data-science libraries be fashioned and prepackaged, the same way Python does this?)

So far I haven't reached a point where the data science becomes prohibitive in Qlik. There's been no function or calculation yet where I discover 'Qlik does not have that gear' (but I've only explored fundamentals so far).


However, a few things seem evident. Though we have the STERR & STDEV functions, a lot of the data science exercises want to work with the precursor calculation, the variance. Variance is a component used to form STERR & STDEV, but I haven't seen a convenient VARIANCE() function that calculates it directly.
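To illustrate how the variance feeds the other two measures, here is a minimal Python sketch (not Qlik syntax). The `salaries` values are made up for illustration, and the sketch assumes the sample (n - 1) form of the variance and the usual standard error of the mean, stdev / sqrt(n).

```python
import math
import statistics

# Hypothetical values, purely for illustration
salaries = [40.0, 50.0, 60.0, 70.0, 80.0]

n = len(salaries)
mean = sum(salaries) / n

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in salaries)

# Sample variance: divide by n - 1
sample_variance = sum_sq_dev / (n - 1)

# Standard deviation is the square root of the variance
sample_stdev = math.sqrt(sample_variance)

# Standard error of the mean is the standard deviation over sqrt(n)
standard_error = sample_stdev / math.sqrt(n)

print(sample_variance, sample_stdev, standard_error)
```

Read the other way round, squaring a standard deviation recovers the variance it was built from, which is a possible workaround wherever only a standard deviation function is on hand.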

Do we have a VARIANCE() function, and I just haven't looked in the right place?


Also, it appears statistical functions have gained more specificity with the advent of data science. A solitary STERR or STDEV function is no longer sufficient, because data science is obsessed with the idea that calculations change depending on whether you are evaluating the entire population or just a sample.

Therefore, if you're just evaluating a sample, the denominator of the variance is reduced by 1 from the count of total items (n - 1 instead of n).

(Conceptually this seems strange to me: if we calculate the Avg of the entire data set without selections (population?), we wouldn't reduce the denominator of that Avg merely because we applied selections (sample?) and examined only a portion of the data set. For aggregation purposes up to now, a regional evaluation has been considered just as valid as a global evaluation.)

In Excel, this resulted in the statistical functions splitting into two series, a .P series & an .S series:
STDEV split into STDEV.P (for population) and STDEV.S (for sample), VAR split into VAR.P and VAR.S, etc.
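For comparison, Python's standard `statistics` module makes the same population/sample split with separate function names. The sketch below uses made-up `data` values. The last line shows how a sample variance can be rescaled to a population variance, which may help when a tool exposes only one form. Whether a given function divides by n or by n - 1 is an assumption to check against that tool's documentation.

```python
import statistics

# Hypothetical values, purely for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# Population forms divide by n (the counterparts of Excel's VAR.P / STDEV.P)
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample forms divide by n - 1 (the counterparts of Excel's VAR.S / STDEV.S)
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

# Rescaling a sample variance to a population variance
pop_var_from_sample = samp_var * (n - 1) / n

print(pop_var, pop_sd, samp_var, samp_sd, pop_var_from_sample)
```

The two forms differ only by the factor (n - 1) / n, so the gap shrinks as the number of items grows.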


So do we have a way to parameterize the Qlik STDEV & STERR functions so they differentiate between Sample & Population? e.g. STDEV(Salary, P) vs. STDEV(Salary, S)?



Appreciate your time Qlikish Data Scientists, thanks very much!  ~E

2 Replies
evan_kurowski
Specialist
Author

Perhaps adding an illustration will make it clearer.

If I search on 'variance calculator' it brings up this handy little engine. You can see that switching between 'Population' and 'Sample' adjusts the expressions accordingly. I also included a screenshot of the Excel functions, which seem to have deprecated 'Stdev'.

[Image: variance & stdev formulas for 'population' v. 'sample']

Brett_Bleess
Former Employee

Evan, have a look over the following links:

https://help.qlik.com/en-US/qlikview/April2019/Subsystems/Client/Content/QV_QlikView/Scripting/Stati...

https://help.qlik.com/en-US/qlikview/April2019/Subsystems/Client/Content/QV_QlikView/Scripting/Stati...

https://help.qlik.com/en-US/qlikview/April2019/Subsystems/Client/Content/QV_QlikView/Scripting/Stati...

https://community.qlik.com/t5/Qlik-Design-Blog/Recipe-for-a-Histogram/ba-p/1462688

Best I have, not sure if this is what you were trying to find or not, hopefully it was.

Regards,
Brett
