Good morning,
I am currently trying to get a firm grasp on QlikView's capabilities for formulating statistical expressions, and have found a few interesting functions under the Statistical Aggregation Functions in Charts and Statistical Distribution Functions sections of the inline help. So far, I am fairly confident that the following needs are covered (my question is at the end):
Kurtosis
kurtosis([{set_expression}][ distinct ] [ total [<fld { , fld } >] ] expression)
Returns the aggregated kurtosis of expression or field iterated over the chart dimension(s).
This function has the same limitations for nested aggregation as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function. The kurtosis function supports Set Analysis and the total qualifier in the same way as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function.
Examples:
kurtosis(Sales)
kurtosis(X*Y/3)
kurtosis(distinct Price)
kurtosis(total Sales)
kurtosis({1} total Sales)
Median
median ([{set_expression}] [ distinct ] [ total [<fld {,fld}>] ] expression )
Returns the aggregated median of expression iterated over the chart dimension(s).
This function has the same limitations for nested aggregation as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function. The median function supports Set Analysis and the total qualifier in the same way as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function.
Examples:
median( X )
median( X*Y/3 )
median( total X )
median( total <Group> Price )
Standard Deviation
stdev([{set_expression}][ distinct ] [ total [<fld { , fld } >] ] expression)
Returns the aggregated standard deviation of expression or field iterated over the chart dimension(s).
This function has the same limitations for nested aggregation as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function. The stdev function supports Set Analysis and the total qualifier in the same way as the avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression) function.
Examples:
stdev(Sales)
stdev(X*Y/3)
stdev(distinct Price)
stdev(total Sales)
stdev({1} total Sales)
Mean
avg([{set_expression}] [ distinct ] [ total [<fld { , fld } >]] expression)
Returns the aggregated average of expression or field iterated over the chart dimension(s). [...]
If the word distinct occurs before the function arguments, duplicates resulting from the evaluation of the function arguments will be disregarded.
If the word total occurs before the function arguments the calculation will be made over all possible values given the current selections but disregarding the chart dimension variables.
The total qualifier may be followed by a list of one or more field names within angle brackets. These field names should be a subset of the chart dimension variables. In this case the calculation will be made disregarding all chart dimension variables except those listed, i.e. one value will be returned for each combination of field values in the listed dimension fields. Also fields which are not currently a dimension in a chart may be included in the list. This may be useful in the case of group dimensions, where the dimension fields are not fixed. Listing all of the variables in the group causes the function to work when the cycle or drill-down level changes.
Examples:
avg(Sales)
avg(X*Y/3)
avg(distinct Price)
avg(total Sales)
avg({1} total Sales)
Distribution
normdist (value, mean, standard_dev)
returns the cumulative normal distribution for the specified mean and standard deviation. Value is the value at which you want to evaluate the distribution. Mean is a value stating the arithmetic mean for the distribution. Standard_dev is a positive value stating the standard deviation of the distribution. All arguments must be numeric, else null will be returned. If mean = 0 and standard_dev = 1, the function returns the standard normal distribution. This function is related to the norminv (prob, mean, standard_dev) function in the following way:
If prob = normdist(value, m, sd), then norminv(prob, m, sd) = value.
Example:
normdist( 0.5, 0, 1 ) returns 0.6914625
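For anyone who wants to sanity-check these values outside QlikView, here is a minimal Python sketch of the normdist/norminv relationship described above. It relies on Python's standard-library NormalDist class, not on anything from QlikView; the two wrapper functions are hypothetical helpers named after the QlikView functions purely for readability.

```python
from statistics import NormalDist


def normdist(value, mean, standard_dev):
    """Cumulative normal distribution at `value` (hypothetical stand-in for QlikView's normdist)."""
    return NormalDist(mu=mean, sigma=standard_dev).cdf(value)


def norminv(prob, mean, standard_dev):
    """Inverse of the cumulative normal distribution (hypothetical stand-in for QlikView's norminv)."""
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(prob)


# Same arguments as the help example: value 0.5, mean 0, standard deviation 1.
prob = normdist(0.5, 0, 1)

# Feeding the probability back through norminv should recover the original value.
value = norminv(prob, 0, 1)

print(round(prob, 7))   # 0.6914625, matching the help example
print(round(value, 6))  # 0.5
```

The round trip is the point here: if prob = normdist(value, m, sd), then norminv(prob, m, sd) gives value back.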
At this point, though, I haven't found any information or functions related to Symmetry analysis, or to Mode (central tendency) and Variance (dispersion).
Does anyone have any hints or pointers on this subject matter?
Thanks in advance for your time, regards,
Philippe
Maybe
skew([ distinct] expression)
Returns the skewness of expression over a number of records as defined by a group by clause. If the word distinct occurs before the expression, all duplicates will be disregarded.
Example:
Load Month, skew(Sales) as SalesSkew from abc.csv group by Month;
and
kurtosis([distinct ] expression )
Returns the kurtosis of expression over a number of records as defined by a group by clause. If the word distinct occurs before the expression, all duplicates will be disregarded.
Example:
Load Month, kurtosis(Sales) as SalesKurtosis from abc.csv group by Month;
can help you
QlikView provides stdev, so
variance = pow(stdev(expression), 2)
and it provides a mode() function too.
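To illustrate why squaring the standard deviation gives the variance, here is a small Python sketch using the standard-library statistics module. The sales figures are made up, and the sample (n - 1) version of the standard deviation is an assumption on my part; the thread does not say which denominator QlikView's stdev uses.

```python
import statistics

# Made-up sample values, purely for illustration.
sales = [10.0, 12.0, 12.0, 15.0, 21.0]

# Sample standard deviation (n - 1 denominator); assumed, not confirmed for QlikView.
sd = statistics.stdev(sales)

# Variance obtained by squaring the standard deviation, as suggested above.
variance_from_sd = sd ** 2

# Variance computed directly, for comparison.
variance_direct = statistics.variance(sales)

# Most frequently occurring value.
most_common = statistics.mode(sales)

print(variance_from_sd, variance_direct, most_common)
```

Both variance figures agree up to floating-point rounding, which is all the pow(stdev, 2) trick depends on.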
Thank you Clever Anjos
Now, all that's left to figure out, is how to evaluate the Symmetry!
Cheers,
Philippe
I'm not familiar with Symmetry analysis.
Symmetry analysis is a type of statistical distance measure. Here's an excerpt from Wikipedia on the subject:
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions and hence they are essentially measures of distances between probability measures. Where statistical distance measures relate to the differences between random variables, these may have statistical dependence, and hence these distances are not directly related to measures of distances between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values.
Statistical distance measures are mostly not metrics and they need not be symmetric. Some types of distance measures are referred to as (statistical) divergences.
Metrics
A metric on a set X is a function (called the distance function or simply distance)
d : X × X → R (where R is the set of real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:
- d(x, y) ≥ 0 (non-negativity)
- d(x, y) = 0 if and only if x = y (identity of indiscernibles. Note that condition 1 and 2 together produce positive definiteness)
- d(x, y) = d(y, x) (symmetry)
- d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality).
Source: Statistical distance - Wikipedia, the free encyclopedia
Apparently, from what I've read, symmetry can be inferred when skewness = 0.
Many thanks Clever Anjos!
So checking whether skewness = 0 will show you the symmetry.
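As a rough sketch of that check outside QlikView, the Python below computes a moment-based skewness and flags data as symmetric when the result is close to zero. The formula (third central moment divided by the second central moment to the power 1.5) is one common definition and is an assumption here; the thread does not say which variant QlikView's skew uses. The function names, the tolerance, and the data are all hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (assumed definition, may differ from QlikView's skew)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def looks_symmetric(values, tolerance=0.1):
    """Treat the data as symmetric when the skewness is within `tolerance` of zero."""
    return abs(skewness(values)) <= tolerance


symmetric = [1, 2, 3, 4, 5]      # evenly spread around the mean
right_skewed = [1, 1, 1, 2, 10]  # long tail to the right

print(skewness(symmetric), looks_symmetric(symmetric))
print(skewness(right_skewed), looks_symmetric(right_skewed))
```

Real data will rarely give exactly 0, hence the tolerance. Also note that zero skewness is an indicator rather than proof: an asymmetric data set can still have a skewness of 0.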