Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Qlik Community,
Thanks so much for all your help to date-- it's been invaluable. I have run into an issue trying to use the above() function in text expressions. Because the function relies on data sorting, I cannot load data in and have it calculate correctly.
I am trying to calculate a statistic called "KS". This statistic is the maximum of the difference between two cumulative distribution functions.
Now, I am able to calculate this function in a table quite easily using a combination of rangesum(), above(), and TOTAL. The CDF for the distribution "goods" is calculated as
rangesum(above(Sum(If(Bad=0,1,0)),0,rowno())) / Sum( TOTAL If(Bad=0,1,0))
and for the distribution "bads" as
rangesum(above(Sum(If(Bad=0,1,0)),0,rowno())) / Sum( TOTAL If(Bad=1,1,0))
then the difference between the two at any point is then
fabs(
rangesum(above(Sum(If(Bad=0,1,0)),0,rowno()))/Sum( TOTAL If(Bad=0,1,0))
-rangesum(above(Sum(If(Bad=1,1,0)),0,rowno()))/Sum( TOTAL If(Bad=1,1,0))
)
which is KS.
Now, I'd like to calculate the MAXIMUM of this number over a field called "Score". However, I cannot get my head around how to incorporate a sort function into the above function to get it to take "Score" as ascending. I am using the AGGR() function to get to the max, so the text expression I have invoked is:
MAX(AGGR(fabs(
rangesum(above(Sum( If(Bad=0,1,0)),0,rowno()))/Sum( TOTAL If(Bad=0,1,0))
-rangesum(above(Sum( If(Bad=1,1,0)),0,rowno()))/Sum( TOTAL If(Bad=1,1,0))
),Score))
But since the ordering is by how the file is loaded, we arrive at the wrong answer (18%)
Attached you'll find an example of the issue I'm running into. Any help is greatly appreciated!
It's not possible to sort the aggr function (alas!!). But it is possible to sort the data in the script. See attached example. Is that an option for you?
Hi Gysbert:
That's a great crack at it. Unfortunately, we may have situations where we wish to measure the "KS" value for multiple Scores. This means that it's putting the logic in the load statement isn't optimal (I'd have to do separate loads for each score).
A way I was thinking of "hacking it" is just to create some score deciles or twentiles "hard coded" as variables, and take the max of KS at each of those points.