Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all
I need to find the upper and lower quartile for a given set of records using the most efficient method possible as I need to return the results with sub second response times for millions of rows of data.
As the set changes with the selections the calculation cannot be written into the code.
Therefore do I put the calculation in the expression on each chart or write some sort of function?
Kindest Regards
Paul Bartram
fractile(Data,.25) and fractile(Data,.75)
Paul,
I'd recommend to try a macro using API method ParetoSelect or TopSelect. Although the sub-second response looks really tough if you cannot precalculate in the script.
If I understand the issue correctly you should be able to create something like this using the Rank() function.
For example if you have one dimension called Dimension and one expression, Sum(Value) you could use ranking on the Sum(Value) and then compare that ranking to the highest total ranking and only use those that are in the top quartile or any other range really.
The expression would look something like:
if(Rank(Sum(Value),0,1)<=floor(max(total aggr(Rank(Sum(Value),0,1),Dimension))*0.25),1,0)
If the rank for the current dimension value is lower or equal to the highest rank * 0.25, the expression returns 1, else 0.
Best regards,
Jsn
Thank you for your quick responses, however this doesn't account for the following which is normally resolved in a Quartile function
Data
1
2
4
7
8
9
10
12
If the dataset is like the one above, to work out the Quartile you have to say there are (8 datapoints +1)/4 which equal 2.25, therefore both 2 and 4 fall into this. so you have to say (2+4)/2 = 3.5
So the first quartile to the dataset above is 3.5, your equation would select 2 as the Quartile which is incorrect.
I need a quick method to calculate the above possibility.
Kindest Regards
Paul
Thank you for you reply, I think the TopSelect will have the same issue as my comment above, however I'm not sure what ParetoSelect is?
fractile(Data,.25) and fractile(Data,.75)
Hi Paul,
I'm not sure how any specific quartile calculation would work. The (2+4)/2 should equal 3 though I'm not sure how this relates. The expression would likely need some modification to calculate the cut-off criteria correctly for your desired result.
in addition to John suggestion to use fractile, you have a document property settings you ay adjust to improve calculation speed :
when not selected, this function interpolates (as Excel centile function does) so I assume this is faster.
when selected, QV use discontinuated values calculation.
Paul,
TopSelect allows to make selections like "top 5 sales reps, where criteria is sum(Sales)".
ParetoSelect allows to make selections like "top sales reps, who bring 75% of sales, where criteria is sum(Sales)".
I think that solution from jsn is functionaly correct (maybe you need some adjustments), but it may be slow because of aggr() function. Worth trying anyway.
Ok were getting closer, I thought the Fractile function might of been the answer, but it's not quite there.
As a test for anyones function to find the lower quartile and upper quartile for the following data:
11, 4, 6, 8, 3, 10, 8, 10, 4, 12 and 31.
To work this out manully you first have to Order the data, so we get 3, 4, 4, 6, 8, 8,10, 10, 11, 12 and 31.
The lower quartile is the (11 + 1) ÷ 4 = 3rd value in on the dataset. (11 being the count of datapoints)
The upper quartile is the 3 (11 + 1) ÷ 4 = 9th value in on the dataset.
Therefore, the lower quartile is 4, and the upper quartile is 11.
3, 4, 4, 6, 8, 8, 10, 10, 11, 12, 31
I got the above example from the BBC website http://www.bbc.co.uk/schools/gcsebitesize/maths/data/representingdata3hirev4.shtml
Additional ref: http://en.wikipedia.org/wiki/Quartile
Johns, factile function fractile(Data,.25) for the lower quartile returns the answer as 5 and fractile(Data,.75) for the upper quartile returns the answer as 10.5, neither of which are one of the datapoints. The problem arrises when the factile falls between two datapoints, these to adjacent datapoints have to be added together and divided by 2.
Based on the examples from the BBC and Wikipedia, the Quartile function in Excel returns the incorrect result (sames as Johns Factile). Which beggers the question whos right?
Kindest Regards
Paul Bartram