Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How to perform the statistical function: Quartile (Upper and Lower)

Hi all

I need to find the upper and lower quartile for a given set of records using the most efficient method possible as I need to return the results with sub second response times for millions of rows of data.

As the set changes with the selections the calculation cannot be written into the code.

Therefore do I put the calculation in the expression on each chart or write some sort of function?

Kindest Regards
Paul Bartram

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

fractile(Data,.25) and fractile(Data,.75)

View solution in original post

13 Replies
Anonymous
Not applicable
Author

Paul,
I'd recommend to try a macro using API method ParetoSelect or TopSelect. Although the sub-second response looks really tough if you cannot precalculate in the script.

Anonymous
Not applicable
Author

If I understand the issue correctly you should be able to create something like this using the Rank() function.

For example if you have one dimension called Dimension and one expression, Sum(Value) you could use ranking on the Sum(Value) and then compare that ranking to the highest total ranking and only use those that are in the top quartile or any other range really.

The expression would look something like:

if(Rank(Sum(Value),0,1)<=floor(max(total aggr(Rank(Sum(Value),0,1),Dimension))*0.25),1,0)

If the rank for the current dimension value is lower or equal to the highest rank * 0.25, the expression returns 1, else 0.

Best regards,

Jsn

Not applicable
Author

Thank you for your quick responses, however this doesn't account for the following which is normally resolved in a Quartile function

Data
1
2
4
7
8
9
10
12

If the dataset is like the one above, to work out the Quartile you have to say there are (8 datapoints +1)/4 which equal 2.25, therefore both 2 and 4 fall into this. so you have to say (2+4)/2 = 3.5

So the first quartile to the dataset above is 3.5, your equation would select 2 as the Quartile which is incorrect.

I need a quick method to calculate the above possibility.

Kindest Regards
Paul

Not applicable
Author

Thank you for you reply, I think the TopSelect will have the same issue as my comment above, however I'm not sure what ParetoSelect is?

johnw
Champion III
Champion III

fractile(Data,.25) and fractile(Data,.75)

Anonymous
Not applicable
Author

Hi Paul,

I'm not sure how any specific quartile calculation would work. The (2+4)/2 should equal 3 though I'm not sure how this relates. The expression would likely need some modification to calculate the cut-off criteria correctly for your desired result.

yblake
Partner - Creator II
Partner - Creator II

in addition to John suggestion to use fractile, you have a document property settings you ay adjust to improve calculation speed :

when not selected, this function interpolates (as Excel centile function does) so I assume this is faster.

when selected, QV use discontinuated values calculation.

Anonymous
Not applicable
Author

Paul,
TopSelect allows to make selections like "top 5 sales reps, where criteria is sum(Sales)".
ParetoSelect allows to make selections like "top sales reps, who bring 75% of sales, where criteria is sum(Sales)".
I think that solution from jsn is functionaly correct (maybe you need some adjustments), but it may be slow because of aggr() function. Worth trying anyway.

Not applicable
Author

Ok were getting closer, I thought the Fractile function might of been the answer, but it's not quite there.

As a test for anyones function to find the lower quartile and upper quartile for the following data:

11, 4, 6, 8, 3, 10, 8, 10, 4, 12 and 31.

To work this out manully you first have to Order the data, so we get 3, 4, 4, 6, 8, 8,10, 10, 11, 12 and 31.

The lower quartile is the (11 + 1) ÷ 4 = 3rd value in on the dataset. (11 being the count of datapoints)
The upper quartile is the 3 (11 + 1) ÷ 4 = 9th value in on the dataset.

Therefore, the lower quartile is 4, and the upper quartile is 11.

3, 4, 4, 6, 8, 8, 10, 10, 11, 12, 31

I got the above example from the BBC website http://www.bbc.co.uk/schools/gcsebitesize/maths/data/representingdata3hirev4.shtml

Additional ref: http://en.wikipedia.org/wiki/Quartile

Johns, factile function fractile(Data,.25) for the lower quartile returns the answer as 5 and fractile(Data,.75) for the upper quartile returns the answer as 10.5, neither of which are one of the datapoints. The problem arrises when the factile falls between two datapoints, these to adjacent datapoints have to be added together and divided by 2.

Based on the examples from the BBC and Wikipedia, the Quartile function in Excel returns the incorrect result (sames as Johns Factile). Which beggers the question whos right?

Kindest Regards
Paul Bartram