Trimean : Mean, excluding outlier
Hi,
Recently I helped a community member to calculate the Trimean, I found this indicator interesting, so I decided to share it here as well.
The trimean (TM), or Tukey's trimean, is a measure of a probability distribution's location defined as a weighted average of the distribution's median and its two quartiles: (1)
trimean calculates the mean taken by excluding a percentage of data points from the top and bottom tails of a data set. You can use this function when you wish to exclude outlying data from your analysis (2)
Example :
for the following example :TRIMMEAN is 3.778
LOAD * INLINE [
Data
4
5
6
7
2
3
4
5
1
2
3
];
To calculate Trimean on Qlikview, we will take the example below :
DATE | CLAIM PAID |
02/01/2020 | 250 |
05/01/2020 | 750 |
08/01/2020 | 25000 |
11/01/2020 | 50000 |
14/01/2020 | 75000 |
17/01/2020 | 2500 |
20/01/2020 | 3500 |
23/01/2020 | 500000 |
26/01/2020 | 850000 |
29/01/2020 | 25000 |
01/02/2020 | 10000 |
04/02/2020 | 1800000 |
07/02/2020 | 50000 |
10/02/2020 | 65000 |
13/02/2020 | 45000 |
16/02/2020 | 1500000 |
19/02/2020 | 450000 |
22/02/2020 | 18500 |
25/02/2020 | 25000 |
28/02/2020 | 50000 |
02/03/2020 | 75000 |
the expected output is :
Normal Average | 267 643 |
Trimmean | 201 066 |
the script can be presented as :
Data0:
LOAD
DATE,
[CLAIM PAID]
FROM
[.\TRIMMEAN(1).xlsx]
(ooxml, embedded labels, table is Sheet1);
Data:
noconcatenate
LOAD rowno() as IDtmp,
DATE,
[CLAIM PAID]
resident Data0 order by [CLAIM PAID];
drop table Data0;
let percent=0.1;
Tmp1:
load count([CLAIM PAID]) as CountClaim resident Data;
let vcount=peek('CountClaim',0,'tmp');
drop table Tmp1;
let k=floor($(vcount)*($(percent))/2);
Tmp2:
load min(IDtmp) as minIDtmp, max(IDtmp) as maxIDtmp resident Data;
let Idmin=peek('minIDtmp',0,'Tmp2')+$(k);
let Idmax=peek('maxIDtmp',0,'Tmp2')-$(k);
drop table Tmp2;
output :