Re: Dynamically Remove Outliers From LinEst_M Calc... - Qlik Community

UncleRiotous · ‎2024-06-18

I'm trying to work out how to dynamically exlude outliers from a calculation I'm making, in this case using LinEst_M. Apologies for the long post.

I've got the following data set and by working out the IQR using the formulas below I've identified outliers for Type1 records as being anything below -44.75 and above 82.75, for Type2 -25.25 and 98.5.

Fractile(X_AXIS,0.25)
-
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)

and

Fractile(X_AXIS,0.75)
+
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)

My standard measure is

LinEst_M(Y_AXIS,X_AXIS)

and I can manually remove the one outlier in this Type1 data set by using

LinEst_M(

{<X_AXIS={">=-44.75<=82.75"},

DATA_TYPE={'Type1'}>}

Y_AXIS,X_AXIS)

or for Type2 data...

LinEst_M(

{<X_AXIS={">=-24.75<=97.75"},

DATA_TYPE={'Type2'}>}

Y_AXIS,X_AXIS)

I'd like to dynamically restrict the data used by the LinEst_M calculation for each Data Type however I can't figure out how to join the various bits of syntax.

So the calculations look like this...

and the trend lines are massively different

If you want to reproduce the examples...

Load Script

MAIN_DATA:
Load * Inline [
DATA_TYPE,JOB_REF,Y_AXIS,X_AXIS
Type1,0E9AEDCA,1941,2
Type1,F84DF6A1,13186,3
Type1,01E763D1,4606,4
Type1,3F31115F,4247,6
Type1,060B92B4,9234,8
Type1,66F87B58,8482,9
Type1,C334D0A1,2816,11
Type1,8FCC977B,4799,12
Type1,504663C3,9778,13
Type1,88525D17,13300,16
Type1,B32518E5,3143,37
Type1,1E9B85B9,2888,38
Type1,33CC0380,11049,81
Type1,266BA90C,24350,253
Type2,0AED22C3,19295,7
Type2,965DE786,17740,22
Type2,0D7CAE57,18505,23
Type2,347D112A,12667,24
Type2,9EB1EB06,14065,25
Type2,DBF09D71,19619,32
Type2,AB52FB03,3304,38
Type2,E2AE22FD,1798,43
Type2,57649F4D,6823,45
Type2,D100EE9D,7972,48
Type2,95F9472F,8860,49
Type2,D3A0EF03,11802,50
Type2,41827B71,15630,54
Type2,AF44702D,52546,300
];

Measures

LinEst_M

LinEst_M(Y_AXIS,X_AXIS)

1st Quartile

Fractile(X_AXIS,0.25)

3rd Quartile

Fractile(X_AXIS,0.75)

IQR

Fractile(X_AXIS,0.75)

-

Fractile(X_AXIS,0.25)

Lower Threshold (2*IQR)

Fractile(X_AXIS,0.25)

-

((

Fractile(X_AXIS,0.75)

-

Fractile(X_AXIS,0.25)

)*2)

Higher Threshold (2*IQR)

Fractile(X_AXIS,0.75)

+

((

Fractile(X_AXIS,0.75)

-

Fractile(X_AXIS,0.25)

)*2)

David_Friend · ‎2024-06-18

@UncleRiotous have you tried asking ChatGPT on this, may at least provide you with a jumping off point...

marcus_sommer · ‎2024-06-19

You may calculate the lower/upper outlier within the set analysis or as separate results - something in this way:

LinEst_M(

{<X_AXIS={">=$(=ExpLowerOutlier)<=$(=ExpUpperOutlier)"}>}

Y_AXIS,X_AXIS)

or

rangemin(ExpUpperOutlier, rangemax(ExpLowerOutlier, LinEst_M(Y_AXIS,X_AXIS)))

You may need to play a bit with the range-function nesting. I couldn't remember the right way because depending on the needed scenario the order is changing - and therefore doing it also with some trial & error.

UncleRiotous · ‎2024-06-19

I can't see an ExpUpperOutlier or ExpLowerOutlier function in Qlik.

I can calculate the upper and lower separately but I can't work out how to use those to filter an expression in a dynamic way for each group of results.

marcus_sommer · ‎2024-06-19

This wasn't meant as hint for a function() else as a reference to your expressions which are calculating the thresholds for the lower/upper outliers.

Dynamically Remove Outliers From LinEst_M Calculation

Chart

filter

Qlik Sense

Set Analysis