Skip to main content
Announcements
Jan 15, Trends 2025! Get expert guidance to thrive post-AI with After AI: REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
UncleRiotous
Creator
Creator

Dynamically Remove Outliers From LinEst_M Calculation

I'm trying to work out how to dynamically exlude outliers from a calculation I'm making, in this case using LinEst_M. Apologies for the long post.
 
I've got the following data set and by working out the IQR using the formulas below I've identified outliers for Type1 records as being anything below -44.75 and above 82.75, for Type2 -25.25 and 98.5.
 
Fractile(X_AXIS,0.25)
-
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)
 
and
 
Fractile(X_AXIS,0.75)
+
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)
 
My standard measure is 
LinEst_M(Y_AXIS,X_AXIS)
 
and I can manually remove the one outlier in this Type1 data set by using
LinEst_M(
{<X_AXIS={">=-44.75<=82.75"},
DATA_TYPE={'Type1'}>}
Y_AXIS,X_AXIS)
 
or for Type2 data...
LinEst_M(
{<X_AXIS={">=-24.75<=97.75"},
DATA_TYPE={'Type2'}>}
Y_AXIS,X_AXIS)
 
I'd like to dynamically restrict the data used by the LinEst_M calculation for each Data Type however I can't figure out how to join the various bits of syntax.
 
So the calculations look like this...
UncleRiotous_1-1718727148943.png

and the trend lines are massively different

UncleRiotous_2-1718727186450.png

 


 

 
If you want to reproduce the examples...
 
Load Script
 
MAIN_DATA:
Load * Inline [
DATA_TYPE,JOB_REF,Y_AXIS,X_AXIS
Type1,0E9AEDCA,1941,2
Type1,F84DF6A1,13186,3
Type1,01E763D1,4606,4
Type1,3F31115F,4247,6
Type1,060B92B4,9234,8
Type1,66F87B58,8482,9
Type1,C334D0A1,2816,11
Type1,8FCC977B,4799,12
Type1,504663C3,9778,13
Type1,88525D17,13300,16
Type1,B32518E5,3143,37
Type1,1E9B85B9,2888,38
Type1,33CC0380,11049,81
Type1,266BA90C,24350,253
Type2,0AED22C3,19295,7
Type2,965DE786,17740,22
Type2,0D7CAE57,18505,23
Type2,347D112A,12667,24
Type2,9EB1EB06,14065,25
Type2,DBF09D71,19619,32
Type2,AB52FB03,3304,38
Type2,E2AE22FD,1798,43
Type2,57649F4D,6823,45
Type2,D100EE9D,7972,48
Type2,95F9472F,8860,49
Type2,D3A0EF03,11802,50
Type2,41827B71,15630,54
Type2,AF44702D,52546,300
];
 
 
Measures
LinEst_M
LinEst_M(Y_AXIS,X_AXIS)
 
1st Quartile
Fractile(X_AXIS,0.25)
 
3rd Quartile
Fractile(X_AXIS,0.75)
 
IQR
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
 
Lower Threshold (2*IQR)
Fractile(X_AXIS,0.25)
-
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)
 
Higher Threshold (2*IQR)
Fractile(X_AXIS,0.75)
+
((
Fractile(X_AXIS,0.75)
-
Fractile(X_AXIS,0.25)
)*2)
Labels (4)
4 Replies
David_Friend
Support
Support

@UncleRiotous have you tried asking ChatGPT on this, may at least provide you with a jumping off point...

marcus_sommer

You may calculate the lower/upper outlier within the set analysis or as separate results - something in this way:

LinEst_M(
{<X_AXIS={">=$(=ExpLowerOutlier)<=$(=ExpUpperOutlier)"}>}
Y_AXIS,X_AXIS)


or

rangemin(ExpUpperOutlier, rangemax(ExpLowerOutlier, LinEst_M(Y_AXIS,X_AXIS)))

You may need to play a bit with the range-function nesting. I couldn't remember the right way because depending on the needed scenario the order is changing - and therefore doing it also with some trial & error.

 

 

UncleRiotous
Creator
Creator
Author

I can't see an ExpUpperOutlier or ExpLowerOutlier function in Qlik.

I can calculate the upper and lower separately but I can't work out how to use those to filter an expression in a dynamic way for each group of results.

marcus_sommer

This wasn't meant as hint for a function() else as a reference to your expressions which are calculating the thresholds for the lower/upper outliers.