Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
LeonardoB25
Contributor II
Contributor II

Fractile function to delete outliers

LeonardoB25_0-1695295352582.png

Hello, I'm trying to use the Fractile() function to delete outliers from my table, but I can't manage to make it work. Documentation specifying a general <expr> in the first function clause doesn't help either.

I have a numeric colum (Valore Riga) and I just want to keep the most central 99% of data, in order to exclude extreme values from the dataset.

Thanks in advance.

Labels (1)
2 Replies
Or
MVP
MVP

It seems like this would be easier to achieve by loading the data, getting the count of rows, and then only loading rows with RowNo() greater than the bottom 0.5% and less than the top 0.5%. Calculating fractiles doesn't seem to be necessary. That said, you could calculate the fractile in a separate load (this requires a GROUP BY) and then use that to filter in a following load.

marcus_sommer

You need at least 3 steps to exclude such records, for example (simplified):

t: load * from Source;
    left join(t) load Key, frac(0.01) as Min, frac(0.99) as Max resident t group by Key;

t2: noconcatenate load * resident t where val >= Min and val <= Max;
drop tables t; drop fields Min, Max;

More simple would it be if you would specify valide values - means using fixed values in the condition like:

... where val >= 0.1 and val <= 99;

Another simple approach would be not to exclude any records else to apply a valide value check within if-loops to create a flag-field or maybe using class() to cluster the values. Flags and cluster could be then used as selections and/or dimensions and/or set analysis conditions.