Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello, I'm trying to use the Fractile() function to delete outliers from my table, but I can't manage to make it work. Documentation specifying a general <expr> in the first function clause doesn't help either.
I have a numeric colum (Valore Riga) and I just want to keep the most central 99% of data, in order to exclude extreme values from the dataset.
Thanks in advance.
It seems like this would be easier to achieve by loading the data, getting the count of rows, and then only loading rows with RowNo() greater than the bottom 0.5% and less than the top 0.5%. Calculating fractiles doesn't seem to be necessary. That said, you could calculate the fractile in a separate load (this requires a GROUP BY) and then use that to filter in a following load.
You need at least 3 steps to exclude such records, for example (simplified):
t: load * from Source;
left join(t) load Key, frac(0.01) as Min, frac(0.99) as Max resident t group by Key;
t2: noconcatenate load * resident t where val >= Min and val <= Max;
drop tables t; drop fields Min, Max;
More simple would it be if you would specify valide values - means using fixed values in the condition like:
... where val >= 0.1 and val <= 99;
Another simple approach would be not to exclude any records else to apply a valide value check within if-loops to create a flag-field or maybe using class() to cluster the values. Flags and cluster could be then used as selections and/or dimensions and/or set analysis conditions.