Qlik Community

New to Qlik Sense

Discussion board where members can get started with Qlik Sense.

Not applicable

Filter out sharp spikes (outliers)

I have data that normally follows a very clear and smooth line that can slope upward and downwards but it has a lot of short and very steep spikes that distort the data (from protocol errors). I would like to ignore them in my chart, esentlially "clipping them out" of the data. For example, a typical outliner could be identified in the following data sample:

2418

2418

2419

2418

5699

2420

2418

2419

or

1418

1419

1420

1421

3566

1430

1432

1433

The ones in red are the outliers I need to filter out of my chart. There are many of those and their magnitude changes slightly over the dataset, so it is not very straight forward to just use an if/then condition to filter out the ones that are >5600 for example, as it would leave holes in the curve.

An example screenshot of my data is attached

I was hoping to be able to have a filter expression that can ignore them based on the median (or other method).

EDIT: I have added an example screen shot of the data graph with and without a very simple upper boundary IF condition (if([E7 Nivel de producto (mm)]<4000,median([E7 Nivel de producto (mm)])) that helps a bit to cleanup the curve, but it is very limited and "dumb" and will not work well for all the data.

Help is greatly appreciated!

13 Replies
ogster1974
Honored Contributor II

Re: Filter out sharp spikes (outliers)

Could you not let the users filter the outliers either through the chart lasso tool or if a regular occurance then bookmark to exclude them?

tcullinane
Contributor II

Re: Filter out sharp spikes (outliers)

based on your data, I dont think there is any easy way to strip out the outliers, defined bounds will not catch all instances as some outliers are actually below your expected values later on. Also you have real sudden changes in the data so simply comparing to the previous values to decide whether to replace an 'outlier' will also not be correct. Some combination of medians across a range of values before and after may be possible, but I would suggest for the sake of visualisation leaving the outliers in and allowing the user to interpret them themselves would be best.

Otherwise replacement with either the above() or below() value would probably be best, not the median, using whatever method suitable to not distort the actual data, which should be defined with the user/ subject matter experts.

You will need to have the data sorted correctly in the table to use above or below in the script.

Not applicable

Re: Filter out sharp spikes (outliers)

Hi,

try dimension limit tab by going into the properties and click the restriction check box and and select 'show only values that accumulate to:' radio buttton and enter any accumulation percentage you want eg:80%

thanks & Regards,

prem utukuri

karthiksrqv
Contributor II

Re: Filter out sharp spikes (outliers)

Hi,

You could use the 'Range >Min , Max' available in the line-charts Appearance > Y-axis > Range > Custom

Not applicable

Re: Filter out sharp spikes (outliers)

Hi Tom, you are right on with your analisys and I think this is a tough one to solve with simple tricks.  I am new to Clik so I am not sure of all the scripting and more complex math that can be done, however from a conceptual point of view my thoughts on how to solve this would be as follows:

Calculate a moving median and compare each point to be plotted aaginst this moving median right before and after the data point, and only discard the ones that are a certain % greater than the median. I guess this would be a very narrow "filter" that would not plot values that are ridiculously high compared to the moving median.

I have been playing with this in Excel and it seems to be possible, but it is complex.

Not applicable

Re: Filter out sharp spikes (outliers)

For this use case it is not desirable for my end users to see the outliers, I need to present a clean undistorted graph. But I did not know this can be done with Clik so I will need to test this out to see how it works, thank you for the idea.

Not applicable

Re: Filter out sharp spikes (outliers)

Hi, the problem is that these boundaries are dynamic and that method will discard good data as well I am afraid (If I understood your idea correctly).

Not applicable

Re: Filter out sharp spikes (outliers)

Hi, the problem is that these ranges are dynamic and change throughout the dataset.

Not applicable

Re: Filter out sharp spikes (outliers)

Andy, I just started playing with the lasso tool and it is very powerful. Altough it is a fully manual process, it is not bad to get the data cleaned up quite a bit. 

Is it possible to lasso the data (or pick single data points) to EXCLUDE them from the current plot?  I guess the way the lasso tool works now is that it allows you to select the subset of data that you want plotted, not excluded.

Thank you,