I have data that normally follows a very clear and smooth line that can slope upward and downwards but it has a lot of short and very steep spikes that distort the data (from protocol errors). I would like to ignore them in my chart, esentlially "clipping them out" of the data. For example, a typical outliner could be identified in the following data sample:
The ones in red are the outliers I need to filter out of my chart. There are many of those and their magnitude changes slightly over the dataset, so it is not very straight forward to just use an if/then condition to filter out the ones that are >5600 for example, as it would leave holes in the curve.
An example screenshot of my data is attached
I was hoping to be able to have a filter expression that can ignore them based on the median (or other method).
EDIT: I have added an example screen shot of the data graph with and without a very simple upper boundary IF condition (if([E7 Nivel de producto (mm)]<4000,median([E7 Nivel de producto (mm)])) that helps a bit to cleanup the curve, but it is very limited and "dumb" and will not work well for all the data.
Help is greatly appreciated!
Could you not let the users filter the outliers either through the chart lasso tool or if a regular occurance then bookmark to exclude them?
based on your data, I dont think there is any easy way to strip out the outliers, defined bounds will not catch all instances as some outliers are actually below your expected values later on. Also you have real sudden changes in the data so simply comparing to the previous values to decide whether to replace an 'outlier' will also not be correct. Some combination of medians across a range of values before and after may be possible, but I would suggest for the sake of visualisation leaving the outliers in and allowing the user to interpret them themselves would be best.
Otherwise replacement with either the above() or below() value would probably be best, not the median, using whatever method suitable to not distort the actual data, which should be defined with the user/ subject matter experts.
You will need to have the data sorted correctly in the table to use above or below in the script.
try dimension limit tab by going into the properties and click the restriction check box and and select 'show only values that accumulate to:' radio buttton and enter any accumulation percentage you want eg:80%
thanks & Regards,
Hi Tom, you are right on with your analisys and I think this is a tough one to solve with simple tricks. I am new to Clik so I am not sure of all the scripting and more complex math that can be done, however from a conceptual point of view my thoughts on how to solve this would be as follows:
Calculate a moving median and compare each point to be plotted aaginst this moving median right before and after the data point, and only discard the ones that are a certain % greater than the median. I guess this would be a very narrow "filter" that would not plot values that are ridiculously high compared to the moving median.
I have been playing with this in Excel and it seems to be possible, but it is complex.
For this use case it is not desirable for my end users to see the outliers, I need to present a clean undistorted graph. But I did not know this can be done with Clik so I will need to test this out to see how it works, thank you for the idea.
Andy, I just started playing with the lasso tool and it is very powerful. Altough it is a fully manual process, it is not bad to get the data cleaned up quite a bit.
Is it possible to lasso the data (or pick single data points) to EXCLUDE them from the current plot? I guess the way the lasso tool works now is that it allows you to select the subset of data that you want plotted, not excluded.