I am working with a data set that occasionally contains human data entry errors. An example would be entering 10,000 instead of 1,000. I would like to locate and eliminate these outliers from my data. I have created a sample data set containing two "errors".
1. Customer 51091 typically buys between $7,500 and $8,200. However, on 1/27/17 they purchased $80,390. This would be an outlier and I would want to exclude from the data set.
2. Customer 51095 typically buys between $55,000 and $65,000. However, on 3/1/17 they purchased $603,210. This would be an outlier and I would want to exclude from the data set.
I will need to do this in the script. The range for each customer will have to be calculated based on their order history. Customer 51091 has an average of ~$7,900 (without the outlier) so an order of $80,390 is more than 10x their average. Likewise, their Max order was $8,121. Possibly exclude anything that is 150% their previous max order. I'm open to ideas.