Qlik Community

Ask a Question

QlikView App Dev

Discussion Board for collaboration related to QlikView App Development.

Announcements
Our next Qlik Insider session will cover new key capabilities. Join us August 11th REGISTER TODAY
cancel
Showing results for 
Search instead for 
Did you mean: 
hobanwashburne
Creator
Creator

Smoothing outliers

I am working with a data set that occasionally contains human data entry errors. An example would be entering 10,000 instead of 1,000. I would like to locate and eliminate these outliers from my data. I have created a sample data set containing two "errors".

1. Customer 51091 typically buys between $7,500 and $8,200. However, on 1/27/17 they purchased $80,390. This would be an outlier and I would want to exclude from the data set.

2. Customer 51095 typically buys between $55,000 and $65,000. However, on 3/1/17 they purchased $603,210. This would be an outlier and I would want to exclude from the data set.

1 Solution

Accepted Solutions
sunny_talwar

May be this

ExTableMapping:

Mapping

LOAD Cust,

    Avg(Amt) * 1.5 as [Upper Limit]

FROM

SampleData01.xlsx

(ooxml, embedded labels, table is Sheet1)

Group By Cust;

Fact:

LOAD Date,

    Cust,

    Amt,

    UpperLimit

Where Exclude = 1;

LOAD Date,

    Cust,

    Amt,

    ApplyMap('ExTableMapping', Cust, 0) as UpperLimit,

    If(Amt <= ApplyMap('ExTableMapping', Cust, 0), 1, 0) as Exclude

FROM

SampleData01.xlsx

(ooxml, embedded labels, table is Sheet1);

View solution in original post

3 Replies
sunny_talwar

Exclude this in the script or the front end? Do you have a rule file which specify what is the range for each customer?

hobanwashburne
Creator
Creator
Author

I will need to do this in the script. The range for each customer will have to be calculated based on their order history. Customer 51091 has an average of ~$7,900 (without the outlier) so an order of $80,390 is more than 10x their average. Likewise, their Max order was $8,121. Possibly exclude anything that is 150% their previous max order. I'm open to ideas.

sunny_talwar

May be this

ExTableMapping:

Mapping

LOAD Cust,

    Avg(Amt) * 1.5 as [Upper Limit]

FROM

SampleData01.xlsx

(ooxml, embedded labels, table is Sheet1)

Group By Cust;

Fact:

LOAD Date,

    Cust,

    Amt,

    UpperLimit

Where Exclude = 1;

LOAD Date,

    Cust,

    Amt,

    ApplyMap('ExTableMapping', Cust, 0) as UpperLimit,

    If(Amt <= ApplyMap('ExTableMapping', Cust, 0), 1, 0) as Exclude

FROM

SampleData01.xlsx

(ooxml, embedded labels, table is Sheet1);

View solution in original post