Skip to main content
Announcements
SYSTEM MAINTENANCE: Thurs., Sept. 19, 1 AM ET, Platform will be unavailable for approx. 60 minutes.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Ignoring outliers

Hi,

I'm new to the forum and fairly new to qlikview, so please excuse my ignorance! Please could someone help with the following problem:

Basically i've got millions of rows of data and probably a 5% DQ problem, so I'm unable to use the min() or Max () functions in isolation as the DQ issues will always dominate the results. If for example we want to show the minimum weight of a car we would get a return of 1kg from the following data on car weights: 1kg, 2kg, 4kg, 850kg, 875kg, 860kg, 850kg, 850kg, 850kg, 875kg, 115,000kg and a maximum weight of 115,000kg.

Clearly this makes our reports a bit of nonsense to our users. Is there anyway we could get the min() function to return the lowest weight that has more than one occurrence (with the state of our data probably an occurrence rate of 5 would be more appropriate) ? and visa-versa with the maximum weight?

Many Thanks

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

See if it makes sense in your situation:


table:
LOAD
Weight,
count(distinct Wieght) as Frequency
FROM ...;
//
MaxMin:
LOAD
max(Wieght) as MaxWieght,
min(Wieght) as MinWieght
RESIDENT table
WHERE Frequency>=5


View solution in original post

6 Replies
Anonymous
Not applicable
Author

See if it makes sense in your situation:


table:
LOAD
Weight,
count(distinct Wieght) as Frequency
FROM ...;
//
MaxMin:
LOAD
max(Wieght) as MaxWieght,
min(Wieght) as MinWieght
RESIDENT table
WHERE Frequency>=5


johnw
Champion III
Champion III

I don't think "minimum weight with more than one occurrence" is a good metric. If I can make a typo once, I can make the typo twice. That doesn't mean its a good car weight, and you're back to your original problem. On the opposite side, just because only one car weighs 837.952 KG doesn't make the weight wrong.

You may instead want either a minimum reasonable weight for cars you deal with, such as rangemax(600,min(CarWeight)), or some percentile weight, such as fractile(CarWeight,.05).

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

See the QV Cookbook example "Removing outlier values from a chart dimension." for some examples of removing outliers using chart expressions.

-Rob

Not applicable
Author

Hi Michael,

thanks for this, i think this is the direction i need to take.

Cheers,

Not applicable
Author

Thanks for all the advice chaps.

. It certainly gives me ideas for this problem and several other current issues.

Cheers,

Not applicable
Author

Can you point me to the cookbook example? I have the same issue.

Thanks,

Kevin