Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

Recipe for a Box Plot

When you want to look at the distribution of a measurement, a histogram is one possibility. However, if you want to show the distribution split over several dimensional values, a Box Plot may be a better choice.

You may, for instance, want to evaluate the quality of units produced in different machines, or delivered by different suppliers. Then, a Box Plot is an excellent choice to display the characteristic that you want to examine:

Box plot.png

The graph clearly shows you the performance of the different machines compared to target: Machine A has the precision, but not the accuracy. Machine F has the accuracy, but not the precision.

The Box Plot provides an intuitive graphical representation of several properties of the data set. The box itself represents the main group of measurements, with a center line representing the middle of the data. Usually the median and the upper and lower quartile levels are used to define the box, but it is also possible to use the average plus/minus one standard deviation.

The whiskers are used to show the spread of the data, e.g. the largest and smallest measurements can be used. Usually, however, the definition is slightly more intricate. Below I will use the definition used in six sigma implementations.

There, the whiskers are often used to depict the largest and smallest values within an acceptable range, whereas values outside this range are outliers.

The concept of the Inter Quartile Range (IQR) – the difference between the upper and lower quartile level – is used to calculate the acceptance range. Hence:

  • Inter Quartile Range (IQR) = Upper Quartile Line (UQL) – Lower Quartile Line (LQL)
  • Upper Acceptance Limit (UAL) = UQL + 1.5 * IQR
  • Lower Acceptance Limit (LAL) = LQL - 1.5 * IQR

The picture below summarizes the box plot.


Box plot explained.png


And here is how you implement this in QlikView…

  1. Go to the Tools menu and choose “Box Plot Wizard”.
  2. On the “Step 1 - Define data” page, you choose your dimension. In my example, this was Machine, but it could be Supplier or Batch or something similar.
  3. Use the same dimension once more in the “Aggregator” control.
  4. Use the average of your measurement in the “Expression” control – Avg(Measurement).
  5. Click “Next”.
  6. On the “Step 2 - Presentation” page, you should choose “Median mode”.
  7. Check “Include Whiskers” and “Use Outliers”.
  8. Click “Finish”.

QlikView has now created a Box Plot with general expressions that almost always display a meaningful result, and allows for an intermediate aggregator. However, the expressions are not what we want for a six sigma box plot, so we need to change them to the following: (Below, the dimension is called Dim, and the measurement is called Val.)

  • Box Plot Middle: Median(Val)
  • Box Plot Bottom: Fractile(Val,0.25)
  • Box Plot Top: Fractile(Val,0.75)

The whiskers and the outliers all need a nested aggregation – each value needs to be compared to the acceptance levels for the group – so they all contain an Aggr() function that calculates the relevant acceptance limit:

  • Box Plot Lower Whisker:
    Min(If(Val>= Aggr(2.5*Fractile(total <Dim> Val,0.25) -1.5*Fractile(total <Dim> Val,0.75), Dim, Val), Val))
  • Box Plot Upper Whisker:
    Max(If(Val<= Aggr(2.5*Fractile(total <Dim> Val,0.75) -1.5*Fractile(total <Dim> Val,0.25), Dim, Val), Val))
  • Lower Outlier:
    Min(If(Val< Aggr(2.5*Fractile(total <Dim> Val,0.25) -1.5*Fractile(total <Dim> Val,0.75), Dim, Val), Val))
  • Upper Outlier:
    Max(If(Val> Aggr(2.5*Fractile(total <Dim> Val,0.75) -1.5*Fractile(total <Dim> Val,0.25), Dim, Val), Val))

And with this, I leave you to create your own box plots.

HIC

Further reading related to data classification:

Recipe for a Histogram

Buckets

Recipe for a Pareto Analysis

15 Comments
MVP
MVP

Thanks Henric.  My wife is a statistician and SAS programmer, so box plots are everyday things for her.  This is now another thing I can do quicker, easier and better looking in QlikView. 

236 Views
Employee
Employee

Glad you like it! Just note that my formulae are just one way of doing it: They follow the definitions in Six Sigma. There are other ways to define the five formulae, so you need to find out which definition to use.

HIC

236 Views
juleshartley
Valued Contributor

Very nice... and useful! Thanks Henric.

0 Likes
236 Views
pover
Honored Contributor

Given the trend to use small multiples or sparklines, I've always wondered why there isn't a graph that shows mini-distributions instead of the boxplot.  I haven't seen anything like the example below, but I'm curious if anybody has seen anything like it or know why it doesn't exist. .

BoxPlot_Alternative.png

Karl

236 Views
Not applicable

Nice!  Now how about a variable width box plot where the width represents the size of the sample relative to the others.  This is particularly helpful when you show a view across a dimension like your 'machine' example above.

0 Likes
236 Views
kumarnatarajan
Valued Contributor

Thanks henric this very nice post. The box polt chart i heard more in sas now i got idea in qlikview.

236 Views
Not applicable

Henrich, is it possible to visualize two boxplots for the same dimension value in the same chart (in a different color)? Adding an extra dimensions seems to distort the boxplot (unless you use trellis but that's not what I want). I cannot add an extra boxplot expression as that option is greyed out.  

0 Likes
236 Views
Employee
Employee

No, I cannot get it to work either. I'll check with the developers my this is the case and come back to you.

HIC

0 Likes
236 Views
Not applicable

Amazing! I've been playing around with this and absolutely love it - but I do have one question...

Is it possible to modify this, or the model to select outliers across more than one dimension? For example, lets say you use your cursor to create a box that would select >450 across machines D, E, and F and have say a list of products created on those machines that were outliers be shown in another table. In theory could this be done?

Just curious if someone has thought of this or achieved it, it would be really useful for some of my current work.

0 Likes
236 Views
andreas_koehler
New Contributor III

Thanks Henric,

This is a very valuable visualization method.

I applied your recipe (1-8) on measurements and data but it did not work out.

screengrab_boxplot.PNG

I had imported the time series of data via a crosstable

CrossTable(TagCode, #Data)

LOAD *

FROM

[$(vSource)AMPrototype.xlsx]

which I use to load Timestamp and value for each TagCode.

Could you give me some guidance on how to narrow down the mistake I did?

Andreas

0 Likes
236 Views
Employee
Employee

You need to check your expressions for the Bottom, Top and Whiskers. They seem to be undefined or have wrong values.

HIC

Image2.png

236 Views
andreas_koehler
New Contributor III

Thanks Henric,

this was indeed the case. I misunderstood that I would already see a plausible plot when I follow just Step 1-8 without adapting the expressions of Boxes and wiskers. I thought that these later steps were to make the box plot in accordance to Six-Sigma guidelines.

0 Likes
236 Views
stephanegrimaud
New Contributor II

Many thanks for this post Henric.

Do you know if it's possible to show all the points between the Upper Acceptance Limit and the Upper Outlier, please ? This is possible, for example, in the R boxplot :

Thanks in advance for your help.

Best regards,

Stéphane

0 Likes
236 Views
Employee
Employee

The box plot in Qlik Sense does this automatically. Try it!

The box plot in QlikView is not as flexible, but you can make it to show a second, or third outlier:

Image1.png

Do the following:

  1. Right-click the Upper Outlier in the list of measures. Choose Copy.
  2. Right-click in the list of measures. Choose paste.
  3. Insert a second parameter in the Max function
    Max( ... , 2 )
    Then you will get the second highest value - the second outlier.

HIC

0 Likes
236 Views
stephanegrimaud
New Contributor II

Oh ! Indeed, this is very nice ! Thanks a lot for this tip.

Well, I hope we will soon be able to work on Qlik Sense.

Thanks again Henric.

Stéphane

0 Likes
236 Views