When you want to look at the distribution of a measurement, a histogram is one possibility. However, if you want to show the distribution split over several dimensional values, a Box Plot may be a better choice.
You may, for instance, want to evaluate the quality of units produced by different machines or delivered by different suppliers. In that case, a Box Plot is an excellent way to display the characteristic you want to examine:
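The graph clearly shows the performance of the different machines compared to the target. Machine A has precision but not accuracy: its measurements are tightly grouped, but not centered on the target. Machine F has accuracy but not precision: its measurements are centered on the target, but widely spread.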
The Box Plot provides an intuitive graphical representation of several properties of the data set. The box itself represents the main group of measurements, and a line across it marks the middle of the data. Usually the box is defined by the median and the lower and upper quartile levels, so that it contains the middle half of the measurements, but it is also possible to use the average plus/minus one standard deviation.
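The two ways of defining the box can be sketched in Python, used here purely for illustration since the article itself implements the chart in QlikView. The function names are hypothetical. Computing the quartiles with linear ("inclusive") interpolation is an assumption about how the quartile levels are obtained, and the example data is made up.

```python
import statistics

def box_from_quartiles(values):
    """Box as (lower quartile, median, upper quartile); needs at least 2 values."""
    # "inclusive" = linear interpolation between data points (like Excel's
    # PERCENTILE.INC); assumed, not confirmed, to match QlikView's quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3

def box_from_mean_std(values):
    """Alternative box as (mean - 1 sd, mean, mean + 1 sd)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean - sd, mean, mean + sd

measurements = [9.7, 9.9, 10.0, 10.1, 10.4]  # made-up example data
print(box_from_quartiles(measurements))
print(box_from_mean_std(measurements))
```

Swap `statistics.stdev` for `statistics.pstdev` if the population standard deviation is preferred.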
The whiskers show the spread of the data; in the simplest case they extend to the largest and smallest measurements. Usually, however, the definition is slightly more intricate. Below, I will use the definition common in Six Sigma implementations.
There, the whiskers are often used to depict the largest and smallest values within an acceptable range, whereas values outside this range are outliers.
The interquartile range (IQR), the difference between the upper and lower quartile levels, is used to calculate the acceptance range: the range extends a fixed multiple of the IQR below the lower quartile and above the upper quartile.
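Written out, and assuming the conventional factor of 1.5 (Tukey's fences, the usual choice in Six Sigma box plots, not a value stated in the text above), the limits, whiskers and outliers for measurements $x_i$ with lower quartile $Q_1$ and upper quartile $Q_3$ are:

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
L_{\text{lower}} &= Q_1 - 1.5\,\mathrm{IQR} \\
L_{\text{upper}} &= Q_3 + 1.5\,\mathrm{IQR} \\
\text{lower whisker} &= \min\{\, x_i : x_i \ge L_{\text{lower}} \,\} \\
\text{upper whisker} &= \max\{\, x_i : x_i \le L_{\text{upper}} \,\} \\
\text{outliers} &= \{\, x_i : x_i < L_{\text{lower}} \ \text{or}\ x_i > L_{\text{upper}} \,\}
\end{aligned}
```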
The picture below summarizes the box plot.
And here is how you implement this in QlikView…
QlikView has now created a Box Plot with general expressions that almost always display a meaningful result and that allow for an intermediate aggregator. However, these expressions are not what we want for a Six Sigma box plot, so we need to change them to the following (below, the dimension is called Dim and the measurement is called Val):
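The QlikView expressions themselves are not reproduced here. As a rough analogue of what the box expressions compute for each value of Dim, the following Python sketch groups (Dim, Val) pairs and returns the lower quartile, median and upper quartile per group. The function names are hypothetical, the data is made up, and linear ("inclusive") interpolation is an assumed stand-in for how QlikView computes the quartile levels.

```python
import statistics
from collections import defaultdict

def quartile_levels(vals):
    """(lower quartile, median, upper quartile) using linear interpolation."""
    if len(vals) == 1:
        # statistics.quantiles needs two points on older Pythons;
        # a single measurement collapses the box to that value.
        return vals[0], vals[0], vals[0]
    q1, median, q3 = statistics.quantiles(vals, n=4, method="inclusive")
    return q1, median, q3

def box_per_dim(rows):
    """Map each Dim value to its box levels, given (Dim, Val) rows."""
    groups = defaultdict(list)
    for dim, val in rows:
        groups[dim].append(val)
    return {dim: quartile_levels(vals) for dim, vals in groups.items()}

# Made-up measurements for two machines.
rows = [("Machine A", 10.2), ("Machine A", 10.3), ("Machine A", 10.25),
        ("Machine F", 9.6), ("Machine F", 10.0), ("Machine F", 10.5)]
print(box_per_dim(rows))
```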
The whiskers and the outliers all need a nested aggregation – each value needs to be compared to the acceptance levels for the group – so they all contain an Aggr() function that calculates the relevant acceptance limit:
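The nested aggregation can be mimicked outside QlikView with two passes over each group: first compute the group's acceptance limits, then compare every individual value with the limits of its own group. The Python sketch below does this for (Dim, Val) pairs. The function name is hypothetical, the factor 1.5 is the conventional choice rather than a value taken from the expressions above, and quartiles are assumed to use linear interpolation.

```python
import statistics
from collections import defaultdict

def whiskers_per_dim(rows, factor=1.5):
    """For (Dim, Val) rows, return {Dim: (lower_whisker, upper_whisker, outliers)}."""
    groups = defaultdict(list)
    for dim, val in rows:
        groups[dim].append(val)

    result = {}
    for dim, vals in groups.items():
        # Pass 1: acceptance limits for the whole group.
        if len(vals) == 1:
            q1 = q3 = vals[0]
        else:
            q1, _, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        iqr = q3 - q1
        lower_limit = q1 - factor * iqr
        upper_limit = q3 + factor * iqr
        # Pass 2: compare each individual value with its own group's limits.
        inside = [v for v in vals if lower_limit <= v <= upper_limit]
        outliers = sorted(v for v in vals if v < lower_limit or v > upper_limit)
        lower_whisker = min(inside) if inside else None
        upper_whisker = max(inside) if inside else None
        result[dim] = (lower_whisker, upper_whisker, outliers)
    return result
```

With made-up data such as `[("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5), ("A", 100)]`, the value 100 lies far above the upper acceptance limit and is reported as an outlier, while the upper whisker stops at 5.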
And with this, I leave you to create your own box plots.