Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
hic
Former Employee
Former Employee

In quality control, you often want to look at the distribution of a measurement, to understand how the output of a process or a machine relates to expectations; to targets and specifications. In such a case, a histogram (or frequency plot) is one possibility.

It could be that you want to examine some physical property of the output of a machine, and want to see how close to target the produced units are. Then you could plot the measurements in a chart like the following:

Histogram.png

The above graph clearly shows you the distribution of the output of the machine: Most measurements are around target and the peak of the distribution is in fact slightly above target. But the histogram also raises questions: Is the variation small enough? And why is there such a long tail towards lower values? Could it be that we have a problem with a machine?

Finding such questions and their answers is central in all quality work, and the histogram is a good tool in helping you find them.

A histogram is special type of bar chart, and is easy to create in QlikView. A peculiarity is that it uses only one field, not several: As dimension, it uses the measurement in grouped form: Each measurement is assigned to an interval or bin, and this way the dimension gets discrete values.

As expression it uses the count of the measurement, and so the graph shows the distribution of one single field.

One small challenge is to determine how many bins the histogram should have: Having too many bins will exaggerate the variation, whereas too few will obscure it. A simple rule of thumb is to have 10-15 bins.

This is how you create a histogram in QlikView:

  1. Create an Input Box. In its properties, create a new variable called BinWidth. Click OK.
  2. Set BinWidth to 1 in the Input Box.
  3. Create a Bar Chart with a calculated dimension, using =Round(Value, BinWidth)
  4. Set the label for the calculated dimension to “Measurement”. Click Next.
  5. Use Count(Value) as expression. Click Next.
  6. Sort the calculated dimension numerically. Click Next three times.
  7. On the “Axes” page, enable “Continuous” on the Dimension Axis. Click Next.
  8. On the “Colors” page, disable the “Multicolored” under Data appearance. Click Finish.

Input box.png

You should now have a histogram.

If you have too few bars, you need to make the bin width smaller. If you have too many, you should make it bigger.

In order to make the histogram more elaborate you can also do the following:

  • Add error bars to the bins. The error (uncertainty) of a bar is in this case the square root of the bar content, i.e. Sqrt(Count(Value))
  • Add a second expression containing a Gaussian curve (bell curve):
    • Convert the chart to a Combo chart
    • Use the following as expression for the bell curve:
      Only(Normdist(Round(Value,BinWidth),Avg(total Value),Stdev(total Value), 0))*BinWidth*Count(total Value)
    • Use bars for the measurement and line for the curve.

Histogram2.png

With these changes, you can quickly assess whether the measurements are normally distributed or whether there are some anomalies.

Good luck!

HIC

Further reading related to data classification:

Recipe for a Box Plot

Recipe for a Pareto Analysis

Buckets

38 Comments
Not applicable

Thanks a lot Henric,

I have waited for this blog.

0 Likes
14,585 Views
fabio_vallone
Creator
Creator

Great post, thanks!

0 Likes
14,585 Views
male_carrasco
Creator
Creator

HIC,

You're the best!!! 

0 Likes
14,585 Views
Not applicable

After all the great thanks its  time for a question, if I may


How to create a bin dimension that is calculated from aggregation (TOTAL or AGGR)?


     1.  I am producing many different parts, each part has many work orders, the fact table includes reported           production time for each work order. I want to produce Histogram for all parts reported production time.

     2.  In order to do that , I need  to normalize all the data , in the fact table, according to this formula :

               (specific reported production time for part i - average of reported production time for part i)/

                 standard deviation of reported production time for part i

     3.  When I calculate the formula in the expression tab I get results,but in order to build Histogram I need to           insert the formula in the dimension tab , under CLASS function, but then I get no results.

Any suggestions ?

0 Likes
14,585 Views
christian77
Partner - Specialist
Partner - Specialist

Great!

How you paint those whiskers in the error lines?

Why error is the sq root of the value?

Could you attach the example?

Christian.

0 Likes
14,585 Views
Anonymous
Not applicable

your blogs are always very inspiring ang and provide futuristic solutions.

Thanks for investing your time in such R&D. It saves a lot of time for others and provides a way to think beyond conventional way.

Regards,

Nita Prasad

0 Likes
14,585 Views
Not applicable

Hi Henric,

I try to do this example with my data but I have a probem, Could you like a attachement the qlik document??

Thanks a lot for advanced,

Mik.

0 Likes
12,112 Views
hic
Former Employee
Former Employee

Chris Conejero

For Error Bars, just enter a formula under "Error Symmetric"

Error bars.png

Why the error is the square root? The distribution over the bins may be a normal distribution (or any distribution for that matter) but the number in a bin is a Poisson distributed number. I.e. if the normal distribution predicts that you should have 2.5 entries in a specific bin, you may get 2 or 3 (or 1 or 4) in real life. This integer is a Poisson distributed number. Further, for Poisson distributed numbers the variance (square of stdev) is always the same as the mean, so the best estimate of the error is the square root of the obtained integer number. See more in a text book on statistics (or on the web, e.g. http://www.pha.jhu.edu/~c173_608/doc/lecture4.pdf)

HIC

0 Likes
12,112 Views
christian77
Partner - Specialist
Partner - Specialist

Hi Henric.

Thank you for that. I'll check that out.

Chris.

0 Likes
12,112 Views
Anonymous
Not applicable

Henric,

Again thank you for all the contributions you have made. 

I followed your formula and I am almost there but the Gaussian curve is not doing as I would expect.  Much lower values. 

I have changed it a bit in the way of not a single value as I want to see the number of agents in the revenue buckets.

Dimension:

=round(aggr(sum(Revenue), %ADAcctKey, TradMth), BinWidth)

Expressions:

Bar:   sum(ADAcctCounter)

Line:  Only(Normdist(round(aggr(sum(Revenue), %ADAcctKey, TradMth),BinWidth), Avg(total aggr(sum(Revenue), %ADAcctKey, TradMth)), avg(total aggr(sum(Revenue), %ADAcctKey, TradMth)), 0))*BinWidth*sum(total ADAcctCounter)

Side note: In Axes I have reduced the scale static max to limit the low outliers but wonder why the error lines show for those.  Not a big deal but curious.  Gaussian is the bigger issue.

Thanks for your help.

0 Likes
12,112 Views