hic
Former Employee

When creating a QlikView application, you often want to group the values of a numeric field and then use the groups as a dimension in a chart or as a field in which to make selections.

Usually, the number itself is not interesting, but its rough value is interesting as an attribute. For example, you could group people into age groups: Children, Adults and Seniors. Or you could classify shipments to or from your company by how delayed they are: Too early, Just in time or Delayed.

These groups are often called buckets.

[Figure: Delays]

The most straightforward way to create buckets is to use multiple nested if() functions, e.g.:

   If( ShippedDate - RequiredDate <= -5, 'Too early',
   If( ShippedDate - RequiredDate <= 0, 'Just in time',
   If( ShippedDate - RequiredDate <= 5, 'Small delay',
      'Large delay' ))) as Delay,
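The nested If() chain above is evaluated top-down, and the first matching threshold wins. That is why each branch only needs an upper bound. Here is a small Python model of the same logic. It is a sketch, not QlikView code. It assumes both dates carry no time part, so their difference is a whole number of days. The name `delay_bucket` is hypothetical.

```python
from datetime import date


def delay_bucket(shipped: date, required: date) -> str:
    """Model of the nested If() chain: the first matching threshold wins."""
    delay = (shipped - required).days  # ShippedDate - RequiredDate, in whole days
    if delay <= -5:
        return "Too early"
    if delay <= 0:
        return "Just in time"
    if delay <= 5:
        return "Small delay"
    return "Large delay"


# Shipped three days before the required date.
print(delay_bucket(date(2013, 3, 1), date(2013, 3, 4)))  # Just in time
```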

Or if you use dual values:

   If( ShippedDate - RequiredDate <= -5, Dual( 'Too early', -5 ),
   If( ShippedDate - RequiredDate <= 0, Dual( 'Just in time', 0 ),
   If( ShippedDate - RequiredDate <= 5, Dual( 'Small delay', 5 ),
      Dual( 'Large delay', 10 )))) as Delay,
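With Dual(), each bucket carries both a display text and a number, and sorting can then use the number instead of the text. The Python sketch below imitates this with a hypothetical `Dual` class that compares on the number only. It models the idea and does not reproduce how QlikView stores dual values internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Dual:
    """Hypothetical stand-in for a dual value: compares on number, prints as text."""
    number: int
    text: str = field(compare=False)

    def __str__(self) -> str:
        return self.text


def delay_bucket_dual(delay: int) -> Dual:
    """Model of the Dual() version of the nested If() chain."""
    if delay <= -5:
        return Dual(-5, "Too early")
    if delay <= 0:
        return Dual(0, "Just in time")
    if delay <= 5:
        return Dual(5, "Small delay")
    return Dual(10, "Large delay")


buckets = {delay_bucket_dual(d) for d in [12, -7, 3, 0, 8, -1]}
print([str(b) for b in sorted(buckets)])
# ['Too early', 'Just in time', 'Small delay', 'Large delay']
print(sorted(str(b) for b in buckets))
# ['Just in time', 'Large delay', 'Small delay', 'Too early']
```

Sorted by number, the buckets come out in delay order. Sorted as plain text, they come out alphabetically, which is rarely what you want for a dimension.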

However, if you have many classes, the above statements are neither pretty nor manageable. In that case, it may be better to use a rounding function or the Class() function:

   Round( ShippedDate - RequiredDate , 5 ) as Delay,

   Class( ShippedDate - RequiredDate , 5 ) as Delay,
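Round() and Class() draw the bucket boundaries differently. The Python sketch below models both, using hypothetical helpers `round_to` and `class_bucket`. It rests on two assumptions. Exact halves are rounded upward. The class text imitates the `lower<=x<upper` style of Class() labels, but the exact formatting QlikView produces may differ.

```python
import math


def round_to(x: float, step: float) -> float:
    """Model of Round(x, step): nearest multiple of step (exact halves go up here)."""
    return math.floor(x / step + 0.5) * step


def class_bucket(x: float, width: float, label: str = "x") -> tuple[float, str]:
    """Model of Class(x, width): half-open interval [lower, lower + width)."""
    lower = math.floor(x / width) * width
    return lower, f"{lower:g}<={label}<{lower + width:g}"


for delay in (-7, 3, 7):
    print(delay, round_to(delay, 5), class_bucket(delay, 5))
```

A delay of 7 days lands in bucket 5 with Round(), because that bucket is centered on 5 and covers roughly 2.5 to 7.5. With Class() it lands in the interval 5<=x<10, because Class() buckets start at the multiples.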

A third option is to use IntervalMatch:

   DelayClasses:
   Load Lower, Upper, Delay Inline
   [Lower, Upper, Delay
   -1E99,  -5,    Too early
   -4,     0,     Just in time
   1,      5,     Small delay
   6,      1E99,  Large delay];

   IntervalMatch (DelayInDays)
   Load Lower, Upper Resident DelayClasses;
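IntervalMatch links each distinct DelayInDays value to every (Lower, Upper) pair whose closed interval contains it. The Delay label then follows through the link on Lower and Upper. The Python sketch below models only that matching step. `DELAY_CLASSES`, `interval_match` and `delay_for` are hypothetical names, and infinities stand in for the very large sentinel bounds in the inline table.

```python
import math

# Hypothetical copy of the DelayClasses inline table: (Lower, Upper, Delay),
# with infinities standing in for the very large sentinel bounds.
DELAY_CLASSES = [
    (-math.inf, -5, "Too early"),
    (-4, 0, "Just in time"),
    (1, 5, "Small delay"),
    (6, math.inf, "Large delay"),
]


def interval_match(values, classes):
    """Link each distinct value to every closed interval [lower, upper] containing it."""
    return [
        (value, lower, upper)
        for value in sorted(set(values))
        for lower, upper, _label in classes
        if lower <= value <= upper
    ]


def delay_for(values, classes):
    """Follow the (Lower, Upper) link from each matched value to its Delay label."""
    label_of = {(lower, upper): label for lower, upper, label in classes}
    return {value: label_of[(lower, upper)]
            for value, lower, upper in interval_match(values, classes)}


print(delay_for([-7, 0, 3, 9, 3], DELAY_CLASSES))
```

The inline table uses whole-number bounds with gaps between them, for example between 0 and 1. A fractional delay such as -4.5 therefore matches no class at all. This is harmless when DelayInDays is a whole number of days, but it matters if the dates carry a time part.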

The above three methods all create a field Delay already in the script, and this is what you should do if you have a static definition of the grouping.

However, there are cases where you want a dynamic definition, and then you need to create a calculated dimension using the Aggr() function. Say, for example, that you want to assess the reliability of your suppliers. Since reliability varies over time and location, you want to make the classification after you have made the appropriate selections, and that cannot be done in the script.

You should still calculate the necessary static fields in the script, in this case the delay of each shipment, for example:

   ShippedDate - RequiredDate as DelayInDays,

One way to define reliability is as the percentage of deliveries that were on time, grouped into percentage intervals.

[Figure: Supplier reliability]

In the above chart, the following expression was used as the dimension:

   =Aggr(Num(Round(Count(If(DelayInDays<=0,ShipmentID))/Count(ShipmentID ),0.1),'0%' ), Supplier)

The Aggr() function creates an array of values, one value per supplier: for each supplier, the number of “good” shipments is counted and divided by the total number of shipments. The result is rounded to the nearest 10% to create the buckets, and finally the Num() function formats the number as a percentage.
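Outside QlikView, the calculation inside Aggr() can be modeled as a group-by over suppliers. The Python sketch below assumes one input row per shipment, given as hypothetical `(supplier, shipment_id, delay_in_days)` tuples, so that counting rows plays the role of Count(ShipmentID). Rounding exact halves upward is an assumption about how Round() behaves in that case.

```python
import math
from collections import defaultdict


def reliability_buckets(shipments):
    """Per supplier: share of shipments with delay <= 0, rounded to the nearest 10%.

    shipments: iterable of hypothetical (supplier, shipment_id, delay_in_days)
    tuples, one per shipment, so counting rows plays the role of Count(ShipmentID).
    """
    total = defaultdict(int)
    on_time = defaultdict(int)
    for supplier, _shipment_id, delay in shipments:
        total[supplier] += 1
        if delay <= 0:  # the If(DelayInDays<=0, ShipmentID) condition
            on_time[supplier] += 1
    buckets = {}
    for supplier, count in total.items():
        # Nearest multiple of 10%, as an integer percentage (halves go up here).
        percent = math.floor(on_time[supplier] / count * 10 + 0.5) * 10
        buckets[supplier] = f"{percent}%"
    return buckets


rows = [("A", 1, -2), ("A", 2, 0), ("A", 3, 4), ("B", 4, 7), ("B", 5, 1), ("C", 6, -1)]
print(reliability_buckets(rows))  # {'A': '70%', 'B': '0%', 'C': '100%'}
```

Suppliers that end up with the same percentage string share a bucket, which is what the calculated dimension shows as one bar.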

You can also rank the suppliers and bucket them into quartiles:

[Figure: Supplier reliability quartiles]

In the above chart, the following expression was used as the dimension:

   =Aggr(
       Pick(
           Ceil(
               4*Rank(Count(If(DelayInDays<=0, ShipmentID))/Count(ShipmentID),4)
               /
               Count(distinct total Supplier)
           ),
           '1st quartile','2nd quartile','3rd quartile','Bottom quartile'
       ),
       Supplier
   )
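The quartile expression uses Rank() with mode 4. This gives each supplier a distinct rank, from 1 for the best on-time ratio up to N. Rank r then maps to quartile Ceil(4*r/N). The Python sketch below models that mapping starting from already aggregated ratios. `quartile_buckets` is a hypothetical helper. Here ties are broken by input order, which may not match the order QlikView uses.

```python
import math


def quartile_buckets(on_time_ratio: dict[str, float]) -> dict[str, str]:
    """Model of Pick(Ceil(4*Rank(...,4)/Count(distinct total Supplier)), ...).

    on_time_ratio maps each supplier to its already aggregated on-time share.
    """
    labels = ["1st quartile", "2nd quartile", "3rd quartile", "Bottom quartile"]
    n = len(on_time_ratio)  # plays the role of Count(distinct total Supplier)
    # Distinct ranks 1..n, best ratio first; ties keep input order in this sketch.
    ordered = sorted(on_time_ratio, key=on_time_ratio.get, reverse=True)
    return {
        supplier: labels[math.ceil(4 * rank / n) - 1]  # Pick() is 1-based
        for rank, supplier in enumerate(ordered, start=1)
    }


ratios = {"A": 0.9, "B": 0.5, "C": 0.75, "D": 0.2, "E": 0.6, "F": 0.95, "G": 0.4, "H": 0.8}
print(quartile_buckets(ratios))
```

With eight suppliers, each quartile gets exactly two. When the count is not divisible by four, the groups are uneven. For example, with five suppliers, ranks 1 to 5 map to quartiles 1, 2, 3, 4 and 4.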

By clicking on a bar in either of these charts, you will select the corresponding suppliers.

Bottom line: Create buckets in all cases where a classification helps the user to get a better overview of data.

HIC

 

PS: This is my 100th blog post. If you want to read previous posts, click my initials above.

Further reading related to this topic:

Calculated Dimensions

Recipe for an ABC Analysis

31 Comments
DavidFoster1
Specialist

Wouldn't that be easier using

AGGR(CEILING((SUM(VALUE)/SUM(TOTAL VALUE))*10),[Card_Id])

That is, derive the ratio of VALUE to TOTAL VALUE for each Card_Id, multiply the result by 10 (giving a value from 0 to 10) and then round up to the nearest whole number.
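For comparison, the suggested expression can be modeled in Python as below. The sketch assumes that `values`, a hypothetical name, already holds Sum(VALUE) per Card_Id and that all values are non-negative. It buckets each card's share of the grand total, which is a different quantity from the per-supplier on-time percentage used in the post.

```python
import math


def share_buckets(values: dict[str, float]) -> dict[str, int]:
    """Model of Ceiling(Sum(VALUE)/Sum(TOTAL VALUE)*10) per Card_Id.

    values: hypothetical mapping Card_Id -> Sum(VALUE), all non-negative.
    """
    total = sum(values.values())
    # Share of the grand total, scaled to 0..10 and rounded up.
    return {card: math.ceil(v * 10 / total) for card, v in values.items()}


print(share_buckets({"c1": 55, "c2": 25, "c3": 20}))
```

The shares add up to 1, so at most nine cards can have a share above 10%. With many cards, nearly all of them therefore fall into bucket 1.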

Not applicable

Great post, HIC, and congrats on your 100th blog post!

I have one question:

Can QlikView let users create buckets on the fly in the browser? For example, mapping a product to a region on the fly in the browser and seeing the numbers change accordingly.

Any inputs will be greatly appreciated.

Thanks

Ram

jaimeaguilar
Partner - Specialist II

Clear, concise and helpful as always

regards

Not applicable

Excellent, Henric.

This post is useful because:

  • its writing is clear;
  • the example is easily applied;
  • it presents several approaches that can be compared.

Thanks a lot for your effort.

...

Ricardo Ildefonso
Londrina, Brasil

Anonymous
Not applicable

Excellent Henric!!

Thank you so much for sharing

Not applicable

Great

Anonymous
Not applicable

Can I have buckets on the Y-axis, i.e. for an expression?

hic
Former Employee

Yes, you can. Try

  Round( <Measure> , 100 )

or

  Class( <Measure> , 100 )

or a nested if()-function.

See also

Recipe for a Pareto Analysis

Recipe for an ABC Analysis

HIC

beck_bakytbek
Master

Thanks for sharing this useful function.

Thanks a lot

ngrunoz
Contributor II

Great Stuff.

I have a scenario that requires buckets, and I am not sure how best to approach it. The column with the data I want to create buckets for has multiple data types. Creating Buckets on Multiple Data Type

Could you please advise how I can create buckets that depend on each question?
