Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Buckets

Often when creating a QlikView application, you want to add some grouping of a number, and then use this as a dimension in a chart or as a field where you make selections.

Usually, the number in itself is not interesting, but the rough value is interesting as attribute. It could be that you group people into age groups: Children, Adults and Seniors. Or you want to classify shipments to or from your company in how delayed they are: Too early, Just in time or Delayed.

These groups are often called buckets.

Delays.png

 

The most straightforward way to create buckets, is to use multiple nested if() functions, e.g:

   If( ShippedDate - RequiredDate <= -5, 'Too early',
   If( ShippedDate - RequiredDate <= 0, 'Just in time',
   If( ShippedDate - RequiredDate <= 5, 'Small delay',
      'Large delay' ))) as Delay,

Or if you use dual values:

   If( ShippedDate - RequiredDate <= -5, Dual( 'Too early', -5 ),
   If( ShippedDate - RequiredDate <= 0, Dual( 'Just in time', 0 ),
   If( ShippedDate - RequiredDate <= 5, Dual( 'Small delay', 5 ),
      Dual( 'Large delay', 10 )))) as Delay,

However, if you have many classes, the above statements are neither pretty nor manageable. Then it might be better to use a rounding function or the Class() function:

   Round( ShippedDate - RequiredDate , 5 ) as Delay,

   Class( ShippedDate - RequiredDate , 5 ) as Delay,

A third option is to use IntervalMatch:

   DelayClasses:
   Load Lower, Upper, Delay Inline
   [Lower,  Upper,  Delay
    -E99,  -5,  Too early
   -4,          0,  Just in time
   1,           5,  Small delay
   6,      E99,  Large delay];

   IntervalMatch (DelayInDays)
   Load Lower, Upper Resident DelayClasses;

The above three methods all create a field Delay already in the script, and this is what you should do if you have a static definition of the grouping.

However, there are cases where you may want a dynamic definition, and then you need to create a calculated dimension using the Aggr() function. Say, for example, that you want to assess the reliability of your suppliers – but since this is something that varies over time and location, you want to make the classification after you have made the appropriate selections. This you cannot make in the script.

But you should still calculate the necessary static fields in the script, i.e. in this case the delay of a shipment, e.g. by

   ShippedDate - RequiredDate as DelayInDays,

One way to define the reliability is to measure how many percent of the deliveries that were on time, classified into percent intervals.

Supplier reliability.png

 

In the above chart, the following expression was used as dimension:

   =Aggr(Num(Round(Count(If(DelayInDays<=0,ShipmentID))/Count(ShipmentID ),0.1),'0%' ), Supplier)

The Aggr() function creates an array of values – one value per supplier: For each supplier, the number of “good” shipments are counted and divided by the total number of shipments. The number is rounded to nearest 10% to create the buckets and finally the Num() function formats the number as a percentage.

You can also rank the suppliers and bucket them in quartiles:

Supplier reliability quartiles.png

 

In the above chart, the following expression was used as dimension:

   =Aggr(
          Pick(
              Ceil(
                  4*Rank(Count(If(DelayInDays<=0, ShipmentID))/Count(ShipmentID),4)
                  /
                 Count(distinct total Supplier)
                  ),
              '1st quartile','2nd quartile','3rd quartile','Bottom quartile'
              ),
         Supplier
         )

By clicking on a bar in either of these charts, you will select the corresponding suppliers.

Bottom line: Create buckets in all cases where a classification helps the user to get a better overview of data.

HIC

 

PS This is my 100th blog post. If you want to read previous posts, click my initials above.

Further reading related to this topic:

Calculated Dimensions

Recipe for an ABC Analysis

30 Comments
Not applicable

As Usual Helpful post. Cheers..

0 Likes
3,600 Views
datanibbler
Esteemed Contributor

Hi,

yep! This is really good reading.

Also, such a "bucketing" of data is often requested by management to get a quick overview of the state of affairs in a certain respect.

I just realized there is already an RSS_feed available. Great!

0 Likes
3,600 Views
thornofcrowns
Valued Contributor II

Excellent post, makes things a lot clearer.

0 Likes
3,600 Views
Not applicable

Thanks for the post. this is more tactful way to handle bucket scenarios.

0 Likes
3,600 Views
male_carrasco
Contributor

Great Post!!!

3,600 Views
dfoster9
Valued Contributor

I like the dynamic bucketing expression (even if it gave me a slight headache ). One trap to be careful of dynamic bucketing (although not percentile ones like these) and functions like CLASS is that they can hide empty buckets making them visually misleading. I tend towards using IntervalMatch based buckets as they will show empty buckets (assuming you have turned on 'show zero values').

3,600 Views

If you use Round() or some other function that returns a numeric value, then you can just choose to display your X-axis as a 'Continuous' axis, and then you will see your empty buckets.

HIC

3,600 Views
dfoster9
Valued Contributor

that is what I meant by (although not percentile ones like these) . Should have said (although not 'numeric' ones like these).


Could the quartile expression hide a bucket?

0 Likes
3,600 Views
sudeepkm
Valued Contributor III

very nice..

0 Likes
3,600 Views
Not applicable

Thanks!

0 Likes
3,600 Views
Sajid_Mahmood
Contributor

Nice post

0 Likes
3,600 Views
Not applicable

Congratulations on reaching 100.

0 Likes
3,600 Views
Not applicable

Henric, congratulations.  I enjoy reading your posts and find them very educational.  Keep up the good work.

0 Likes
3,600 Views
joaquinlr
Valued Contributor II

Thank you HIC.

Soon I will need it

0 Likes
3,600 Views
Not applicable

thank you Henric,

this is really helpful and inspiring...

I like it.


0 Likes
3,600 Views
MVP
MVP

Thanks HIC... Love to read your post and always eagerly waiting for the same...Your suggested methods are so simple and straight forward but really helpful in challenging environment..

0 Likes
3,600 Views
Not applicable

Thanks for the Great Post.

0 Likes
3,600 Views
Partner
Partner

Interesting..Thanks for the Post

0 Likes
3,600 Views

Nicely explain the best use of Bucket as usual.

0 Likes
3,600 Views
Partner
Partner

Excellent post as usual.

I have a question for you, if you can...

I don't understand so well the construction of statistic buckets. I see your sample and i suppose that my expression can be wrong.

I use this aggr to create my statistic bucket

=if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM(Value),[Card_Id]),0.1),1,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.2),2,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.3),3,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.4),4,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.5),5,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.6),6,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.7),7,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.8),8,

if(aggr(SUM(Value),[Card_Id])<=Fractile(TOTAL aggr(SUM( Value),[Card_Id]),0.9),9,

10)))))))))

Do you think is it possible  with my example to create a bucket as you did?

Thank you

0 Likes
3,600 Views
dfoster9
Valued Contributor

Wouldn't that be easier using

AGGR(CEILING((SUM(VALUE)/SUM(TOTAL VALUE))*10),[Card_Id])

I.e. derive the ratio of VALUE to TOTAL VALUE for each Card_Id, times the outcome by 10 (0.0 to 9.9) and then round up the nearest whole number.

3,600 Views
Not applicable

Great post HIC and Congrats for 100th blog post!

I have one question:

Can QlikView allow users to create buckets on fly in browser? For e.g. Mapping a product to a region on fly in browser and see the numbers accordingly.

Any inputs will be greatly appreciated.

Thanks

Ram

0 Likes
3,600 Views
Partner
Partner

Clear, concise and helpful as always

regards

0 Likes
3,600 Views
Not applicable

Excelent, Henric.

This post is useful because:

  • its writing is clear;
  • the example is easily applied;
  • it brings several ways to be compared.

Thanks a lot for your effort.

...

Ricardo Ildefonso
Londrina, Brasil

0 Likes
3,600 Views
qlikviewnovice
Valued Contributor II

Excellent Henric!!

ThanQ so Much for sharing

3,600 Views
Not applicable

Great

0 Likes
3,600 Views
meetmrudul
Contributor II

Can I have a bucket on Y-axis, as in for expression

0 Likes
3,600 Views

Yes, you can. Try

  Round( <Measure> , 100 )

or

  Class( <Measure> , 100 )

or a nested if()-function.

See also

Recipe for a Pareto Analysis

Recipe for an ABC Analysis

HIC

0 Likes
3,600 Views
beck_bakytbek
Honored Contributor

thanks for sharing of this useful function

Thanks a lot

0 Likes
3,600 Views
ngrunoz
New Contributor II

Great Stuff.

I have a Scenario That I am not sure how best to approach it which requires Buckets. The column with the data I  want to create Buckets for has Multiple Data Types. Creating Buckets on Multiple Data Type

Please may you assist on how I can Create Buckets Dependent upon Each Question.

0 Likes
3,600 Views