Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

Recipe for a Pareto Analysis – Revisited

“Which products contribute to the first 80% of our turnover?”

This type of question is common in all types of business intelligence. I say “type of question” since it appears in many different forms: Sometimes it concerns products, but it can just as well concern any dimension, e.g. customer, supplier, sales person, etc. Further, here the question was about turnover, but it can just as well be e.g. number of support cases, or number of defect deliveries, etc.

 

QV Bar chart.png

 

It is called Pareto analysis or ABC analysis and I have already written a blog post on this topic. However, in the previous post I only explained how to create a measure which showed the Pareto class. I never showed how to create a dimension based on a Pareto classification – simply because it wasn’t possible.

 

But now it is.

 

But first things first. The logic for a Pareto analysis is that you first sort the products according to their sales numbers, then accumulate the numbers, and finally calculate the accumulated measure as a percentage of the total. The products contributing to the first 80% are your best, your “A” products. The next 10% are your “B” products, and the last 10% are your “C” products. In the above graph, these classes are shown as colors on the bars.

 

The previous post shows how this can be done in a chart measure using the Above() function. However, if you use the same logic, but instead inside a sorted Aggr() function, you can achieve the same thing without relying on the chart sort order. The sorted Aggr() function is a fairly recent innovation, and you can read more about it here.

 

The sorting is needed to calculate the proper accumulated percentages, which will give you the Pareto classes. So if you want to classify your products, the new expression to use is

 

=Aggr(
    If(Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo()))<0.8, 'A',
        If(Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo()))<0.9, 'B',
            'C')),
    (Product,(=Sum({1} Sales),Desc))
    )

 

The first parameter of the Aggr() – the nested If()-functions – is in principle the same as the measure in the previous post. Look there for an explanation.

 

The second parameter of the Aggr(), the inner dimension, contains the magic of the sorted Aggr():

 

    (Product,(=Sum({1} Sales),Desc))

 

This structured parameter specifies that the field Product should be used as dimension, and its values should be sorted descending according to Sum({1} Sales). Note the equals sign. This is necessary if you want to sort by expression.

 

So the Products inside the Aggr() will be sorted descending, and for each Product the accumulated relative sales in percent will be calculated, which in turn is used to determine the Pareto classes.

 

The set analysis {1} is necessary if you want the classification to be independent of the made selection. Without it, the classification will change every time the selection changes. A perhaps better alternative is to use {$<Product=>}. Then a selection in Product (or in the Pareto class itself) will not affect the classification, but all other selections will.

 

This expression can be used either as dimension in a chart, or in a list box. Below I have used the Pareto class as first dimension in a pivot table.

 

QS Pivot.png

 

If you use this expression in a list box, you can directly select the Pareto class you want to look at.

 

QS List box.png

 

The other measures in the pivot table are the exclusive and inclusive accumulated relative sales, respectively. I.e. the lower and upper bounds of the product sales share:

 

Exclusive accumulated relative sales (lower bound):

=Min(Aggr(

    Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo())),
    (Product,(=Sum({1} Sales),Desc))

  ))

 

Inclusive accumulated relative sales (upper bound):

=Max(Aggr(

    Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),0,RowNo())),
    (Product,(=Sum({1} Sales),Desc))

  ))

 

Good luck in creating your Pareto dimension!

 

HIC

 

Further reading related to this topic:

The sortable Aggr function is finally here!

Recipe for a Pareto Analysis

Recipe for an ABC Analysis

46 Comments

Whoa! That's new isn't it? The help file still only mentions four Sort-types and Expression is not one of those for. So you can use an expression now to sort the aggr? That is awesome news.

So the number of products that generate 50% of the sales is:

Count(Distinct
Aggr(

    If(Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo()))<0.5, Product),

    (Product,(=Sum({1} Sales),Desc))

    )
)

Very nice!

Hmm, but not exactly something a casual business user will cobble together. Perhaps we can make that easier with a variable and passing some parameters. A native function would be even better.

ParetoCount(Dimension, SortExpression, SortOrder, Fractile, InclusiveOrExclusive)

646 Views
Employee
Employee

To answer the question on your formula: Yes, it does exactly that.

And, no, a business user will probably not manage this. But it is still a step in the right direction, I think.

HIC

646 Views

Definitely a great step forward! And it isn't even X-mas yet. A late Sinterklaas surprise then

646 Views
pablolabbe
Valued Contributor II

A hidden gem of the QIX Engine !

0 Likes
646 Views
Employee
Employee

Great!

A long-awaited feature

646 Views
MVP
MVP

Nice one! But why it is a QlikView chart?

646 Views
robert99
Valued Contributor II

Hi

What i tried to do was to have a dimension of customers (CSR) and a metric showing how many products made up 50% of the customers total sales

I started with this expression

Count({<DateType = {Invoice}, [Item Number] = >} DISTINCT

Aggr(

If(Rangesum(Above(Sum({<DateType = {Invoice}, [Item Number] = >}  [SalesUS$])

/  Sum({<DateType = {Invoice},[Item Number] = >} TOTAL  [SalesUS$]),1,RowNo()))<0.5,

[Item Number]),

  ([Item Number],(=Sum({<DateType = {Invoice}>}     [SalesUS$]),Desc))

))

This only gave the correct figure if I filtered by just one customer (CSR)

So I tried different options. Nothing worked. Like

Count({<DateType = {Invoice}, [Item Number] = >} DISTINCT TOTAL <CSR,[Item Number]>

Aggr(

If(Rangesum(Above(Sum({<DateType = {Invoice}, [Item Number] = >} TOTAL <CSR,[Item Number]>   [SalesUS$])

/  Sum({<DateType = {Invoice},[Item Number] = >} TOTAL <CSR>  [SalesUS$]),1,RowNo()))<0.5,

[Item Number]),

CSR , ([Item Number],(=Sum({<DateType = {Invoice}>} TOTAL <CSR , [Item Number]>   [SalesUS$]),Desc))

))

although filtering by just one customer always gave the correct total

I'm sure it can be done but my logic is a bit wrong

Thanks Robert

0 Likes
646 Views
Employee
Employee

I assume that you use this in a chart with Customer (CSR) as dimension. If so, the first expression will not work since CSR isn't used as dimension inside the Aggr().

In the second expression you have added CSR as inner dimension, but you have also added the Total qualifier in places where it shouldn't be used. You should have Total in the denominator inside the Aggr(), but not elsewhere.

HIC

0 Likes
646 Views
robert99
Valued Contributor II

Thanks Henric. Including for this blog post

What I was doing was just trying every possible option I could think of. My first serious attempt was this one

Count({<DateType = {Invoice}, [Item Number] = >} DISTINCT

Aggr(

If(Rangesum (Above(Sum({<DateType = {Invoice}, [Item Number] = >}  [SalesUS$])

/  Sum({<DateType = {Invoice},[Item Number] = >} TOTAL <CSR>   [SalesUS$]),1,RowNo()))<0.9,

[Item Number]),

CSR , ([Item Number],(=Sum({<DateType = {Invoice}>}    [SalesUS$]),Desc))

))

Its close but the figures are always either correct or a bit high. For example the first 4 lines should be 8,61,34,15 not 9,82,41,15. I will continue working on it tomorrow. A number of clients have asked for this sort of report and I had to do it in a two step process before (download then upload). But it will be great if I can work this out. 

First column = customer

Chart Great .90.JPG

0 Likes
646 Views
Employee
Employee

I just realised that this probably isn't possible after all.

You want to sort the Aggr() the following way:

  1. By CSR according to alphabet
  2. By Item according to Sum(Sales) desc

However, bullet two will always be sorted globally, i.e. there can only be one sort order for the dimension "Item". But you want different sort orders for different customers (CSR), and this is not possible.

HIC

646 Views
Not applicable

That's awesome news, HIC!

I was just working in a case similar to this. I tried the function and it worked perfectly! The only change I made was removing the 1  from the set analysis because I want my ABC to be dynamic according to my selections. Do you think this could be a problem?

Once again, thank you very much for sharing this!

0 Likes
646 Views
Employee
Employee

If you really want it to be dynamic, it should cause no problems. However, it will mean that the classification will change the moment you select one of the Pareto groups. So a second click on a Pareto class will reduce the data sample further. This will be confusing: "Now, how many times did I click on class A?"

An alternative that I think is better is if you use {$<Product=>} as set expression (assuming that the Pareto classification is made on the field Product). Then you will have a dynamic classification that is based on all fields except the Product.

HIC

646 Views
Not applicable

That's is a concern and I believe it won't be possible to include the Pareto classifications into the set expression haha.

The reason I mentioned the need to make the analysis dynamic is, for example, when I have sales representatives and I want to analyze each one individually, i.e., each sales representative will have a different set of clients in the Pareto analysis and therefore different classifications.

Stefano Draghi

646 Views

This is amazing and would come in very very handy. I have people waiting for a solution for this exact problem who had to look into dirty workarounds. I have already guided them to take a look at this new functionality.

Thanks for sharing,

Sunny

646 Views

Here is an example which was not something not doable before, but now it is

Re: ABC Analysis in Qlikview

0 Likes
646 Views
pschmidt1973
New Contributor II

Notice in Qlik Sense that the Pareto Classes do not appear in the drop-down.

Pareto.PNG

Is this a bug?

0 Likes
646 Views
Employee
Employee

Looks like it... Thanks for pointing it out. We will investigate.

0 Likes
646 Views
rva_heldendaten
Contributor III

I'm just playing around with this feature, and it seems it solves another old Qlik limitation for me.

Maybe I'm still in Chistmas mood, and have overlooked something, but the expression

=(aggr(dual(only(Artikelname),RowNo(total)) ,Firma,([Artikel-Nr],sum(Sales),DESC)))

allows me to sort a Pivottable to always show the Top Product per Company.

Before that, the sort was always globally, and therefore the product "Cote de Blaye" would always be the first line for each Company, as it is overall the most sold product.

sortable_Aggr.png

The way to get there is a little bit cumbersome (I first have to create a straight table, change the sort order of dimensions, then change it back to a pivot table). But beside that it works on my dataset in QlikView 12.10 SR1.

It would be nice if someone can test this with another dataset.

I will post a video how I create the pivot table in a minute!

646 Views
rva_heldendaten
Contributor III

Here how I created the pivottable

0 Likes
646 Views
rva_heldendaten
Contributor III

And here the .qvw file:

content.heldendaten.eu/Nordwind_SortableAggr.zip

0 Likes
646 Views
MVP
MVP

I see no difference in sorting when just using [Artikelname] as 2nd dimension..

0 Likes
646 Views
rva_heldendaten
Contributor III

Ralf you are right! I simplified my issue so much, that it works anyway

What I really tried to achieve is to:

- Sort YEAR numerically

- Sort Month numerically

- Sort Articels by Top-Article in each month.

I hoped that the new Aggr-feature would help, but I failed so far..

2016-12-29 08_50_26-QlikView x64 - Testversion - [C__Users_rva_Desktop_Nordwind_SortableAggr.qvw_].png

0 Likes
646 Views
rva_heldendaten
Contributor III

The best way I know to do so, ist something like

=aggr(dual(only(Artikelname),ceil((max(Jahr)*1000000000)+ (max(Monat)* 10000000)+Sum(Sales)*1000)+RowNo(total) ) ,Jahr, Monat,[Artikel-Nr])

I really don't like this approach, because:

- Encoding the Year+Month into the DUAL()-Expression is dangerous when "Sales" is growing. One needs to ensure you have enough Zeroes are set for Year and Monath, so the numeric sorting still works.

- If one article has exactly the same Sales in this Month, the Dual-Value would be eaxtly the same. This is the reason I also encode Rowno() into the Dual-Expression.

If the new AGGR-Sort could help here in some way, it would be great. Maybe someone finds an expression!

2016-12-29 09_32_59-QlikView x64 - Testversion - [C__Users_rva_Desktop_Nordwind_SortableAggr.qvw].png

0 Likes
646 Views
MVP
MVP

Roland, I still do not understand the issue. In v12.10 SR1 sorting by measure (y-value) in pivot seems to work fine even with three dimensions. Probably better to open a new thread.

0 Likes
646 Views
rva_heldendaten
Contributor III
0 Likes
646 Views
karthiksrqv
Contributor II

Thanks!

0 Likes
646 Views
luismarianoante
New Contributor

Gracias Me fue muy útil el ejemplo

0 Likes
646 Views
Not applicable

Hi Robert,

I am also facing the same issue. Have you been able to find a solution for this?

Thanks

Aditya

0 Likes
646 Views
Not applicable

Hi Henric,

I am also facing same issue as Robert. Here is my expression to create a dimension:

=Aggr(

If(Rangesum(Above(Sum({<open={'Closed'}>}Amount)

    /Sum({<open={'Closed'}>} Total <OTCRange> Amount),1,RowNo()))<=0.8, 'A',

     

     

        If(Rangesum(Above(Sum({<open={'Closed'}>}Amount)

        /Sum({<open={'Closed'}>} Total <OTCRange> Amount),1,RowNo()))<=0.95, 'B',

            'C')),

    (RecordID,(=Sum({<open={'Closed'}>}Amount),Desc))

    )

I have used [OTCRange] as a column and Sum(Amount) as a measure. It gives correct result for a selected OTC range. eg. Suppose [OTCRange] = 'OnTime' , '1 to 5' etc. If i select either of OnTime or '1 to 5' then it gives correct results but if i select  both then it does not work.

Thanks

Aditya

0 Likes
646 Views
Not applicable

Below expression worked for me:

=Aggr(

If(Rangesum(Above(Sum({<open={'Closed'}>}Amount)

    /Sum({<open={'Closed'}>} Total <OTCRange> Amount),1,RowNo()))<=0.8, 'A',

    

    

        If(Rangesum(Above(Sum({<open={'Closed'}>}Amount)

        /Sum({<open={'Closed'}>} Total <OTCRange> Amount),1,RowNo()))<=0.95, 'B',

            'C')),

   OTCRange, (RecordID,(=Sum({<open={'Closed'}>}Amount),Desc))

    )

Thanks

Aditya

0 Likes
646 Views