# Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee

## Recipe for a Pareto Analysis

“Which products contribute to the first 80% of our turnover?”

This type of question is common in all types of business intelligence. I say “type of question” since it appears in many different forms: Sometimes it concerns products, but it can just as well concern customers, suppliers or sales people. It can really be any dimension. Further, here the question was about turnover, but it can just as well be number of support cases, or number of defect deliveries, etc. It can in principle be any additive measure.

It is called Pareto analysis. Sometimes also known as 80/20 analysis or ABC analysis.

The logic is that you first sort the products according to size, then accumulate the numbers, and finally calculate the accumulated measure as a percentage of the total. The products contributing to the first 80% are your best products; your “A” products. The next 10% are your “B” products, and the last 10% are your “C” products.

And here’s how you do it in QlikView:

1. Create a pivot table and choose your dimension and your basic measure. In my example, I use Product and Sum(Sales).

2. Sort the chart descending by using the measure Sum(Sales) as sort expression. It is not enough just to check “Sort by Y-value”.

3. Add a second expression to calculate the accumulated sales value:
RangeSum(Above(Sum(Sales), 0, RowNo()))
Call this expression Accumulated Sales. The Above() function will return an array of values – all above values in the chart – and the RangeSum() function will sum these numbers.

4. Create a third expression from the previous one; one that calculates the accumulated sales in percent:
RangeSum(Above(Sum(Sales), 0, RowNo())) / Sum(total Sales)
Format it as a percentage and call it Inclusive Percentage.

5. Create a fourth expression from the previous one; one that calculates the accumulated sales in percent, but this time excluding the current row:
RangeSum(Above(Sum(Sales), 1, RowNo())) / Sum(total Sales)
Format it as a percentage and call it Exclusive Percentage.

6. Create a fifth expression for the ABC classification:
If([Exclusive Percentage] <= 0.8, 'A', If([Exclusive Percentage] <= 0.9, 'B', 'C'))
Call this expression Pareto Class. The reason why the Exclusive Percentage is used, is that the classification should be determined by the lower bound of a product’s segment, not the upper.

7. Create a conditional background color, e.g.
If([Pareto Class] = 'C', LightRed(), If([Pareto Class] = 'B', Yellow()))

You should now have a table similar to the following. In it you can clearly see the classification of different products.

In this table, there are five different expressions that you can use for Pareto analysis. The graph in the beginning of this post uses Sales and Inclusive Percentage for the bars and the line, respectively; and Pareto Class for the coloring of the bars.

Further, you may want to combine the Pareto Class and the Exclusive Percentage into one expression:

Pareto Class =
If(RangeSum(Above(Sum(Sales),1,RowNo())) / Sum(total Sales) <= 0.8, 'A',
If(RangeSum(Above(Sum(Sales),1,RowNo())) / Sum(total Sales) <= 0.9, 'B', 'C'))

Good luck in creating your Pareto chart.

HIC

Further reading related to this topic:

Recipe for an ABC Analysis

Recipe for a Pareto Analysis – Revisited

Partner

Hi Henric,

Very useful post. I want to display the count of products and their names in a text box that have contributed to first 25% of the turnover. Is it possible to do it outside of a chart? The pre-requisite is to sort the sum(Sales) in descending order. How does one do that in a single expression and then do a cumulative sum?

Would look forward to your response.

Thanks and Regards

330 Views
Employee

No, unfortunately this is today not possible. But we are looking at possibilities to do this in future versions. Basically, you would need an Aggr() function with a sorted output vector. And today Aggr() cannot be sorted.

What you could do, is to use an ABC-categorization based on rank rather than Pareto.

HIC

330 Views
Not applicable

Very useful and easy to make.

Thanks a lot

330 Views
Valued Contributor III

Hi Henric,

I have a similar requirement where I have input box which holds some dimensions and a straight table with out dimension but with some expressions, like

1. Total spend

2. 80% spend(Top Spend)

3. 20% spend (Tail Spend)

4. Top count(80% spend, count of a selected dimension in the input box which are involved in 80% spend)

5. Tail count(20% spend ,count of a selected dimension in the input box which are involved in 20% spend)

for the fourth column the customer want to see count of selected dimension(i.e. product) involved in 80% spend like wise, similarly fifth count of 20% spend dimensions.

I used expression for 4 as

-Sum(aggr(rangesum(top(sum(Spend),1,rowno()-1))/rangesum(top(sum(Spend),1,noofrows())),

\$(SpendDriverDimension))<0.8)

but this expression is not working fine with some of the input box values(dimensions), the variable which I am using in the input box is \$(SpendDriverDimension).

I think that top() functions cannot be used with out dimensions, please suggest any alternatives.

Thanks

Kiran Kumar

330 Views
Partner

Hi Kiran,

Just took a glance at your expression, here are some tangential pointers that may (or may not) help - no solution, just a bit of background information on what is going on in your current expression :

The Top function is evaluating the expression Sum(Spend) as per the first record in your aggregate dimension listing. It is dependent on the sort order of your aggregate. The Aggr() function does not contain a 'SORT BY' clause as per HIC's comment above and it is definitely not affected by the outer sort order set in your chart. So what exactly does the Top() function evaluate across in your instance? It's actually dependent on the LOAD ORDER of each of your dynamically referenced dimensions. I.e. if at time of refresh, you load your \$(SpendDriverDimension) with attributes orders sorted A-Z, then Top() would evaluate the record relating to your first alphabetically sorted dimension value. This is unlikely going to meet your requirement.

The lack of dynamic sort order in the Aggr() function is a significant restriction in achieving these type of result sets. HIC - can we get some form of explicit commitment on sort order in Aggr() statement being on the road map for a future version? I first raised an ideas submission for this back in 2010. Anyone wishing to up the visibility of this feature, follow this link and vote it up

This is one of my most commonly encountered restrictions when building analytical client solutions...

My two pennies for what it's worth.

Regards

Jonas

330 Views
Valued Contributor III

Hi Jonas/HIC,

Is there a way that we can achieve it or it depends on the sort capability of aggr().

Thanks

Kiran Kumar

330 Views
Employee

@ kiran

No, I don't know of any way to achieve this if you want to base your classification on a sorted, accumulated number. But you can do it if you base it on the ranking.

@ Jonas

No, you cannot gen an explicit commitment from Qlik. Qlik is a public company, and as such we are not allowed to spread information about our plans that way. But I can promise you that I, personally, will work for this.

Concerning the technical solution: A sorted Aggr() would do the job. But a better alternative would perhaps be to introduce a function ParetoRank() that returns the number you want. Compare with Rank() that returns an integer 1..N, depending on where this Customer/Product is ranked: ParetoRank() would be similar, but the number would be a float between 0 and 1; like a percentage. In both cases, the best Customer/Product would have the lowest value and the worst would have the highest.

HIC

330 Views
Valued Contributor III

330 Views
Not applicable

How do I use more than one Dimension? I need to add Code Product and Segment Product in the table.

What do I need change in these expressions?

Tks.

330 Views
Not applicable

Thx! Great explanation!

330 Views
Not applicable

hello, i don't find the place to set the color to paretoclass

330 Views
Employee

Chart properties - Expressions - The small plus-sign to the left next to the expression - Background color

HIC

330 Views
Contributor III

why the result of this formula and 80-20 in dimension limit are different?

330 Views
Partner

If it's the difference between inclusive and exclusive percentage you are asking about it is because you want to include the bar that goes beyond 80% when doing the coloring.

The colored bars are those that precisely passes 80%.

You can't just compare with 80%, that would exclude the last bar that passes 80%.

330 Views
MVP

Useful and enlightening as usual.

Thanks

330 Views
Contributor III

Thank you for this information! I did it, it works. But when you have the day of sale 30 000 positions and statistics for the 5 years prior to the date ... In general, even for a week the server just stops, since the calculation is carried out in a loop in lines. If, for example take a simple table as an option accumulation, all he said instantly. But it also limits. What will advise how to implement the ABC analysis with giant data?

330 Views
Not applicable

Hi Henric,

Great posts you are providing. I would like something a bit more than the basic Pareto. I would like to compare orderline pareto (or frequency pareto) with volume pareto (or quantity pareto).

Either in a scatter chart or pivot this is what I try to accomplish;

 A Volume B Volume Volume C A Line Count 12 12 40 B Line Count 23 50 100 C Line Count 15 36 200

Where the numbers a SKU (or product, item anything counts) and the categories are defined as you explained;

=If(RangeSum(Above(sum([Line Volume (l)]),1,RowNo())) / sum(total [Line Volume (l)]) <= 0.8, 'A',

If(RangeSum(Above(sum([Line Volume (l)]),1,RowNo())) / sum(total [Line Volume (l)]) <= 0.9, 'B', 'C'))

=If(RangeSum(Above(Count(Order_Number),1,RowNo())) / Count(total Order_Number) <= 0.8, 'A',

If(RangeSum(Above(count(Order_Number),1,RowNo())) / count(total Order_Number) <= 0.9, 'B', 'C'))

So in the most easy way this formula's above would be my dimensions and distinct SKU count would be my expressions, but unfortunately it doesn't work like that. Simply because the sortation rule is key in the result for this graph.

I hope you understand the questions and have a fitting solution.

Thanks

330 Views
Contributor

Thank you Henric hic!!

Very interesting post!

It worked perfect!

Now I need to calculate the ABC curve by the amount of products.

Example I would like:

Table:

A   10%   200

B    30%  1500

C    60%   8000

I want to calculate the amount of product (SKU code) for each curve, based on sales

count (produto) ...   how about it? how do the full expression?

330 Views
Partner

is there a way to have the pareto classes (A,B,C) as dimensions ?

so that user selects A and see the related products?

I tried to use this approach with the class function but it doesn't seem to work well

330 Views
Employee

No, it is (currently) not possible to use a pareto-based definition for the dimension. At least, I haven't found one...

You can however use a rank-based definition for the dimension. See Recipe for an ABC Analysis

HIC

330 Views
Partner

and can I put 0.8 instead of 0.5 for (Rank(Sum(Sales),1)-1) / Count(distinct total Product)

so that the result is the same as that of the first article?

330 Views
Employee

You can change it to whatever you want, but the ranking method will not return the same result as the pareto method.

Further, 0.5 in the ranking method (the better half of the customers) often corresponds to roughly 0.8 in pareto.

330 Views
Partner

and this is always over total number of products?

can it be by brands ?

330 Views
Employee

If you use the total qualifier, you can get it by brand or product group, e.g.

(Rank(total <ProductGroup> Sum(Sales),1)-1) / Count(distinct total total <ProductGroup> Product)

330 Views
Partner

I have a question outside this topic , may I ask you and you help me in return

330 Views
Employee

Post it as a separate question in the discussion forum and tag me. If I have time, I will respond.

/HIC

330 Views
Partner

Hello got a question on this

I select a brand and got the following for each product under the selected brand:

so the first four products represent the 80% of net sales for the selected brand?

or shall I compare it to total net sales?

330 Views
New Contributor

Hi Henric,

If I have this expression:

count({\$<FLAG_RUPTURA = {'RUPTURA'} >} FLAG_RUPTURA)/sum(TOTAL)

RangeSum(Above(Sum(Sales), 0, RowNo()))

How Can I use in the expression of accumulated Sales??

RangeSum(Above(Sum(count({\$<FLAG_RUPTURA = {'RUPTURA'} >} FLAG_RUPTURA)/sum(TOTAL)), 0, RowNo())) --> Like thtat doesnt work

330 Views
New Contributor II

Hello Sir,

thank you very much for this recipe. I am looking for this for a moment.

I am training on QlikView and i am beginner.

I cannot choose "sort by ewpression". Do you know why?

330 Views
New Contributor II

Hello Sir,

i followed chat you said but i do not know why Exculsive percentage were all nulls although i insered the same formula.

Thank you for help.

Best regards,

Alâa Eddine.

330 Views
Labels