Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I was wandering if someone could explain me the following question I have:
I ran into a few cases where using a set analysis expression in an AGGR function gives two different results when using the following:
1. function({$<set analysis expression>} AGGR (function({$<set analysis expression>} expression) field)
2. function( AGGR (function({$<set analysis expression>} expression) field)
As far as I know, the AGGR function forms an invisible table with one expression over a calculated dimension. The thing is that when I use the first script I mentioned above, compared to the second one, I get different results even though the set analysis is identical both in the outside part and in the inside. I expected to get the same results from the two expressions, but I was proved wrong. Can someone please help me with this issue?
Thanks,
Yaniv
Well, I can only repeat my very first statement (that's how I believe it's working):
"The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function."
If you think "the inside set analysis expression should of gotten over the Dim1 selection", I think you should open a ticket with QT. There is nothing I can do about this.
Regards,
Stefan
The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function.
Hi swuehl
So, if I understand correctly, you mean that the outer expression of set analysis is prioritized first and the inner one second? In other words, the inner one can only work on current selections and the outer one allows me to define a new set of possible values?
Advanced Aggregations
In this chapter, we will cover the following recipes:
ff
Using TOTAL to calculate the percentage of total and the percentage of subtotal
ff
Using AGGR to calculate the percentage of the maximum value
ff
Using AGGR to resolve a "Sum of Rows" issue
ff
Creating a dynamic AGGR expression for a Group dimension using Dollar Expansion
ff
Using RangeMax to return only positive numbers
ff
Creating a dynamic Part-to-Whole pie chart
ff
Creating a colored treemap using colormix
ff
Using RangeSum to calculate a rolling total in a multi-dimension table
ff
Showing only the top 3 values in a pivot table
ff
Creating a Statistical Control Chart using Standard Deviation
ff
Creating a Moving Range chart
ff
Creating a Control Chart using Moving Range
Introduction
Simple, straightforward aggregation functions are the staple of QlikView. Nine times out of ten, we tend to use something like Sum(), Count(), or Avg(). We might add in some Set Analysis, but the basic function is the same.
We usually call these "horizontal" functions, because they are calculated on the row, in the chart for the dimension on that row. So, we calculate the sum of sales for each country, or
we count the number of purchase orders for each year.
Advanced Aggregations
100
From time-to-time, we need to break out of the row and perform "vertical" calculations.
For these, we need to look at more advanced aggregation functions using TOTAL, AGGR,
or one of the range functions.
Using TOTAL to calculate the percentage of
total and subtotal
TOTAL is a useful qualifier that can be added to an aggregation function (such as Sum,
Count, and Avg) to tell QlikView to ignore the dimensions of a chart in the calculation.
By default, it ignores all the dimensions, but we can be more specific and tell it to ignore
some while respecting others.
In this recipe, we will calculate the percentage of total sales for each city, as well as the
percentage of sales for each city within its own country.
Getting ready
Load the following script:
LOAD * INLINE [
Country, City, Sales
USA, San Diego, 24567
USA, Dallas, 54962
USA, New York, 67013
USA, Boston, 45824
UK, London, 64002
UK, Birmingham, 44291
UK, Manchester, 40320
Germany, Berlin, 52912
Germany, Frankfurt, 61832
Germany, Munich, 35812
Japan, Tokyo, 42137
Japan, Yokohama, 55832
Japan, Osaka, 37643
];
Chapter 4
101
How to do it...
The following steps show how to use TOTAL to calculate the percentage of total and the
percentage of subtotal:
1. Create a new pivot table with Country and City as dimensions.
2. Add the following expressions:
Sales $ Sum(Sales)
Total Sales % Sum(Sales) / Sum(TOTAL Sales)
% Sales by
Country Sum(Sales) / Sum(TOTAL<Country> Sales)
3. On the Presentation tab, select the Country and City dimensions and turn on
Show Partial Sums.
4. On the Number tab, set the format for the first expression to Integer. Set the second
and third expression to Fixed to 2 decimals, and turn on Show in Percent (%).
5. Click on Finish to save the chart.
Advanced Aggregations
102
6. Right-click on the Country field in the pivot table, and select Expand all from the
menu to show all the cities:
7. Resize the chart to see all of the values.
How it works...
In the first expression, the TOTAL syntax instructs that part of the expression to ignore
the dimensions of the chart, and calculate the sum across the whole chart. The calculated
percentage is the sales over the overall total (627,147).
In the second expression, the syntax is TOTAL<Country>. This tells that part of the
expression to ignore all the dimensions; except Country. This gives us a subtotal by
each country, and the calculated percentage is the percentage sales for that city within
its own country.
There's more...
Within other charts, there is another way to achieve the overall total and that is to turn on
the Relative checkbox against the expression. This option is not available in pivot tables.
Chapter 4
103
Using AGGR to calculate the percentage of
the maximum value
AGGR is the function that we will use to perform vertical calculations.
In this case, we are going to work out which country has the highest value, and calculate the
percentage of other sales versus that highest value.
Getting ready
Load the following script:
LOAD * INLINE [
Country, City, Sales
USA, San Diego, 24567
USA, Dallas, 54962
USA, New York, 67013
USA, Boston, 45824
UK, London, 64002
UK, Birmingham, 44291
UK, Manchester, 40320
Germany, Berlin, 52912
Germany, Frankfurt, 61832
Germany, Munich, 35812
Japan, Tokyo, 42137
Advanced Aggregations
104
Japan, Yokohama, 55832
Japan, Osaka, 37643
];
How to do it...
Use the following steps to use AGGR to calculate the percentage of the maximum:
1. Create a new straight table with Country and City as dimensions.
2. Add the following expressions:
Sales $ Sum(Sales)
% of Max
Sum(Sales) /
Max(TOTAL Aggr(Sum(Sales), Country, City))
3. On the Expression tab, for the second expression, under Total Mode, select No Total.
4. On the Number tab, set the format for the first expression to Integer. Set the second
expression to Fixed to 2 decimals, and turn on Show in Percent (%).
5. Click on Finish to save the chart.
6. Double-click on the % of Max label to sort the values.
Chapter 4
105
How it works...
There are a couple of things going on here. First, the AGGR expression:
Aggr(Sum(Sales), Country, City)
AGGR basically creates a table chart in the memory. In this case, the expression is
Sum (Sales), and there are two dimensions—Country and City.
Once we have this in-memory chart, we can put one of our many aggregation functions (such
as Sum, Min, Max, and Stdev) around it to get a result. In this case, we use the Max function
to retrieve the maximum value in the range (67,013).
Since we are using the function in a chart that has dimensions, we also need to include the
TOTAL qualifier to tell the Max expression to ignore the dimensions. If we didn't, we would
just get 100 percent for every row!
There's more...
This is a relatively basic use of AGGR, but it actually doesn't get much more complicated!
When you add in the Set Analysis syntax into an expression, you can then finely control the
range of values that are returned.
Just remember that AGGR is an in-memory chart. Like all charts, it has an expression and
one or many dimensions. If you can build a straight table to get the result you are looking
for, then you can build an AGGR statement.
Advanced Aggregations
106
Any time that you find yourself writing an expression that has one aggregation function nested
inside another—like Max(Sum(Sales))—then you will need to use AGGR to have the function
calculate as expected—like Max(Aggr(Sum(Sales), Country)).
Using AGGR to resolve a "Sum of Rows"
issue
"Sum of Rows" issues are reasonably rare. Essentially, what it means is that the sum that
is displayed as the total on a table is not actually the sum of the rows displayed in the table
(usually discovered when exporting to Excel), or the total is not calculated at all. Often, it
can be the result of a complicated if statement in the expression, or from using values
from across different tables in the expression.
In this recipe, we will look at a very simple example and see how to resolve it using AGGR.
Getting ready
Load the following script:
// Load Sales data
Sales:
LOAD * INLINE [
Country, Month, Curr, ProdID, Price, Qty
USA, 2013-01-01, USD, 1, 1.99, 2
USA, 2013-02-01, USD, 2, 2.99, 3
USA, 2013-03-01, USD, 1, 1.98, 5
UK, 2013-01-01, GBP, 1, 1.59, 1
UK, 2013-02-01, GBP, 1, 1.58, 3
UK, 2013-03-01, GBP, 1, 1.58, 3
UK, 2013-03-01, GBP, 1, 1.58, 3
];
// Load Exchange Rates
LOAD * INLINE [
Curr, ExRate
USD, 1
GBP, 0.63
];
Chapter 4
107
How to do it...
These steps show how to resolve a "Sum of Rows" issue with AGGR:
1. Create a new straight table with Country as the dimension.
2. Add the following three expressions:
Volume Sum(Qty)
Sales $ Sum(Price*Qty)/ExRate
Avg. Price [Sales $] / Volume
3. Set the number format for the Volume expression to Integer. Set the format for the
other two expressions to Fixed to 2 Decimals.
4. Click on Finish to save the chart.
5. Note that the Sales $ total and Avg. Price values have not been calculated. Edit the
properties of the chart and change the Sales $ expression to the following:
Sum(AGGR(Sum(Price*Qty)/ExRate, Country))
6. Note that the totals are now calculating.
How it works...
Initially, the total for Sales $ does not work because the ExRate field will return more than
one value at the total level, which invalidates the whole expression.
Advanced Aggregations
108
When we wrap the expression in the AGGR statement, it makes no difference at all at the line
level—because the dimensions are being respected—but kicks in at the total level to give us
the correct sum.
There's more...
It is useful to note that you can fix the sum of rows issue in a straight table by simply changing
Total Mode of that expression to Sum of Rows instead of the default of Expression Total.
That method would not work though for the Avg. Price expression. Neither would it work in a
pivot table, which does not have the option. AGGR will work great in the pivot table.
See also
ff The Creating a dynamic AGGR expression for a Group dimension using Dollar
Expansion recipe in this chapter
Creating a dynamic AGGR expression for a
Group dimension using Dollar Expansion
AGGR is very useful for advanced aggregations. However, we need to be careful when using it
in charts that have a Group dimension; either cycle or drill.
In this recipe, we will see how to overcome this using a Dollar Expansion.
Getting ready
This example follows along from the previous section. Load the data and create the straight
table with the AGGR expression.
Chapter 4
109
How to do it...
Follow these steps to create a dynamic AGGR expression for a group dimension:
1. In Document Properties | Groups, create a group called Cycle1 with ProdID
and Month as the dimensions:
2. Edit the straight table and add Cycle1 as the second dimension.
3. Save the changes and look at the chart. Note that the expressions do not
calculate correctly for all the different cycle values.
4. Open the properties of the table and edit the Sales $ expression:
Sum(
AGGR(Sum(Price*Qty)/ExRate, Country,
[$(=GetCurrentField(Cycle1))])
)
Advanced Aggregations
110
5. Click on OK to save the chart:
6. Click on the cycle button and confirm that the values calculate correctly.
How it works...
This is a great example of using Dollar Expansion in an expression when dealing with a Group
dimension; or any other thing that might change.
When you click on the cycle button, the GetCurrentField function returns the name of
the field that is now in use and that changes the field that the calculation is aggregating over.
Note the use of square brackets around the Dollar Expansion; this is because any field in the
cycle could potentially have a space or other character in the fieldname that could break the
expression if the square brackets weren't there.
See also
ff The Using AGGR to resolve a "Sum of Rows" issue recipe in this chapter
Using RangeMax to return only positive
numbers
There are occasions where we need to only deal with positive values, such as when we have
a chart that will not calculate with negative numbers or when an expression makes no sense
having negative numbers.
In this case, we are going to display a block chart for sales, but deal with having negative values.
A block chart does not calculate with negative numbers.
Chapter 4
111
Getting ready
Load the following script:
LOAD * INLINE [
Country, Sales
USA, 120
UK, 100
Mexico, 76
Canada, 32
France, 10
Germany, -5
];
How to do it...
The following steps show you how to use RangeMax to return only positive numbers:
1. Create a new block chart. Set Country as the dimension.
2. Add the following expression:
Sum(Sales)
3. Click on Finish to save the chart:
Advanced Aggregations
112
4. Edit the properties of the chart and change the expression to the following:
RangeMax(Sum(Sales), 0)
5. Confirm that the chart now displays the data. There should be no block for Germany
as it was the negative value.
How it works...
The RangeMax function will return the highest value in the list of values. As long as
Sum(Sales) is positive, it will always be higher than 0. Once it is negative, 0 will be returned
by the function, and the negative value will not interfere with the calculation of the chart.
There's more...
Range functions are quite useful and work in both charts and scripts. They are essential when
using functions such as Above and Before, which return a range of values.
See also
ff The Using RangeSum to calculate a rolling total in a table recipe in this chapter
Chapter 4
113
Creating a dynamic Part-to-Whole pie chart
Many visualization experts will argue over the merits of the humble pie chart. Most,
however, will agree that the following example, with many segments, is probably the
least useful implementation.
Pie charts are most useful when doing a simple part-to-whole comparison. In this recipe,
we will create a dynamic part-to-whole bar chart that displays the sales value for the
currently selected set of countries versus the rest.
Getting ready
Load the following script:
LOAD * INLINE [
Country, Sales
USA, 53453
Mexico, 21317
Canada, 14545
UK, 12333
Advanced Aggregations
114
France, 21333
Germany, 23123
Japan, 44342
China, 34234
Australia, 12345
];
How to do it...
To create a dynamic part-to-whole pie chart, follow these steps:
1. Create a new pie chart.
2. On the Dimensions tab, use the Add Calculated Dimension button to add the
following dimension expression:
=if(
Aggr(Sum(1), Country)>0,
'Selected: $(=Concat(DISTINCT Country, ', '))',
'Others: $(=Concat({<Country=E({$})>} DISTINCT Country, ',
'))'
)
3. Label the dimension as Country.
Chapter 4
115
4. Add the following expression:
Sum({<Country=>} Sales)
5. Label the expression as Sales $.
6. Click on Next until you get to the Presentation tab. Click on the Settings button
for Legend:
Advanced Aggregations
116
7. Turn on the Wrap Text option and set Cell Height to 6. Click on OK. Finish the
chart wizard.
8. With the chart selected, hold down the Ctrl + Shift key. The different areas of
the chart will be highlighted. Use the mouse to narrow the legend area.
9. Add a Country listbox to the layout, to the right of the pie chart. Note what
happens as you make different selections.
How it works...
This is a great example of using an AGGR function in a calculated dimension to return just the
dimension values that are of interest. In this case:
Aggr(Sum(1), Country)
Will only evaluate for those countries that are currently selected because Sum(1) will be null
for the unselected countries. We check whether it is greater than 0 or not, and then use that
as the list of selected countries (achieved using Concat).
The tricky bit is getting back the list of unselected countries. We can do that by using a Set:
Concat({<Country=E({$})>} DISTINCT Country, ', ')
The E() set returns the list of excluded values.
By putting the Concat functions inside a dollar expansion, we cause them to be evaluated
separately to the chart, so they are calculated at the document level.
Chapter 4
117
There's more...
Other functions can be used quite effectively within an AGGR like this. For example:
=if(
Aggr(Rank(Sum(Sales)), Country)<=3,
Country,
'Others'
)
This will give us back the top three countries by rank of Sales. A fourth value called Others
will also appear in the dimensions.
See also
ff The Showing only the top three values in a pivot table recipe in this chapter
Creating a colored treemap using colormix
Treemaps were originally designed by Ben Shneiderman, a professor of Computer Science at
the University of Maryland. They are a very effective way of displaying hierarchical data. Within
QlikView, the basic implementation of the Treemap, using the size of the rectangles to encode
a value, can be effectively rendered by using the block chart.
If you want to encode a secondary measure, you can use a color function such as colormix1,
along with an advanced aggregation statement.
In this recipe, we are going to compare sales values using the size of the rectangles and
average order size using color.
Getting ready
Load the following script:
LOAD * INLINE [
Country, City, Sales, Orders
USA, San Diego, 24567, 546
USA, Dallas, 54962, 345
USA, New York, 67013, 678
USA, Boston, 45824, 365
UK, London, 64002, 743
UK, Birmingham, 44291, 572
Advanced Aggregations
118
UK, Manchester, 40320, 534
Germany, Berlin, 52912, 643
Germany, Frankfurt, 61832, 678
Germany, Munich, 35812, 325
Japan, Tokyo, 42137, 562
Japan, Yokohama, 55832, 753
Japan, Osaka, 37643, 418
];
How to do it...
These steps show how to create a colored Treemap with colormix:
1. Create a new block chart. Add Country, then City as dimensions.
2. Add the following expression:
Sum(Sales)
3. Click on Finish:
Chapter 4
119
4. The block chart will be created and will display the Sales $ values in the hierarchical
sized blocks. Edit the properties of the chart to set the color.
5. Click on + beside the Sales $ expression, and enter the following Background Color
expression in the Definition box:
ColorMix1(
(Sum(Sales)/Sum(Orders))/
Max(Total Aggr(Sum(Sales)/Sum(Orders), Country, City)),
White(),
Green()
)
6. Click on OK.
Advanced Aggregations
120
How it works...
The trick here lies in the AGGR expression:
(Sum(Sales)/Sum(Orders))/
Max(Total Aggr(Sum(Sales)/Sum(Orders), Country, City))
The Max AGGR part will return the highest value of the average sale for all of the cities.
By dividing this value into the average sale for the current city, we will get a value between
0 and 1, representing the rank of that city.
TOTAL in Max makes sure that the dimensions are ignored for that calculation. Otherwise the
calculation would only calculate for the current city and the value would be 1 for all cities:
In this case, Dallas has the highest average sales, so will be 1.000. San Diego has the lowest
average sales at 0.282.
When we plug those values into the ColorMix1 expression, it returns a color that is along a
scale from 0 to 1 of the start and end color.
The upshot is that Dallas has a dark green color while San Diego has a much lighter color;
closer to white.
Chapter 4
121
There's more...
This use of AGGR to calculate a value across dimensions like this is very typical. It is quite
a common use case.
Using RangeSum to calculate a rolling total
in a multidimension table
In a single dimensional chart, for example, bar chart, line chart, or straight table, the usual
way to create a rolling total is to use the accumulate option in the expression. This doesn't
work correctly in a multi-dimensional situation.
In this recipe, we will look at creating accumulations by using the RangeSum function.
Getting ready
Load the following script:
LOAD * INLINE [
Country, City, Sales, Orders
USA, San Diego, 24567, 546
USA, Dallas, 54962, 345
USA, New York, 67013, 678
USA, Boston, 45824, 365
UK, London, 64002, 743
UK, Birmingham, 44291, 572
UK, Manchester, 40320, 534
Germany, Berlin, 52912, 643
Germany, Frankfurt, 61832, 678
Germany, Munich, 35812, 325
Japan, Tokyo, 42137, 562
Japan, Yokohama, 55832, 753
Japan, Osaka, 37643, 418
];
How to do it...
Use these steps to calculate the rolling totals with RangeSum:
1. Create a new pivot table with Year and Country as dimensions.
Advanced Aggregations
122
2. Add the following three expressions:
Sales $ Sum(Sales)
Country Total RangeSum(Above(Sum(Sales), 0, RowNo()))
Overall Total RangeSum(Above(TOTAL Sum(Sales), 0,
RowNo(TOTAL)))
3. Click on Finish to save the chart.
4. Expand the Year dimension so that you can see all the values. Notice the rolling totals.
How it works...
The Above function allows us to specify which row "above" the current row to return and
the number of rows to return. By specifying 0 for the row to start on, we are telling above
to start at the current row. We then use one of the RowNo() functions to define how many
rows to return.
RowNo() by itself will return the number of the rows you are on for each of the second
dimension values within the first dimension (rows of Country within each Year). This
resets as the first dimension value changes.
RowNo(TOTAL) will ignore the dimensions and just give us the number of the row that
we are on in the whole chart.
The preceding statement will return a range of values. That is why we need to use the
RangeSum function to add up all of the values returned.
Chapter 4
123
There's more...
Similar to Above, there is a function called Below. There are also functions called Before
and After, which you can use in a pivot table, where one of the dimensions is pivoted across
the top of the chart to give us horizontal totals.
Showing only the top 3 values in a pivot
table
In version 11, QlikView introduced the feature Dimension Limits for most of the charts.
This feature allows you to limit the number of values displayed based on a certain criteria;
most usually, the value is in the top X (whatever X you want). It also allows you to include
an Others dimension value for everything else outside the top X.
This feature is not available for gauge charts (which don't usually have a dimension anyway)
and pivot tables.
In this recipe, we are going to implement the feature using AGGR in a pivot table.
Getting ready
Load the following script:
LOAD * INLINE [
Year, Country, Sales
2010, USA, 6013
2011, USA, 5295
2012, USA, 5551
2013, USA, 6932
2010, UK, 4512
2011, UK, 3976
2012, UK, 4691
2013, UK, 5276
2010, Japan, 2765
2011, Japan, 2567
2012, Japan, 3111
2013, Japan, 3234
2010, Germany, 4374
2011, Germany, 5673
2012, Germany, 4322
2013, Germany, 7654
Advanced Aggregations
124
2010, France, 4965
2011, France, 5097
2012, France, 5419
2013, France, 5732
2010, Australia, 3966
2011, Australia, 4087
2012, Australia, 2376
2013, Australia, 3784
];
How to do it...
These steps show you how to show only the top 3 values in a pivot table:
1. Create a new pivot table with Country and Year as dimensions.
2. Add the following expression for Sales $:
Sum(Sales)
3. Click on Finish to save the chart.
4. Expand all the Country values and pivot the Year dimension across the top:
Chapter 4
125
5. Edit the chart properties. On the Dimensions tab, select the Country dimension under
Used Dimensions, and click on the Edit button. Enter the following expression:
=If(Aggr(Rank(Sum(Sales)), Country)<=3, Country, 'Others')
6. Edit the dimension Label and set it to Country. Click on OK.
7. Note that there are now only four dimension values.
8. Edit the dimension property again and change the dimension expression
to the following:
=If(Aggr(Rank(Sum(Sales)), Country)<=3, Country, Null())
Advanced Aggregations
126
9. Turn on the option Suppress When Value is Null.
10. Note that only the top 3 are displayed.
How it works...
Using AGGR for a calculated dimension is quite a powerful use of QlikView. In this case,
we use AGGR to create an in-memory table chart of the rank of the sales for each country.
We then compare the result to see if it is <= 3. If it is, the if statement returns the country,
otherwise it returns something else—either Others or Null().
There's more...
It's not just using the values in an if statement that we can do; we can also use those
values as the dimension in the charts. Since they are linked to the original dimension in
AGGR, selecting one of these values will also select the underlying country or countries.
Creating a Statistical Control Chart using
Standard Deviation
Control charts were developed by a statistician named Walter Shewhart in the 1920's. He was
working for Bell Labs, who, at that time, were rolling out a telephony network across the US.
For this network, amplifiers and other such equipments needed to be buried underground, and
it was expensive to have to dig it up for repairs. They were worried about the variations in the
manufacturing process leading to increased cost in repairs.
Shewhart used control charts to show that variation was normal and reacting to small
variations by making changes to the manufacturing process was wrong. As long as the
variation was within control limits and not trending in any particular direction, there was
nothing to worry about.
A simple control chart uses the mean of the data (or a subset of that data) to draw a center
line around which the data varies. Control limits are set at two or three standard deviations
(on the basis that under a normal distribution, 95 Percent of the data should occur within
two standard deviations from the mean, and 99.7 percent of the data should occur within
three standard deviations).
Chapter 4
127
In this recipe, we will build a control chart by using the rainfall data from London, England,
to see if a recent perceived increase in rainfall was outside of the normal.
Getting ready
The rainfall data is available from the UK Met Office website. For this example, we are going to
look at the data from the Heathrow weather station. The URL is http://www.metoffice.
gov.uk/climate/uk/stationdata/heathrowdata.txt.
Let's create a new QlikView document with this data:
Load the following script:
LOAD @1:7 As Year,
@8:11 As Month,
@12:18 as [MaxTemp(C)],
@19:26 As [MinTemp(C)],
@27:34 as [AF Days],
@35:42 as [Rain(mm)],
@43:50 as [Sun Hours]
FROM
[http://www.metoffice.gov.uk/climate/uk/stationdata/heathrowdata.txt]
(fix, codepage is 1252, header is 7 lines)
Where @1:7 < 2013;
How to do it...
Follow these steps to create a control chart by using Stdev:
1. Create a new line chart with Year as the dimension.
2. Add the following expressions:
Rainfall(mm) Sum([Rain(mm)])
Mean Avg({<Year=>} TOTAL Aggr(Sum({<Year=>}
[Rain(mm)]), Year))
Lower Control
[Mean]
-(2*Stdev({<Year=>} TOTAL Aggr(Sum({<Year=>}
[Rain(mm)]), Year)))
Upper Control
[Mean]
+(2*Stdev({<Year=>} TOTAL Aggr(Sum({<Year=>}
[Rain(mm)]), Year)))
Advanced Aggregations
128
Note that I have used the label of Mean for the second expression, and that is used
as a reference in the third and fourth expressions.
3. Click on Finish to save the chart:
4. You should be able to see that 2012 rainfall, while being above average and above
2011 values, was within the control limits. There is no apparent trend of an increase
in rainfall in the region.
How it works...
There are a couple of things going on here that are important to note. I will focus on the AVG
function, as the use of the STDEV function is pretty much identical.
The AGGR functions use the TOTAL qualifier to ignore the dimensionality of the chart. This
is important, otherwise the average will be the same as the yearly rainfall value; it will be an
average of one value!
The second thing that is going on is that both the SUM function in the AGGR and AVG functions
have a Set syntax that excludes selections on the year. This is important because we want
AVG (and STDEV) to be calculated across all the data, and not to change if a user drills into
a particular date (for example, by selecting in the chart).
There's more...
Standard deviation is not the only way to build a control chart. Another method to use is
standard error.
Donald Wheeler, author of Understanding Variation: The Key to Managing Chaos, SPC Press,
Inc., suggests that there are problems with standard deviation or standard error because of
the assumptions about the homogeneity of the data. Instead, he prefers a method using a
moving average over an arbitrary period.
Chapter 4
129
See also
ff The Creating a Moving Range chart and Creating a control chart using
Moving Range recipes in this chapter
Creating a Moving Range chart
A Moving Range (mR) chart is a way of monitoring variation to try and work out if a system
is varying unexpectedly.
The range is simply calculated as the absolute difference between one data point and the
previous one. We then calculate an average—usually over a specific subset of the data—of
these values and apply a statistical constant (3.267 - D4 anti-biasing constant for a subgroup
size of n=2). The great thing about these constants is that you don't need to understand them
or how they are calculated; just that they exist.
In this recipe, we are going to use the same set of rainfall data as the previous recipe,
Creating a Statistical Control Chart using Standard Deviation, to see how the rainfall data
for Heathrow varies over time. We will use a 30-year period as our reference for what
"Average" means (the UK Met office uses 1961-1990).
Getting ready
Load the data from the UK Met Office website as outlined in the previous recipe, Creating a
statistical control chart using standard deviation.
How to do it...
To create a moving range chart, follow these steps:
1. Create a new line chart with Year as the dimension.
2. Add the following expressions:
Moving
Range
fabs(Above(Sum([Rain(mm)]))-Sum([Rain(mm)]))
Moving
Avg
Avg({<Year={">=1961<=1990"}>} total Aggr(fabs(Ab
ove(Sum({<Year={">=1961<=1990"}>} [Rain(mm)]))-
Sum({<Year={">=1961<=1990"}>} [Rain(mm)])), Year))
Upper
Control
[Moving Avg] * 3.267
Advanced Aggregations
130
Note that I have used the label of Moving Avg for the second expression and that is
used as a reference in the third expression.
3. Click on Finish to save the chart:
4. You should be able to see that, although the rainfall pattern varies, it generally varies
within normal parameters.
How it works...
Here, we are using the ABOVE function to establish the previous value for each data point.
We subtract the current data point from the previous one and use the FABS function to get
the absolute (positive) value.
The Moving Avg calculation takes almost the exact same function in an AGGR statement and
then obtains the average. There is one difference here, in that each aggregation function
contains a Set syntax to specifically select the 1961 to 1990 year range.
There's more...
A moving average chart like this is great for visualizing the variation and seeing if the data is
actually varying wildly. It provides the insight into the data that wouldn't normally be seen.
This is a great example of combining an advanced QlikView aggregation functions such as
AGGR, TOTAL, and ABOVE, as well as Set Analysis.
This type of mR chart is also frequently used with a control chart, where the control limits
are derived from the moving average rather than from standard deviation.
See also
ff The Creating a statistical control chart using standard deviation and Creating a
control chart using Moving Range recipes in this chapter
Chapter 4
131
Creating a control chart using Moving Range
Control charts are very quickly created by using a standard deviation function for control
limits. However, Donald Wheeler, author of Understanding Variation: The Key to Managing
Chaos, SPC Press, Inc., suggests that there are problems with standard deviation or standard
error because of assumptions about the homogeneity of the data. Instead, he prefers a
method using a moving average over an arbitrary period.
In this recipe, we are going to use the same set of rainfall data as the previous recipe,
Creating a statistical control chart using standard deviation, to see how the rainfall data
for Heathrow varies over time. Instead of using standard deviation of the data to derive the
control limits, we will use the moving average (mR) and a statistical constant (2.66 – this
value is obtained by dividing 3 by the sample size-specific d2 anti-biasing constant for a
subgroup size of n=2).
We will use a 30-year period as our reference for what "Average" means (the UK Met office
use 1961-1990).
Getting ready
Load the data from the UK Met Office website as outlined in the previous recipe, Creating a
statistical control chart using standard deviation.
How to do it...
These steps are used to create a control chart using moving range:
1. Create a new line chart with Year as the dimension.
2. Add the following expressions:
Rainfall(mm) Sum([Rain(mm)])
Avg Avg({<Year={">=1960<=1980"}>} total Aggr(Sum(
{<Year={">=1960<=1980"}>} [Rain(mm)]), Year))
Upper Natural
Process Limit
[Avg] + (Avg({<Year={">=1960<=1980"}>} total
Aggr(fabs(Above(Sum({<Year={">=1960<=1980"}>}
[Rain(mm)]))-Sum({<Year={">=1960<=1980"}>}
[Rain(mm)])), Year)) * 2.66)
Lower Natural
Process Limit
[Avg] - (Avg({<Year={">=1960<=1980"}>} total
Aggr(fabs(Above(Sum({<Year={">=1960<=1980"}>}
[Rain(mm)]))-Sum({<Year={">=1960<=1980"}>}
[Rain(mm)])), Year)) * 2.66)
Advanced Aggregations
132
Note that I have used the label of Avg for the second expression, and that is used
as a reference in the third and fourth expressions.
3. Click on Finish to save the chart:
4. You should be able to see that, although the rainfall pattern varies, it is well within
control limits.
How it works...
The average calculation here is a standard AGGR function, but we are using the Set syntax
to specify the range of data to use as being years 1961 to 1990.
The control limits calculations use an ABOVE and FABS function to calculate the absolute
difference between the values. AGGR allows us to calculate the average of this value and
we then multiply by our constant.
No, that's not what I meant.
Both set expressions are needed and there is not really a priorization, I think the only difference is the context they are applied to.
Hm, maybe have a look at attached sample, You'll see that you need to apply set expressions to both aggregation functions if you want to get the correct results.
Hi swuehl,
I looked into your example, and saw the different examples you have put there, but still I don't understand why this expression won't show up when Dim1 is selected and is not A
sum(aggr(sum({<Dim1 = {A}>}Expression3),Dim1))
since the inside set analysis expression should of "gotten over" the Dim1 selection.
Well, I can only repeat my very first statement (that's how I believe it's working):
"The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function."
If you think "the inside set analysis expression should of gotten over the Dim1 selection", I think you should open a ticket with QT. There is nothing I can do about this.
Regards,
Stefan
Hi Koti Reddy,
i have read your artical about Advanced Aggregations
if create one QVW about all options is better to understand