Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Using set analysis in an AGGR expression

Hello,

I was wandering if someone could explain me the following question I have:

I ran into a few cases where using a set analysis expression in an AGGR function gives two different results when using the following:

1. function({$<set analysis expression>} AGGR (function({$<set analysis expression>} expression) field)

2. function( AGGR (function({$<set analysis expression>} expression) field)

As far as I know, the AGGR function forms an invisible table with one expression over a calculated dimension. The thing is that when I use the first script I mentioned above, compared to the second one, I get different results even though the set analysis is identical both in the outside part and in the inside. I expected to get the same results from the two expressions, but I was proved wrong. Can someone please help me with this issue?

Thanks,

Yaniv

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Well, I can only repeat my very first statement (that's how I believe it's working):

"The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function."


If you think "the inside set analysis expression should of gotten over the Dim1 selection", I think you should open a ticket with QT. There is nothing I can do about this.


Regards,

Stefan

View solution in original post

7 Replies
swuehl
MVP
MVP

The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function.

Not applicable
Author

Hi swuehl

So, if I understand correctly, you mean that the outer expression of set analysis is prioritized first and the inner one second? In other words, the inner one can only work on current selections and the outer one allows me to define a new set of possible values?

Not applicable
Author

Advanced Aggregations

In this chapter, we will cover the following recipes:

ff

Using TOTAL to calculate the percentage of total and the percentage of subtotal

ff

Using AGGR to calculate the percentage of the maximum value

ff

Using AGGR to resolve a "Sum of Rows" issue

ff

Creating a dynamic AGGR expression for a Group dimension using Dollar Expansion

ff

Using RangeMax to return only positive numbers

ff

Creating a dynamic Part-to-Whole pie chart

ff

Creating a colored treemap using colormix

ff

Using RangeSum to calculate a rolling total in a multi-dimension table

ff

Showing only the top 3 values in a pivot table

ff

Creating a Statistical Control Chart using Standard Deviation

ff

Creating a Moving Range chart

ff

Creating a Control Chart using Moving Range

Introduction

Simple, straightforward aggregation functions are the staple of QlikView. Nine times out of ten, we tend to use something like Sum(), Count(), or Avg(). We might add in some Set Analysis, but the basic function is the same.

We usually call these "horizontal" functions, because they are calculated on the row, in the chart for the dimension on that row. So, we calculate the sum of sales for each country, or

we count the number of purchase orders for each year.

Advanced Aggregations

100

From time-to-time, we need to break out of the row and perform "vertical" calculations.

For these, we need to look at more advanced aggregation functions using TOTAL, AGGR,

or one of the range functions.

Using TOTAL to calculate the percentage of

total and subtotal

TOTAL is a useful qualifier that can be added to an aggregation function (such as Sum,

Count, and Avg) to tell QlikView to ignore the dimensions of a chart in the calculation.

By default, it ignores all the dimensions, but we can be more specific and tell it to ignore

some while respecting others.

In this recipe, we will calculate the percentage of total sales for each city, as well as the

percentage of sales for each city within its own country.

Getting ready

Load the following script:

LOAD * INLINE [

Country, City, Sales

USA, San Diego, 24567

USA, Dallas, 54962

USA, New York, 67013

USA, Boston, 45824

UK, London, 64002

UK, Birmingham, 44291

UK, Manchester, 40320

Germany, Berlin, 52912

Germany, Frankfurt, 61832

Germany, Munich, 35812

Japan, Tokyo, 42137

Japan, Yokohama, 55832

Japan, Osaka, 37643

];

Chapter 4

101

How to do it...

The following steps show how to use TOTAL to calculate the percentage of total and the

percentage of subtotal:

1. Create a new pivot table with Country and City as dimensions.

2. Add the following expressions:

Sales $ Sum(Sales)

Total Sales % Sum(Sales) / Sum(TOTAL Sales)

% Sales by

Country Sum(Sales) / Sum(TOTAL<Country> Sales)

3. On the Presentation tab, select the Country and City dimensions and turn on

Show Partial Sums.

4. On the Number tab, set the format for the first expression to Integer. Set the second

and third expression to Fixed to 2 decimals, and turn on Show in Percent (%).

5. Click on Finish to save the chart.

Advanced Aggregations

102

6. Right-click on the Country field in the pivot table, and select Expand all from the

menu to show all the cities:

7. Resize the chart to see all of the values.

How it works...

In the first expression, the TOTAL syntax instructs that part of the expression to ignore

the dimensions of the chart, and calculate the sum across the whole chart. The calculated

percentage is the sales over the overall total (627,147).

In the second expression, the syntax is TOTAL<Country>. This tells that part of the

expression to ignore all the dimensions; except Country. This gives us a subtotal by

each country, and the calculated percentage is the percentage sales for that city within

its own country.

There's more...

Within other charts, there is another way to achieve the overall total and that is to turn on

the Relative checkbox against the expression. This option is not available in pivot tables.

Chapter 4

103

Using AGGR to calculate the percentage of

the maximum value

AGGR is the function that we will use to perform vertical calculations.

In this case, we are going to work out which country has the highest value, and calculate the

percentage of other sales versus that highest value.

Getting ready

Load the following script:

LOAD * INLINE [

Country, City, Sales

USA, San Diego, 24567

USA, Dallas, 54962

USA, New York, 67013

USA, Boston, 45824

UK, London, 64002

UK, Birmingham, 44291

UK, Manchester, 40320

Germany, Berlin, 52912

Germany, Frankfurt, 61832

Germany, Munich, 35812

Japan, Tokyo, 42137

Advanced Aggregations

104

Japan, Yokohama, 55832

Japan, Osaka, 37643

];

How to do it...

Use the following steps to use AGGR to calculate the percentage of the maximum:

1. Create a new straight table with Country and City as dimensions.

2. Add the following expressions:

Sales $ Sum(Sales)

% of Max

Sum(Sales) /

Max(TOTAL Aggr(Sum(Sales), Country, City))

3. On the Expression tab, for the second expression, under Total Mode, select No Total.

4. On the Number tab, set the format for the first expression to Integer. Set the second

expression to Fixed to 2 decimals, and turn on Show in Percent (%).

5. Click on Finish to save the chart.

6. Double-click on the % of Max label to sort the values.

Chapter 4

105

How it works...

There are a couple of things going on here. First, the AGGR expression:

Aggr(Sum(Sales), Country, City)

AGGR basically creates a table chart in the memory. In this case, the expression is

Sum (Sales), and there are two dimensions—Country and City.

Once we have this in-memory chart, we can put one of our many aggregation functions (such

as Sum, Min, Max, and Stdev) around it to get a result. In this case, we use the Max function

to retrieve the maximum value in the range (67,013).

Since we are using the function in a chart that has dimensions, we also need to include the

TOTAL qualifier to tell the Max expression to ignore the dimensions. If we didn't, we would

just get 100 percent for every row!

There's more...

This is a relatively basic use of AGGR, but it actually doesn't get much more complicated!

When you add in the Set Analysis syntax into an expression, you can then finely control the

range of values that are returned.

Just remember that AGGR is an in-memory chart. Like all charts, it has an expression and

one or many dimensions. If you can build a straight table to get the result you are looking

for, then you can build an AGGR statement.

Advanced Aggregations

106

Any time that you find yourself writing an expression that has one aggregation function nested

inside another—like Max(Sum(Sales))—then you will need to use AGGR to have the function

calculate as expected—like Max(Aggr(Sum(Sales), Country)).

Using AGGR to resolve a "Sum of Rows"

issue

"Sum of Rows" issues are reasonably rare. Essentially, what it means is that the sum that

is displayed as the total on a table is not actually the sum of the rows displayed in the table

(usually discovered when exporting to Excel), or the total is not calculated at all. Often, it

can be the result of a complicated if statement in the expression, or from using values

from across different tables in the expression.

In this recipe, we will look at a very simple example and see how to resolve it using AGGR.

Getting ready

Load the following script:

// Load Sales data

Sales:

LOAD * INLINE [

Country, Month, Curr, ProdID, Price, Qty

USA, 2013-01-01, USD, 1, 1.99, 2

USA, 2013-02-01, USD, 2, 2.99, 3

USA, 2013-03-01, USD, 1, 1.98, 5

UK, 2013-01-01, GBP, 1, 1.59, 1

UK, 2013-02-01, GBP, 1, 1.58, 3

UK, 2013-03-01, GBP, 1, 1.58, 3

UK, 2013-03-01, GBP, 1, 1.58, 3

];

// Load Exchange Rates

LOAD * INLINE [

Curr, ExRate

USD, 1

GBP, 0.63

];

Chapter 4

107

How to do it...

These steps show how to resolve a "Sum of Rows" issue with AGGR:

1. Create a new straight table with Country as the dimension.

2. Add the following three expressions:

Volume Sum(Qty)

Sales $ Sum(Price*Qty)/ExRate

Avg. Price [Sales $] / Volume

3. Set the number format for the Volume expression to Integer. Set the format for the

other two expressions to Fixed to 2 Decimals.

4. Click on Finish to save the chart.

5. Note that the Sales $ total and Avg. Price values have not been calculated. Edit the

properties of the chart and change the Sales $ expression to the following:

Sum(AGGR(Sum(Price*Qty)/ExRate, Country))

6. Note that the totals are now calculating.

How it works...

Initially, the total for Sales $ does not work because the ExRate field will return more than

one value at the total level, which invalidates the whole expression.

Advanced Aggregations

108

When we wrap the expression in the AGGR statement, it makes no difference at all at the line

level—because the dimensions are being respected—but kicks in at the total level to give us

the correct sum.

There's more...

It is useful to note that you can fix the sum of rows issue in a straight table by simply changing

Total Mode of that expression to Sum of Rows instead of the default of Expression Total.

That method would not work though for the Avg. Price expression. Neither would it work in a

pivot table, which does not have the option. AGGR will work great in the pivot table.

See also

ff The Creating a dynamic AGGR expression for a Group dimension using Dollar

Expansion recipe in this chapter

Creating a dynamic AGGR expression for a

Group dimension using Dollar Expansion

AGGR is very useful for advanced aggregations. However, we need to be careful when using it

in charts that have a Group dimension; either cycle or drill.

In this recipe, we will see how to overcome this using a Dollar Expansion.

Getting ready

This example follows along from the previous section. Load the data and create the straight

table with the AGGR expression.

Chapter 4

109

How to do it...

Follow these steps to create a dynamic AGGR expression for a group dimension:

1. In Document Properties | Groups, create a group called Cycle1 with ProdID

and Month as the dimensions:

2. Edit the straight table and add Cycle1 as the second dimension.

3. Save the changes and look at the chart. Note that the expressions do not

calculate correctly for all the different cycle values.

4. Open the properties of the table and edit the Sales $ expression:

Sum(

AGGR(Sum(Price*Qty)/ExRate, Country,

[$(=GetCurrentField(Cycle1))])

)

Advanced Aggregations

110

5. Click on OK to save the chart:

6. Click on the cycle button and confirm that the values calculate correctly.

How it works...

This is a great example of using Dollar Expansion in an expression when dealing with a Group

dimension; or any other thing that might change.

When you click on the cycle button, the GetCurrentField function returns the name of

the field that is now in use and that changes the field that the calculation is aggregating over.

Note the use of square brackets around the Dollar Expansion; this is because any field in the

cycle could potentially have a space or other character in the fieldname that could break the

expression if the square brackets weren't there.

See also

ff The Using AGGR to resolve a "Sum of Rows" issue recipe in this chapter

Using RangeMax to return only positive

numbers

There are occasions where we need to only deal with positive values, such as when we have

a chart that will not calculate with negative numbers or when an expression makes no sense

having negative numbers.

In this case, we are going to display a block chart for sales, but deal with having negative values.

A block chart does not calculate with negative numbers.

Chapter 4

111

Getting ready

Load the following script:

LOAD * INLINE [

Country, Sales

USA, 120

UK, 100

Mexico, 76

Canada, 32

France, 10

Germany, -5

];

How to do it...

The following steps show you how to use RangeMax to return only positive numbers:

1. Create a new block chart. Set Country as the dimension.

2. Add the following expression:

Sum(Sales)

3. Click on Finish to save the chart:

Advanced Aggregations

112

4. Edit the properties of the chart and change the expression to the following:

RangeMax(Sum(Sales), 0)

5. Confirm that the chart now displays the data. There should be no block for Germany

as it was the negative value.

How it works...

The RangeMax function will return the highest value in the list of values. As long as

Sum(Sales) is positive, it will always be higher than 0. Once it is negative, 0 will be returned

by the function, and the negative value will not interfere with the calculation of the chart.

There's more...

Range functions are quite useful and work in both charts and scripts. They are essential when

using functions such as Above and Before, which return a range of values.

See also

ff The Using RangeSum to calculate a rolling total in a table recipe in this chapter

Chapter 4

113

Creating a dynamic Part-to-Whole pie chart

Many visualization experts will argue over the merits of the humble pie chart. Most,

however, will agree that the following example, with many segments, is probably the

least useful implementation.

Pie charts are most useful when doing a simple part-to-whole comparison. In this recipe,

we will create a dynamic part-to-whole bar chart that displays the sales value for the

currently selected set of countries versus the rest.

Getting ready

Load the following script:

LOAD * INLINE [

Country, Sales

USA, 53453

Mexico, 21317

Canada, 14545

UK, 12333

Advanced Aggregations

114

France, 21333

Germany, 23123

Japan, 44342

China, 34234

Australia, 12345

];

How to do it...

To create a dynamic part-to-whole pie chart, follow these steps:

1. Create a new pie chart.

2. On the Dimensions tab, use the Add Calculated Dimension button to add the

following dimension expression:

=if(

Aggr(Sum(1), Country)>0,

'Selected: $(=Concat(DISTINCT Country, ', '))',

'Others: $(=Concat({<Country=E({$})>} DISTINCT Country, ',

'))'

)

3. Label the dimension as Country.

Chapter 4

115

4. Add the following expression:

Sum({<Country=>} Sales)

5. Label the expression as Sales $.

6. Click on Next until you get to the Presentation tab. Click on the Settings button

for Legend:

Advanced Aggregations

116

7. Turn on the Wrap Text option and set Cell Height to 6. Click on OK. Finish the

chart wizard.

8. With the chart selected, hold down the Ctrl + Shift key. The different areas of

the chart will be highlighted. Use the mouse to narrow the legend area.

9. Add a Country listbox to the layout, to the right of the pie chart. Note what

happens as you make different selections.

How it works...

This is a great example of using an AGGR function in a calculated dimension to return just the

dimension values that are of interest. In this case:

Aggr(Sum(1), Country)

Will only evaluate for those countries that are currently selected because Sum(1) will be null

for the unselected countries. We check whether it is greater than 0 or not, and then use that

as the list of selected countries (achieved using Concat).

The tricky bit is getting back the list of unselected countries. We can do that by using a Set:

Concat({<Country=E({$})>} DISTINCT Country, ', ')

The E() set returns the list of excluded values.

By putting the Concat functions inside a dollar expansion, we cause them to be evaluated

separately to the chart, so they are calculated at the document level.

Chapter 4

117

There's more...

Other functions can be used quite effectively within an AGGR like this. For example:

=if(

Aggr(Rank(Sum(Sales)), Country)<=3,

Country,

'Others'

)

This will give us back the top three countries by rank of Sales. A fourth value called Others

will also appear in the dimensions.

See also

ff The Showing only the top three values in a pivot table recipe in this chapter

Creating a colored treemap using colormix

Treemaps were originally designed by Ben Shneiderman, a professor of Computer Science at

the University of Maryland. They are a very effective way of displaying hierarchical data. Within

QlikView, the basic implementation of the Treemap, using the size of the rectangles to encode

a value, can be effectively rendered by using the block chart.

If you want to encode a secondary measure, you can use a color function such as colormix1,

along with an advanced aggregation statement.

In this recipe, we are going to compare sales values using the size of the rectangles and

average order size using color.

Getting ready

Load the following script:

LOAD * INLINE [

Country, City, Sales, Orders

USA, San Diego, 24567, 546

USA, Dallas, 54962, 345

USA, New York, 67013, 678

USA, Boston, 45824, 365

UK, London, 64002, 743

UK, Birmingham, 44291, 572

Advanced Aggregations

118

UK, Manchester, 40320, 534

Germany, Berlin, 52912, 643

Germany, Frankfurt, 61832, 678

Germany, Munich, 35812, 325

Japan, Tokyo, 42137, 562

Japan, Yokohama, 55832, 753

Japan, Osaka, 37643, 418

];

How to do it...

These steps show how to create a colored Treemap with colormix:

1. Create a new block chart. Add Country, then City as dimensions.

2. Add the following expression:

Sum(Sales)

3. Click on Finish:

Chapter 4

119

4. The block chart will be created and will display the Sales $ values in the hierarchical

sized blocks. Edit the properties of the chart to set the color.

5. Click on + beside the Sales $ expression, and enter the following Background Color

expression in the Definition box:

ColorMix1(

(Sum(Sales)/Sum(Orders))/

Max(Total Aggr(Sum(Sales)/Sum(Orders), Country, City)),

White(),

Green()

)

6. Click on OK.

Advanced Aggregations

120

How it works...

The trick here lies in the AGGR expression:

(Sum(Sales)/Sum(Orders))/

Max(Total Aggr(Sum(Sales)/Sum(Orders), Country, City))

The Max AGGR part will return the highest value of the average sale for all of the cities.

By dividing this value into the average sale for the current city, we will get a value between

0 and 1, representing the rank of that city.

TOTAL in Max makes sure that the dimensions are ignored for that calculation. Otherwise the

calculation would only calculate for the current city and the value would be 1 for all cities:

In this case, Dallas has the highest average sales, so will be 1.000. San Diego has the lowest

average sales at 0.282.

When we plug those values into the ColorMix1 expression, it returns a color that is along a

scale from 0 to 1 of the start and end color.

The upshot is that Dallas has a dark green color while San Diego has a much lighter color;

closer to white.

Chapter 4

121

There's more...

This use of AGGR to calculate a value across dimensions like this is very typical. It is quite

a common use case.

Using RangeSum to calculate a rolling total

in a multidimension table

In a single dimensional chart, for example, bar chart, line chart, or straight table, the usual

way to create a rolling total is to use the accumulate option in the expression. This doesn't

work correctly in a multi-dimensional situation.

In this recipe, we will look at creating accumulations by using the RangeSum function.

Getting ready

Load the following script:

LOAD * INLINE [

Country, City, Sales, Orders

USA, San Diego, 24567, 546

USA, Dallas, 54962, 345

USA, New York, 67013, 678

USA, Boston, 45824, 365

UK, London, 64002, 743

UK, Birmingham, 44291, 572

UK, Manchester, 40320, 534

Germany, Berlin, 52912, 643

Germany, Frankfurt, 61832, 678

Germany, Munich, 35812, 325

Japan, Tokyo, 42137, 562

Japan, Yokohama, 55832, 753

Japan, Osaka, 37643, 418

];

How to do it...

Use these steps to calculate the rolling totals with RangeSum:

1. Create a new pivot table with Year and Country as dimensions.

Advanced Aggregations

122

2. Add the following three expressions:

Sales $ Sum(Sales)

Country Total RangeSum(Above(Sum(Sales), 0, RowNo()))

Overall Total RangeSum(Above(TOTAL Sum(Sales), 0,

RowNo(TOTAL)))

3. Click on Finish to save the chart.

4. Expand the Year dimension so that you can see all the values. Notice the rolling totals.

How it works...

The Above function allows us to specify which row "above" the current row to return and

the number of rows to return. By specifying 0 for the row to start on, we are telling above

to start at the current row. We then use one of the RowNo() functions to define how many

rows to return.

RowNo() by itself will return the number of the rows you are on for each of the second

dimension values within the first dimension (rows of Country within each Year). This

resets as the first dimension value changes.

RowNo(TOTAL) will ignore the dimensions and just give us the number of the row that

we are on in the whole chart.

The preceding statement will return a range of values. That is why we need to use the

RangeSum function to add up all of the values returned.

Chapter 4

123

There's more...

Similar to Above, there is a function called Below. There are also functions called Before

and After, which you can use in a pivot table, where one of the dimensions is pivoted across

the top of the chart to give us horizontal totals.

Showing only the top 3 values in a pivot

table

In version 11, QlikView introduced the feature Dimension Limits for most of the charts.

This feature allows you to limit the number of values displayed based on a certain criteria;

most usually, the value is in the top X (whatever X you want). It also allows you to include

an Others dimension value for everything else outside the top X.

This feature is not available for gauge charts (which don't usually have a dimension anyway)

and pivot tables.

In this recipe, we are going to implement the feature using AGGR in a pivot table.

Getting ready

Load the following script:

LOAD * INLINE [

Year, Country, Sales

2010, USA, 6013

2011, USA, 5295

2012, USA, 5551

2013, USA, 6932

2010, UK, 4512

2011, UK, 3976

2012, UK, 4691

2013, UK, 5276

2010, Japan, 2765

2011, Japan, 2567

2012, Japan, 3111

2013, Japan, 3234

2010, Germany, 4374

2011, Germany, 5673

2012, Germany, 4322

2013, Germany, 7654

Advanced Aggregations

124

2010, France, 4965

2011, France, 5097

2012, France, 5419

2013, France, 5732

2010, Australia, 3966

2011, Australia, 4087

2012, Australia, 2376

2013, Australia, 3784

];

How to do it...

These steps show you how to show only the top 3 values in a pivot table:

1. Create a new pivot table with Country and Year as dimensions.

2. Add the following expression for Sales $:

Sum(Sales)

3. Click on Finish to save the chart.

4. Expand all the Country values and pivot the Year dimension across the top:

Chapter 4

125

5. Edit the chart properties. On the Dimensions tab, select the Country dimension under

Used Dimensions, and click on the Edit button. Enter the following expression:

=If(Aggr(Rank(Sum(Sales)), Country)<=3, Country, 'Others')

6. Edit the dimension Label and set it to Country. Click on OK.

7. Note that there are now only four dimension values.

8. Edit the dimension property again and change the dimension expression

to the following:

=If(Aggr(Rank(Sum(Sales)), Country)<=3, Country, Null())

Advanced Aggregations

126

9. Turn on the option Suppress When Value is Null.

10. Note that only the top 3 are displayed.

How it works...

Using AGGR for a calculated dimension is quite a powerful use of QlikView. In this case,

we use AGGR to create an in-memory table chart of the rank of the sales for each country.

We then compare the result to see if it is <= 3. If it is, the if statement returns the country,

otherwise it returns something else—either Others or Null().

There's more...

It's not just using the values in an if statement that we can do; we can also use those

values as the dimension in the charts. Since they are linked to the original dimension in

AGGR, selecting one of these values will also select the underlying country or countries.

Creating a Statistical Control Chart using

Standard Deviation

Control charts were developed by a statistician named Walter Shewhart in the 1920's. He was

working for Bell Labs, who, at that time, were rolling out a telephony network across the US.

For this network, amplifiers and other such equipments needed to be buried underground, and

it was expensive to have to dig it up for repairs. They were worried about the variations in the

manufacturing process leading to increased cost in repairs.

Shewhart used control charts to show that variation was normal and reacting to small

variations by making changes to the manufacturing process was wrong. As long as the

variation was within control limits and not trending in any particular direction, there was

nothing to worry about.

A simple control chart uses the mean of the data (or a subset of that data) to draw a center

line around which the data varies. Control limits are set at two or three standard deviations

(on the basis that under a normal distribution, 95 Percent of the data should occur within

two standard deviations from the mean, and 99.7 percent of the data should occur within

three standard deviations).

Chapter 4

127

In this recipe, we will build a control chart by using the rainfall data from London, England,

to see if a recent perceived increase in rainfall was outside of the normal.

Getting ready

The rainfall data is available from the UK Met Office website. For this example, we are going to

look at the data from the Heathrow weather station. The URL is http://www.metoffice.

gov.uk/climate/uk/stationdata/heathrowdata.txt.

Let's create a new QlikView document with this data:

Load the following script:

LOAD @1:7 As Year,

@8:11 As Month,

@12:18 as [MaxTemp(C)],

@19:26 As [MinTemp(C)],

@27:34 as [AF Days],

@35:42 as [Rain(mm)],

@43:50 as [Sun Hours]

FROM

[http://www.metoffice.gov.uk/climate/uk/stationdata/heathrowdata.txt]

(fix, codepage is 1252, header is 7 lines)

Where @1:7 < 2013;

How to do it...

Follow these steps to create a control chart by using Stdev:

1. Create a new line chart with Year as the dimension.

2. Add the following expressions:

Rainfall(mm) Sum([Rain(mm)])

Mean Avg({<Year=>} TOTAL Aggr(Sum({<Year=>}

[Rain(mm)]), Year))

Lower Control

[Mean]

-(2*Stdev({<Year=>} TOTAL Aggr(Sum({<Year=>}

[Rain(mm)]), Year)))

Upper Control

[Mean]

+(2*Stdev({<Year=>} TOTAL Aggr(Sum({<Year=>}

[Rain(mm)]), Year)))

Advanced Aggregations

128

Note that I have used the label of Mean for the second expression, and that is used

as a reference in the third and fourth expressions.

3. Click on Finish to save the chart:

4. You should be able to see that 2012 rainfall, while being above average and above

2011 values, was within the control limits. There is no apparent trend of an increase

in rainfall in the region.

How it works...

There are a couple of things going on here that are important to note. I will focus on the AVG

function, as the use of the STDEV function is pretty much identical.

The AGGR functions use the TOTAL qualifier to ignore the dimensionality of the chart. This

is important, otherwise the average will be the same as the yearly rainfall value; it will be an

average of one value!

The second thing that is going on is that both the SUM function in the AGGR and AVG functions

have a Set syntax that excludes selections on the year. This is important because we want

AVG (and STDEV) to be calculated across all the data, and not to change if a user drills into

a particular date (for example, by selecting in the chart).

There's more...

Standard deviation is not the only way to build a control chart. Another method to use is

standard error.

Donald Wheeler, author of Understanding Variation: The Key to Managing Chaos, SPC Press,

Inc., suggests that there are problems with standard deviation or standard error because of

the assumptions about the homogeneity of the data. Instead, he prefers a method using a

moving average over an arbitrary period.

Chapter 4

129

See also

ff The Creating a Moving Range chart and Creating a control chart using

Moving Range recipes in this chapter

Creating a Moving Range chart

A Moving Range (mR) chart is a way of monitoring variation to try and work out if a system

is varying unexpectedly.

The range is simply calculated as the absolute difference between one data point and the

previous one. We then calculate an average—usually over a specific subset of the data—of

these values and apply a statistical constant (3.267 - D4 anti-biasing constant for a subgroup

size of n=2). The great thing about these constants is that you don't need to understand them

or how they are calculated; just that they exist.

In this recipe, we are going to use the same set of rainfall data as the previous recipe,

Creating a Statistical Control Chart using Standard Deviation, to see how the rainfall data

for Heathrow varies over time. We will use a 30-year period as our reference for what

"Average" means (the UK Met office uses 1961-1990).

Getting ready

Load the data from the UK Met Office website as outlined in the previous recipe, Creating a

statistical control chart using standard deviation.

How to do it...

To create a moving range chart, follow these steps:

1. Create a new line chart with Year as the dimension.

2. Add the following expressions:

Moving

Range

fabs(Above(Sum([Rain(mm)]))-Sum([Rain(mm)]))

Moving

Avg

Avg({<Year={">=1961<=1990"}>} total Aggr(fabs(Ab

ove(Sum({<Year={">=1961<=1990"}>} [Rain(mm)]))-

Sum({<Year={">=1961<=1990"}>} [Rain(mm)])), Year))

Upper

Control

[Moving Avg] * 3.267

Advanced Aggregations

130

Note that I have used the label of Moving Avg for the second expression and that is

used as a reference in the third expression.

3. Click on Finish to save the chart:

4. You should be able to see that, although the rainfall pattern varies, it generally varies

within normal parameters.

How it works...

Here, we are using the ABOVE function to establish the previous value for each data point.

We subtract the current data point from the previous one and use the FABS function to get

the absolute (positive) value.

The Moving Avg calculation takes almost the exact same function in an AGGR statement and

then obtains the average. There is one difference here, in that each aggregation function

contains a Set syntax to specifically select the 1961 to 1990 year range.

There's more...

A moving average chart like this is great for visualizing the variation and seeing if the data is

actually varying wildly. It provides the insight into the data that wouldn't normally be seen.

This is a great example of combining an advanced QlikView aggregation functions such as

AGGR, TOTAL, and ABOVE, as well as Set Analysis.

This type of mR chart is also frequently used with a control chart, where the control limits

are derived from the moving average rather than from standard deviation.

See also

ff The Creating a statistical control chart using standard deviation and Creating a

control chart using Moving Range recipes in this chapter

Chapter 4

131

Creating a control chart using Moving Range

Control charts are very quickly created by using a standard deviation function for control

limits. However, Donald Wheeler, author of Understanding Variation: The Key to Managing

Chaos, SPC Press, Inc., suggests that there are problems with standard deviation or standard

error because of assumptions about the homogeneity of the data. Instead, he prefers a

method using a moving average over an arbitrary period.

In this recipe, we are going to use the same set of rainfall data as the previous recipe,

Creating a statistical control chart using standard deviation, to see how the rainfall data

for Heathrow varies over time. Instead of using standard deviation of the data to derive the

control limits, we will use the moving average (mR) and a statistical constant (2.66 – this

value is obtained by dividing 3 by the sample size-specific d2 anti-biasing constant for a

subgroup size of n=2).

We will use a 30-year period as our reference for what "Average" means (the UK Met office

use 1961-1990).

Getting ready

Load the data from the UK Met Office website as outlined in the previous recipe, Creating a

statistical control chart using standard deviation.

How to do it...

These steps are used to create a control chart using moving range:

1. Create a new line chart with Year as the dimension.

2. Add the following expressions:

Rainfall(mm) Sum([Rain(mm)])

Avg Avg({<Year={">=1960<=1980"}>} total Aggr(Sum(

{<Year={">=1960<=1980"}>} [Rain(mm)]), Year))

Upper Natural

Process Limit

[Avg] + (Avg({<Year={">=1960<=1980"}>} total

Aggr(fabs(Above(Sum({<Year={">=1960<=1980"}>}

[Rain(mm)]))-Sum({<Year={">=1960<=1980"}>}

[Rain(mm)])), Year)) * 2.66)

Lower Natural

Process Limit

[Avg] - (Avg({<Year={">=1960<=1980"}>} total

Aggr(fabs(Above(Sum({<Year={">=1960<=1980"}>}

[Rain(mm)]))-Sum({<Year={">=1960<=1980"}>}

[Rain(mm)])), Year)) * 2.66)

Advanced Aggregations

132

Note that I have used the label of Avg for the second expression, and that is used

as a reference in the third and fourth expressions.

3. Click on Finish to save the chart:

4. You should be able to see that, although the rainfall pattern varies, it is well within

control limits.

How it works...

The average calculation here is a standard AGGR function, but we are using the Set syntax

to specify the range of data to use as being years 1961 to 1990.

The control limits calculations use an ABOVE and FABS function to calculate the absolute

difference between the values. AGGR allows us to calculate the average of this value and

we then multiply by our constant.

swuehl
MVP
MVP

No, that's not what I meant.

Both set expressions are needed and there is not really a priorization, I think the only difference is the context they are applied to.

Hm, maybe have a look at attached sample, You'll see that you need to apply set expressions to both aggregation functions if you want to get the correct results.

Not applicable
Author

Hi swuehl,

I looked into your example, and saw the different examples you have put there, but still I don't understand why this expression won't show up when Dim1 is selected and is not A

sum(aggr(sum({<Dim1 = {A}>}Expression3),Dim1))

since the inside set analysis expression should of "gotten over" the Dim1 selection.

swuehl
MVP
MVP

Well, I can only repeat my very first statement (that's how I believe it's working):

"The set analysis expression in your inner aggregation function does not define the set of possible values for the AGGR dimension fields. If you need to modify this set (i.e. you want a different set than defined by the selections), you need to use a second set expression in the outer aggregation function."


If you think "the inside set analysis expression should of gotten over the Dim1 selection", I think you should open a ticket with QT. There is nothing I can do about this.


Regards,

Stefan

sakamsureshbabu
Creator
Creator

Hi Koti Reddy,

i have read your artical about Advanced Aggregations

if create one QVW about all options is better to understand