Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

A Primer on Set Analysis

Set analysis is one of the more complex things you can use in QlikView or Qlik Sense. Its syntax is often perceived as complicated and there are some misunderstandings around it. So here is my short explanation.

Set analysis is a way to define an aggregation scope different from current selection. Think of it as a way to define a conditional aggregation. The condition, however, is in itself like a selection that is evaluated before the cube (the chart) is expanded. Hence, it is not possible to have a condition that is evaluated row by row.

 

The set analysis is used inside an aggregation function, e.g. in an expression like

 

          Sum(Sales)

 

The first step is to add the markers for the set analysis – the curly brackets:

 

          Sum( {…} Sales)

 

These define the record set over which the aggregation should be made. Inside, you can use different identifiers and operators, e.g. ‘$’ for records implied by current selection, ‘1’ for all records, ‘1-$’ for all excluded records, etc.

 

A set of records that you can define by a simple selection is called natural set. Not all record sets are natural; for instance, {1-$} cannot always be defined through a selection.

 

The next step is often to add a set modifier, which is defined by angle brackets. The set modifier adds or changes a selection. It can be used on any natural set and consists of a list of fields, where each field can have a new selection:

 

          Sum( {$<…>} Sales)

 

The next step is to define the element set for a field; the set of field values that defines the selection. The element set could be a field reference or a set function, P() or E(). It is more common, though, that it is an explicit list of field values or a search, and then you need the curly brackets to define the element set:

 

          Sum( {$<Date={…}>} Sales)   or   Sum( {$<Date=P(…)>} Sales)

 

A search can be defined through double quotes. This way, field values that match the search string will be selected:

 

          Sum( {$<Date={"…"}>} Sales)

 

Do not use single quotes to initiate a search here! Single quotes should denote literals, i.e. explicit field values. (Yes, today single quotes work as a search, but this is a bug that one day will be fixed…)

 

Often a numeric search is made, defined by a leading relational operator. Then, field values will be selected based on a comparison:

 

          Sum( {$<Date={"<=…"}>} Sales)

 

To make it worse, the value to which the field values are compared is often a calculated one. And in order to get a calculated value into the expression, a dollar expansion is needed.:

 

          Sum( {$<Date={"<=$(…)"}>} Sales)

 

Inside the dollar expansion, you need an expression that starts with an equals sign and contains an aggregation function, e.g.:

 

          Sum( {$<Date={"<=$(=Max(Date))"}>} Sales)

 

This aggregation function is evaluated globally, before the cube is expanded.

 

As you can see, there are many levels of a set expression, and many pairs of brackets and delimiters that need to match. When you write set expressions, you should always write both brackets directly, and then continue with the expression between them. This way you will avoid simple syntax errors.

 

Good luck with your set analysis expressions!

 

HIC

 

Further reading related to this topic:

Why is it called Set Analysis?

Dates in Set Analysis

Excluding values in Set Analysis

Introduction to Set Analysis (video) - Part 1

34 Comments
terezagr
Contributor III

Hi Henric,

thank you for another brilliant blog post!

I have a question regarding to: "Do not use single quotes here! Single quotes should denote literals, i.e. explicit field values. (Yes, today single quotes work as a search, but this is a bug that one day will be fixed…)"

I am just going through Advanced Topics in Design and Development course book and I found the following there:

String/Numeric values

To select string or numeric values, place the value inside {} and single quotes (double quotes also work, but using them is not best practice).

=Sum({$<Year={'2011'}>}LineSalesAmount)

=Sum({$<Country={'USA'}>}LineSalesAmount)

same is then shown for Conditional values

=Sum({$<OrderDate={'>01/01/2011'}>}LinesSalesAmount)

So where is the true? Should I trust double quotes or single quotes? Can you please clarify?

Thank you for your help!

T.

830 Views
Employee
Employee

In all other places in QlikView/Qlik Sense, single quotes mean literals and double quotes mean something else. The intention for Set Analysis was when it was designed, to follow the same logic. However, the implementation today is that single or double quotes doesn't matter in Set Analysis. But there is a consensus in the Qlik development organization that this is a bug that should be corrected.

For the above examples, this means that

     =Sum({$<Year={'2011'}>} LineSalesAmount)

     =Sum({$<Country={'USA'}>} LineSalesAmount)

are both correct. Both 2011 and USA are literals, so single quotes are correct, if you want a case sensitive comparison.

The third example is correct if you compare how QlikView de facto works, but incorrect given the design intention. Our education department has (correctly) described how it works, but not how it should work.

I'll talk to our education department and make sure the training material is changed.

HIC

830 Views
terezagr
Contributor III

Thank you Henric! It's all clear now.

0 Likes
830 Views
robert99
Valued Contributor II
0 Likes
830 Views
ebrambilla
New Contributor

Great! Clear explanation! Thanks.

0 Likes
830 Views
Not applicable

DR. Henric Cronström,

A million Thanks for your wonderful insights. This is a very important and complicated concepts and through examples you have made it clear. I highly appreciated it.

I hope to stay in touch for Data Modeling, Incremental load among other challenging questions.

0 Likes
830 Views
mrooney
New Contributor III

Hi Henric,

Great post indeed!

I have just a little doubt about how Qlik constructs the global set of data before raise the cube.

Let’s say for instance, that we have a straight table with many expressions. Each one of them have their own set analysis which may be different one from another in some way.

So, When you say: “This aggregation function is evaluated globally, before the cube is expanded”. Do you mean that Qlikview makes the union of all differents sets obtained from each set analysis definition (inside every single aggregation expression from the chart)  in order to obtain a global set of data before expand de cube?

Thank you for your help.

MR

0 Likes
830 Views
sibideepak
Contributor II

Clear !!

0 Likes
830 Views
Employee
Employee

mrooney‌:

Yes, sort of... In a way, the calculations need to be made on the union of the sets. However, it is slightly more intricate, and not so simple...

If you have two calculations: Sum({BM1} Sales) and Count({BM2} Products), then you have two aggregation functions that operate on two different tables: "Sales" is typically in the fact table, and "Products" is in a dimension table. This means that the two aggregations are independent and not affected by the set in the other aggregation. For example, the count of products does not depend on BM1. Hence, two separate aggregations.

But if both aggregations are calculated from the same table, then it is optimized so that only one pass is made over this table, and both sets are taken into account.

HIC

830 Views
rajkumarb
Contributor II

Very Helpful, Thank you

0 Likes
830 Views
kkkumar82
Valued Contributor III

Hi HIC,

Can we use ? in set analysis, if so, what is the use , can you explain.

thanks

KK

0 Likes
830 Views
Employee
Employee

You can use a question mark as a single-letter wildcard if you make a set analysis with a text search. For example, the following expression will sum the sales for products that have 'x' as the second character in the product name:

Sum( {$<Product={"?x*"}>} Sales)

HIC

830 Views
kkkumar82
Valued Contributor III

Thanks

for reply.

KK

0 Likes
830 Views
anderseriksson
Contributor III

Hi Hendrik, you obviusly have not grasped the search functionality in set analysis.

Your expression with the year range:

SUM({<YEAR=>}If(YEAR>=$(vStartYear) AND YEAR<=$(vEndYear),SALES))

would be better like this using set analysis finding all years in your intervall:

SUM({<YEAR={">=$(vStartYear)<=$(vEndYear)"}>} SALES)

Using If in your expressions is one of the top reasons an expression is heavy to calculate and takes much CPU.

When the If statement does not make use of the dimension values you are better of using set analysis.

If you do have a condition dependent on the dimension values unfortunately you can't use set analysis.

From that combining the two is easy:

SUM({<YEAR={">=$(vStartYear)<=$(vEndYear)"}, PRODUCT={"Itemfield_inside_PRODUCT"}>} SALES)


Note that you should use " instead of ' as Henric has pointed out numerous times.


Personally I would not use variables for the years.

If you have a list box for the years why not simply let the users select the range of years they want to view?

Triggers are best avoided as they inflict extra calculations each time fired.

0 Likes
830 Views
Not applicable

Hi Henric,

As I understand the set analysis is always about filtering a field by a value, either is it a static value or a parameter, But is it possible to filter a field by another field in set analysis?

For example I have a table as below:

Flag1   Flag2  Day

1          0         1/6/1999

0          0          2/6/2012

0          1          2/6/2006

I need to do set analysis to filter this table on "Flag1 <> Flag2", is there an easy way to do it?

BR, P Q

0 Likes
830 Views
Not applicable

Thank you, very helpful.

0 Likes
830 Views
Employee
Employee

No, that would be a row-based evaluation, and that cannot be done. You could, however, define a new field "Equality" in the script, e.g.

     Load Flag1, Flag2, Day, If(Flag1=Flag2,1,0) as Equality From ...

and then use this flag in Set Analysis.

HIC

830 Views
Not applicable

Thanks a lot Henric!

0 Likes
830 Views
MindaugasBacius
Valued Contributor II

Comprehensive blog post for new and experience users of Qlikview.

Good job, Henric!

0 Likes
830 Views
Not applicable

Great article, it clear out many of my current doubts!

0 Likes
830 Views
edwinwitvoet
New Contributor III

Thanks for the article indeed, very helpful.

How would you suggest to deal with excluding values - selectable by a user through a selection box ?

For example, assume the following tables :

  • Sales (sales_date, product_id, customer_id, total_price, total_items)
  • Customer (customer_id, customer_name, customer_address, country, city)

How do I find

  • sum of total_items
  • by customers who live in The Netherlands
  • who do not live in a specific city, where the cities to exclude is based on a user selection

So, I am looking for a generic way how to deal with excluding values, selected through the dashboard by the user. Appreciate your feedback!

0 Likes
830 Views
kkkumar82
Valued Contributor III

Hi Edwin,

Sum({<country = {'Netherlands'}>Sales}   - This is for the sales for the people who live in Netherlands

Sum({<city -={'$(=getfieldselections(city))'} Sales) - This is for the sales excluding a selected city.

Hope it helps

Kiran Kumar

830 Views
Employee
Employee
830 Views
edwinwitvoet
New Contributor III

Thanks a lot Kiran. I had to refine it a bit in order to be able work with multiple values as selected cities. For that I used a variable for the GetFieldSelections results as described here.

GetFieldSelections with set analysis

0 Likes
830 Views
okolyug16
Contributor II

Great article! It got me answers to a number of my current issues.

Thanks

0 Likes
830 Views
adamgaox
Contributor

Hi Henric,

I'm interested in your tips on Sum( {$<Date={"<=$(=Max(Date))"}>} Sales)

you mentioned there the function in the element area is evaluated globally.

Is there any way to make the function work under the dimension in straight tables?

Thanks.

Actually I asked this for the latest discussion I start in the blog

https://community.qlik.com/message/1130102#1130102

Thanks in advance.

Best Regards.

0 Likes
830 Views
Employee
Employee

I assume you mean that the function should be evaluated once per dimensional value. If so, the answer is "No". Set analysis is like a selection that is made before the chart is calculated. So it is not possible to write a set analysis expression that is evaluated to different values within the same chart.

HIC

830 Views
Not applicable

Thanks for this wonderful explanation Henric! I was just performing was fact table concatenation and was looking for an explanation on how to handle the different data.

0 Likes
830 Views
momin_tahemas
Contributor

Thanks for Sharing

0 Likes
830 Views
robert99
Valued Contributor II

Hi Henric (or anyone who can answer this)

Would it be possible to make this possible in set analysis. To have something like P or E (say D) where set analysis can be linked to the dimension?

I have a situation with debtors where there are a lot of credits that i want to take out. I can easily do this with the closing debtor position (or any Month). Like so

sum({$<CustAccNum = {"=sum(ARValue) >0"}>}ARValue)

where I select one AsOfMonth

But it doesn't work with AsOfMonth as a dimension.

So with only one AOM selected the total are correct (option 1 and 2)

Option 1 >>>  sum({$<CustAccNum = {"=sum(ARValue) >0"}>}ARValue)

Option 2 >>>> sum({$<CustAccNum = {"=sum(if(AsOfMonth = AsOfMonth ,ARValue)) >0"}>}ARValue)

Option 3 >>>   sum({$<CustAccNum = {"=sum({1}if(AsOfMonth = AsOfMonth ,ARValue)) >0"}>}ARValue)

But when the AsOfMonth May 2018 selection is excluded (the correct figure is £3,366,025). So option 2 (and 3) is close but there is always a smallish difference

I tried everything I could think of to get the correct figure as above. Including various options in script. Maybe there is some way to do this but it would be so easy if the set analysis could be linked to the dimension

Something like this

sum({$<CustAccNum = {"=sum({$<AsOfMonth = D(AsOfMonth) >} ARValue)>0"}>}ARValue)

Thanks.

0 Likes
830 Views