Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Excluding values in Set Analysis

The Set Analysis is a commonly used tool when creating static or dynamic filters inside aggregations in Qlik Sense or QlikView. But sometimes you need the opposite to selecting – you need to exclude values. How can this be done?

In Set Analysis, it is straightforward to make selections; to define the criteria for inclusion. But it is not as straightforward to define an exclusion. But there are in fact several ways to do this.

First of all, an inclusion can be defined the following way:

Formula0.png

This expression is equivalent to saying “Sum the Amounts where Field equals X”.

But if you want to say the opposite – “where field does not equal X” – it becomes more complicated. The relation “not equal to” is not a Set operation. However, one way to do this is to use the implicit exclusion operator:

Formula1.png

Note the minus sign in front of the equals sign. This will create an element set based on the existing selected values, but with the value X removed.

A second way to do this is to use the unary exclusion operator:

Formula2.png

This will return the complement set of X and use this as element set in the set expression.

In many situations the two methods return identical sets. But there are cases when they are different. In the table below you can see that it makes a difference if there already is a selection in the field.

Image5.png

Note the difference on the row with the C. Since the implicit operator (the 2nd expression) is based on current selection, also the value C is excluded from the calculation. This is in contrast to the unary operator (the 3rd expression) that creates a completely new element set, not based on current selection.

We can also see that both of the above expressions exclude records where the field is NULL (the line where Field has a dash only). The reason is simple: As soon as there is a selection in the field, the logical inference will exclude NULL in the same field.

So what should you do if you want to exclude X but not NULL?

The answer is simple: Use another field for your selection. Typically you should use the primary key for the table where you find the Amount.

Formula3.png

Here you need to use the element function E(), that returns excluded values. Hence, the above Set expression says: “Select the IDs that get excluded when selecting X.”

Image7.png

The table confirms that we get the records we want: X is excluded but NULL is still included.

With this, I hope that you understand Set Analysis somewhat better.

HIC

Further reading related to this topic:

A Primer on Set Analysis

Why is it called Set Analysis?

Finding NULL

38 Comments
Not applicable

As usual helpful

7,228 Views
upaliwije
Contributor II

Nice Explanation

0 Likes
7,228 Views
Arjunarao
Honored Contributor II

Beautiful explanation. Thank you for the such a blog post.

0 Likes
7,228 Views
Partner
Partner

Thanks for the content.

When i Try the same in Qlik Sense, below is what i am getting in Column 3.

So any idea why is this different?Exclude.PNG

0 Likes
7,228 Views

There is no difference in Set Analysis between QlikView and Qlik Sense.

Instead, what you have is not a NULL. You have a blank, and as such it is selectable, You can see this on the fact that it appears on its own row in the filter box.

See more on NULL handling in QlikView

HIC

7,228 Views
Partner
Partner

Understood.. Thank you for the clarification

0 Likes
7,228 Views
SreeniJD
Valued Contributor

Hi @HIC

Very helpful.

0 Likes
7,228 Views
cleblois
New Contributor III

 

Hi,

I have one for you  

We want to analyse our products with a focus on joiners and leavers.

I found the beginning of it on the forum but now I'm stuck.

 

In below example, leavers for 2012 and joiners for 2013 are correct because I fixed the values in the formula.

My issue is to make it generic for this table in Sense.

 

current formulas:

* joiners: Count( DISTINCT {$ < DPN= P({<Year_QVW={'2013'}>} DPN)- P({<Year_QVW={'2012'} >} DPN)>} DPN)

* Leavers: Count( DISTINCT {$ < DPN= P({<Year_QVW={'2012'}>} DPN)- P({<Year_QVW={'2013'}>} DPN)>} DPN)

 

When I use max in the P(), it considers 2015, the max year of my selection, not the year of my column.

any idea to repalce my fixed years by years relative to my columns?

 

 

Thanks in advance,

 

Christophe

    

0 Likes
7,228 Views
Partner
Partner

An alternative when you need exclude X but include those with Null is to combine two expressions;

     Sum({$<Field=>} Amount) - Sum({$<Field={"X"} Amount)

(everything including both X & Null minus the sum with only X)

If you do not have another field suitable to use with the E() function this will solve the problem.

0 Likes
7,228 Views
ecolomer
Honored Contributor II

very helpful

thank's for sharing

0 Likes
7,228 Views
MVP & Luminary
MVP & Luminary

I have to add a comment to this one...

One of the most useful Set operators, I feel, is one that tends to be publicised the least:

=sum({<Region*={'South'}>}Amount)

This gives you the Amount, for the South, as long as it is included in the Current Selections.  If you select North in the Region field, the 'default' expression of:

=sum({<Region={'South'}>}Amount)

will still blindly give the value for the South in that column, totally disregarding (and confusing) the user.

Now, this forcing of a selection in a set is absolutely essential for prior period comparisons etc., but there are lots of cases where it does not provide intuitive output.

So, now to my point...  why, why, why does the syntax highlighter blow up when you use *= in a Set statement?  Is there something I am doing wrong in my expression, or is this a glitch that has been overlooked for many versions?

Thanks, as ever, for the insights hic‌ .

Steve

7,228 Views
Not applicable

I think you can also use this syntax Steve (this is how I achieve the same behaviour, and I think the syntax highlighter is ok with it):

=sum({$*<Region={'South'}>}Amount)

7,228 Views
Partner
Partner

When it comes to set analysis the syntax highlighter is really bad.

Hopes are it will get a lot better in version 12.

7,228 Views
MVP & Luminary
MVP & Luminary

Yes Graeme - but this doesn't give you the granularity of applying to one field and not another?

7,228 Views
MVP & Luminary
MVP & Luminary

Indeed Anders, there are a number of ways to break it - as soon as you add a variable for one.

Looking at the syntax highlighting in Sense will give an indication - as it is the Sense engine that is going to be behind QlikView 12, and therefore potentially the syntax highlighting algorithms.

0 Likes
7,228 Views
mayankraoka
Valued Contributor

Good explanation.....

0 Likes
7,228 Views
Not applicable

stevedark Oh... that was missing from the requirements doc/examples! 


Agreed though - *= gives you more granularity within construction of the sets than $*<> (at the cost of the syntax highlighting).

7,228 Views
Partner
Partner

Steve:

Have you tried in 11.20 SR12? I'm not seeing any obvious issue with the syntax highlighter... Can you send a screen of the issue?

Graeme:

Just a quick tip on performance:

sum({$*<Region={'South'}>}Amount)

vs

sum({$<Region*={'South'}>}Amount)


Even if the results are the same in this simple syntax implementation, I would not use the $*< syntax as it is less descriptive of the intersect you are after - I.e. you've given the engine less information to ensure the intersect is identified in the most optimized way possible. In the first instance you're saying "intersect my current selections with another selection defined by all of my current selections where Region has been overridden to 'South'." in the second example you're saying "Relative to my current selection, intersect my Regions selection with 'South'". I don't know the exact way in which the selection engine carries out each of these operations as it's very much a 'black box' still but an educated guess would be that the first syntax will cost more to resolve for the selection engine.


HIC:

1. Any corrections on my educated guesses above on selection engine computational cost?

2. Idea: How about some transparency into computation algorithms in a future version of QV/QS by EXPLAIN PLAN like information breaking out estimated effort by the different computational steps involved in resolving an expression? Not a mainstream requirement, I know, but I think you'd get a major thumbs up from the seasoned developer community for such a feature.


Regards

Jonas

0 Likes
7,228 Views
MVP & Luminary
MVP & Luminary

In addition to what Steve said it also refers to the default set. When you use alternate states in your document the default set for an object is not necessarily the same as the inherited set.

0 Likes
7,228 Views
Not applicable

jonasvalleskog

Hi Jonas,

To be honest, I'd never really had reason to give it that much thought - but I 100% agree with your reasoning and logic.  I knocked up a quick sample app, and the calc times (I know they aren't perfect) do appear to support your theory (although they are a little erratic).  I will try it in some real world examples when I get a chance and see if there is a noticeable improvement.  Thanks a lot for sharing your insight.  Would be really interested to hear from HIC regarding the internal mechanics of the set execution inside QV, and I really love the idea about the execution/explain plan for expressions!

Regards,

Graeme

0 Likes
7,228 Views
Partner
Partner

This would be particularly useful in computing Customer Retention and Customer Movement, New Customers, Lost Customers where the metric takes a different meaning depending on the level of aggregation.

A customer could be retained at a global level but Lost at a Regional level and New in another Region.

Classic A-B and B-A and A*B case.

0 Likes
7,228 Views

On the performance:

  • Set operators on the record set (e.g. {1-$} and case one in Jonas' examples) operate on the data tables - tables that often are very large.
  • Set operators on the element set (e.g. {$<F={X}+{Y}>} and case two in Jonas' examples) operate on the symbol tables - tables that usually are fairly small.

Conclusion: Jonas' educated guess is right.

Something similar to an EXPLAIN PLAN would be very complex, so I doubt that we will ever do it. A selection will cause many things to happen and it is not a chain of events... Example:

  1. User makes selection
  2. This triggers the logical inference to calculate new state vectors
  3. The resulting new state vectors trigger the calculation engine to re-calculate all visible objects and all variables (multiple next steps)
  4. A single object can have several formulas (multiple next steps)
  5. A single formula can have several aggregations (multiple next steps)
  6. An aggregation can have Set Analysis
  7. The Set Analysis triggers the logical inference to calculate new state vectors for this Set Analysis (compare to step 2)
  8. New state vectors triggers the calculation engine to calculate all aggregations inside the Set Analysis expression (compare to step 3)
  9. ... etc...

Note that several of the steps branch out to several "daughters", each often in its own thread.

Of course something like EXPLAIN PLAN could in theory be done, but I question the value, given that it is extremely hard to estimate the cost of each step. And would people understand it...?

HIC

7,228 Views
qlikviewnovice
Valued Contributor II

Nice Explanation!!

0 Likes
7,228 Views
MVP
MVP

Very nice. Thanks.

0 Likes
7,228 Views
Partner
Partner

Set analysis deals with the datamodel before the expression is evaluated in your table.

There is no way the set analysis can be aware of and deal with your dimension values.

0 Likes
7,228 Views
Not applicable

Thanks for the extra information hic 

I tried a few real world examples of switching instances of $*<> with *=, and didn't really notice any change in performance on a few million rows (although those apps are complicated, so difficult to isolate the impact). 

I also put together a test doc (just spitting out 10m rows using the default test script in QV script editor CTRL+Q), and added some expressions as per below and didn't really see any consistent performance differences.  Nothing scientific, but each of the straight tables (one expression on each) were consistently similar in terms of calculation times and memory showing in the Sheet Properties > Objects (used new qv.exe instances each time to clear the cache).

Sum({$*<Dim1={'A'}>} Expression1 * Expression2 * Expression3)

Sum({<Dim1*={'A'}>} Expression3 * Expression2 * Expression1)

Sum({$*<Dim1*={'A'}>} Expression3 * Expression2 * Expression1)

The *= syntax also offers more granularity as discussed above (thanks stevedark‌, jonasvalleskog‌), but from a performance perspective I'm not seeing a huge difference.  Perhaps I'm missing something, or perhaps the benefits of one approach over the other in terms of performance only kicks in when you move into the 10's of millions of rows.  Still useful to know about and bear in mind when designing. 

Thanks all for the useful input.

Regards,

Graeme

0 Likes
7,228 Views

I wouldn't expect any huge differences, even though the record set operation necessarily must involve a larger number of executed instructions. So your investigation is probably correct, which means that performance probably is a weak argument for choosing one or the other.

HIC

0 Likes
7,228 Views
Partner
Partner

Evening HIC,

I can see that an EXPLAIN PLAN like feature would produce pages of information as per your comment. I wouldn't be deterred by that though. the community can innovate on methods of parsing, abstracting and generally making sense of such output. Anyone who has a SQL tuning and/or DBA background would be familiar with the concept of spotting the poorly performing query and differentiating between algorithm choices. Without low level insight into computation cost - we're hamstrung as to how to go about tuning expressions, models and UI designs. Knowledge = power.

I for one would benefit from such tools . 1 vote from me and 1 from Graeme... Maybe I'll start a petition!

0 Likes
7,228 Views
MVP
MVP

1 from me too.

0 Likes
7,228 Views
Luminary
Luminary

Hi HIC,

Another great post !

0 Likes
7,228 Views