hic
Former Employee

Set Analysis is a commonly used tool for creating static or dynamic filters inside aggregations in Qlik Sense or QlikView. But sometimes you need the opposite of selecting – you need to exclude values. How can this be done?

In Set Analysis, it is straightforward to make selections – to define the criteria for inclusion. Defining an exclusion is not as straightforward, but there are in fact several ways to do it.

 

First of all, an inclusion can be defined the following way:

 

Sum( {$<Field = {X}>} Amount )

 

This expression is equivalent to saying “Sum the Amounts where Field equals X”.
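
For example, with made-up field names (Product and Sales), this pattern sums Sales for the product 'Shoe' only:

Sum( {$<Product = {'Shoe'}>} Sales )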

 

But if you want to say the opposite – “where Field does not equal X” – it becomes more complicated. The relation “not equal to” is not a Set operation. However, one way to do it is to use the implicit exclusion operator:

 

Sum( {$<Field -= {X}>} Amount )

 

Note the minus sign in front of the equals sign. This will create an element set based on the existing selected values, but with the value X removed.
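
For instance, with the same made-up names as above, this sums Sales for the currently selected products minus 'Shoe':

Sum( {$<Product -= {'Shoe'}>} Sales )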

 

A second way to do this is to use the unary exclusion operator:

 

Sum( {$<Field = -{X}>} Amount )

 

This will return the complement set of X and use it as the element set in the set expression.
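
Concretely (made-up names again), this sums Sales for every product except 'Shoe', regardless of what is currently selected:

Sum( {$<Product = -{'Shoe'}>} Sales )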

 

In many situations the two methods return identical sets, but there are cases where they differ. In the table below you can see that it makes a difference whether there already is a selection in the field.

 

[Table: rows for the Field values, including a C row and a NULL row (shown as a dash), with columns for Sum(Amount), the implicit exclusion expression and the unary exclusion expression, evaluated with a selection already made in Field.]

 

Note the difference on the row with the C. Since the implicit operator (the 2nd expression) is based on the current selection, the value C is also excluded from the calculation. This is in contrast to the unary operator (the 3rd expression), which creates a completely new element set that is not based on the current selection.
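
To make this concrete: say Field contains the values A, B, C and X, and the current selection is A, B and X, so that C is already excluded. Then the implicit operator yields the element set {A, B}, while the unary operator yields {A, B, C}:

Sum( {$<Field -= {X}>} Amount )   // {A, B, X} minus {X} – C stays excluded
Sum( {$<Field = -{X}>} Amount )   // complement of {X} – C comes back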

 

We can also see that both of the above expressions exclude records where the field is NULL (the line where Field has a dash only). The reason is simple: As soon as there is a selection in the field, the logical inference will exclude NULL in the same field.

 

So what should you do if you want to exclude X but not NULL?

 

The answer is simple: Use another field for your selection. Typically you should use the primary key for the table where you find the Amount.

 

Sum( {$<ID = E({<Field = {X}>})>} Amount )

 

Here you need to use the element function E(), which returns the excluded values. Hence, the above Set expression says: “Select the IDs that get excluded when selecting X.”

 

[Table: the same data evaluated with the E()-based expression – all records are included except the X record, and the NULL row is kept.]

 

The table confirms that we get the records we want: X is excluded but NULL is still included.
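
If you want to reproduce this behaviour yourself, a minimal test script could look something like this (all names are made up; the NullInterpret variable makes the <NULL> token load as a real NULL):

SET NullInterpret = '<NULL>';

Data:
LOAD * INLINE [
ID, Field, Amount
1, A, 100
2, B, 200
3, C, 300
4, X, 400
5, <NULL>, 500
];

With no selections made, Sum( {$<Field -= {X}>} Amount ) returns 600 (the NULL record is excluded), whereas Sum( {$<ID = E({<Field = {X}>})>} Amount ) returns 1100 (only the X record is excluded).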

 

With this, I hope that you understand Set Analysis somewhat better.

 

HIC

 

Further reading related to this topic:

A Primer on Set Analysis

Why is it called Set Analysis?

Finding NULL

40 Comments
avastani
Partner - Creator III

This would be particularly useful in computing Customer Retention and Customer Movement (New Customers, Lost Customers), where the metric takes a different meaning depending on the level of aggregation.

A customer could be retained at a global level but lost at a regional level and new in another region.

Classic A-B and B-A and A*B case.
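
As a rough sketch of that idea (field names made up), the intersection and difference can be expressed with the element function P(), which returns possible values:

Count( {<Customer = P({1<Year = {2024}>} Customer) * P({1<Year = {2023}>} Customer)>} DISTINCT Customer )   // retained (A*B)
Count( {<Customer = P({1<Year = {2023}>} Customer) - P({1<Year = {2024}>} Customer)>} DISTINCT Customer )   // lost (A-B)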

hic
Former Employee

On the performance:

  • Set operators on the record set (e.g. {1-$} and case one in Jonas' examples) operate on the data tables - tables that often are very large.
  • Set operators on the element set (e.g. {$<F={X}+{Y}>} and case two in Jonas' examples) operate on the symbol tables - tables that usually are fairly small.

Conclusion: Jonas' educated guess is right.
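
For example, the exclusion from the post above can be written either way – both remove the X records, though they can still differ in NULL handling as discussed in the post:

Sum( {$ - $<Field = {X}>} Amount )   // operation on the record set
Sum( {$<Field -= {X}>} Amount )      // operation on the element set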

Something similar to an EXPLAIN PLAN would be very complex, so I doubt that we will ever do it. A selection will cause many things to happen, and it is not a simple chain of events... Example:

  1. User makes selection
  2. This triggers the logical inference to calculate new state vectors
  3. The resulting new state vectors trigger the calculation engine to re-calculate all visible objects and all variables (multiple next steps)
  4. A single object can have several formulas (multiple next steps)
  5. A single formula can have several aggregations (multiple next steps)
  6. An aggregation can have Set Analysis
  7. The Set Analysis triggers the logical inference to calculate new state vectors for this Set Analysis (compare to step 2)
  8. New state vectors trigger the calculation engine to calculate all aggregations inside the Set Analysis expression (compare to step 3)
  9. ... etc...

Note that several of the steps branch out to several "daughters", each often in its own thread.

Of course something like EXPLAIN PLAN could in theory be done, but I question the value, given that it is extremely hard to estimate the cost of each step. And would people understand it...?

HIC

Anonymous
Not applicable

Nice Explanation!!

afurtado
Partner Ambassador/MVP

Very nice. Thanks.

anderseriksson
Partner - Specialist

Set analysis deals with the data model before the expression is evaluated in your table.

There is no way the set analysis can be aware of and deal with your dimension values.

Not applicable

Thanks for the extra information, hic.

I tried a few real-world examples of switching instances of $*<> with *=, and didn't really notice any change in performance on a few million rows (although those apps are complicated, so it's difficult to isolate the impact).

I also put together a test doc (just spitting out 10m rows using the default test script in the QV script editor, CTRL+Q), added the expressions below, and didn't see any consistent performance differences. Nothing scientific, but the straight tables (one expression on each) were consistently similar in terms of calculation times and memory shown in Sheet Properties > Objects (I used new qv.exe instances each time to clear the cache).

Sum({$*<Dim1={'A'}>} Expression1 * Expression2 * Expression3)

Sum({<Dim1*={'A'}>} Expression3 * Expression2 * Expression1)

Sum({$*<Dim1*={'A'}>} Expression3 * Expression2 * Expression1)

The *= syntax also offers more granularity as discussed above (thanks stevedark, jonasvalleskog), but from a performance perspective I'm not seeing a huge difference. Perhaps I'm missing something, or perhaps the performance benefits of one approach over the other only kick in when you move into the tens of millions of rows. Still useful to know about and bear in mind when designing.

Thanks all for the useful input.

Regards,

Graeme

hic
Former Employee

I wouldn't expect any huge differences, even though the record set operation necessarily involves a larger number of executed instructions. So your investigation is probably correct, which means that performance is probably a weak argument for choosing one or the other.

HIC

JonasValleskog
Partner - Creator

Evening HIC,

I can see that an EXPLAIN PLAN-like feature would produce pages of information, as per your comment. I wouldn't be deterred by that, though. The community can innovate on methods of parsing, abstracting and generally making sense of such output. Anyone with a SQL tuning and/or DBA background would be familiar with spotting the poorly performing query and differentiating between algorithm choices. Without low-level insight into computation cost, we're hamstrung as to how to go about tuning expressions, models and UI designs. Knowledge = power.

I for one would benefit from such tools. 1 vote from me and 1 from Graeme... Maybe I'll start a petition!

afurtado
Partner Ambassador/MVP

1 from me too.

pablolabbe
Luminary Alumni

Hi HIC,

Another great post !
