Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
mbespartochnyy
Creator III
Creator III

Why is Set Analysis on Key Fields Produces Unexpected Results?

I ran into a VERY strange behavior in Qlik Sense set analysis. I'm hoping someone here can explain why set analysis is producing different/unexpected results on identical data set.

The app that demonstrates the issue is attached.

 

Context

I have one table that looks like this:

mbespartochnyy_0-1732124810474.png

Here's the script for that table:

 

 

table_temp:
Load *
Inline [
id|name|name_qa_results|super_qa_results
1|name_1|Match|Match
2|name_2|Match|Match
3|name_3|Match|Match
4|name_4|Match|Match
5|name_5|Match|Match
6|name_6|Match|Match
7|name_7
8|name_8
9|name_9|Match|Match
10|name_10|Match|Match
11|name_11|Match|Match
12|name_12
13|name_13
14|name_14
15|name_15
16|name_16|Match|Match
](delimiter is '|');

table:
NoConcatenate
Load id
	, If(Len(Trim(name_qa_results)) = 0, Null(), name_qa_results) as name_qa_results
    , If(Len(Trim(super_qa_results)) = 0, Null(), super_qa_results) as super_qa_results
Resident table_temp;

Drop Table table_temp;

 

 

 

Then I have an expression in a text object that counts IDs that have a 'Match' value in super_qa_results field or an id number 7, like this:

 

 

Count({<super_qa_results = {'Match'}> + <id = {7}>} distinct id)

 

 

This expression produces accurate count which is 11:

mbespartochnyy_1-1732125191813.png

 

The Problem

However, if I create an identical table that has two other tables associated with it and one of the associated tables has only one of all possible IDs, then the same expression produces the wrong count.

Here's what the data model for the second table looks like:

mbespartochnyy_2-1732125501766.png

and here's the script to generate the second table along with the linked tables:

 

 

table_temp_2:
Load *
Inline [
id_2|name_2|name_qa_results_2|super_qa_results_2
1|name_1|Match|Match
2|name_2|Match|Match
3|name_3|Match|Match
4|name_4|Match|Match
5|name_5|Match|Match
6|name_6|Match|Match
7|name_7
8|name_8
9|name_9|Match|Match
10|name_10|Match|Match
11|name_11|Match|Match
12|name_12
13|name_13
14|name_14
15|name_15
16|name_16|Match|Match
](delimiter is '|');

table_2:
NoConcatenate
Load id_2
	, If(Len(Trim(name_qa_results_2)) = 0, Null(), name_qa_results_2) as name_qa_results_2
    , If(Len(Trim(super_qa_results_2)) = 0, Null(), super_qa_results_2) as super_qa_results_2
Resident table_temp_2;

link:
Load Distinct id_2
	, name_2
Resident table_temp_2
;

Drop Table table_temp_2;

dbOwner:
Load *
Inline [
id_2|db
16|db_1
16|db_2
16|db_3
16|db_4
](delimiter is '|');

 

 

 

and here's the screenshot of same text object with same expression (except it referencing fields from the second table, of course) generating wrong results:

mbespartochnyy_3-1732125633400.png

Both table and table_2 have identical data, yet when I add two other tables to be associated with table_2, the set analysis starts to produce wrong results.

Any idea why set analysis is producing different results in this example?

Labels (3)
18 Replies
henrikalmen
Specialist II
Specialist II

You can get the expected result if you sum two separate calculations:

Count({<super_qa_results_2 = {'Match'}>} distinct id_2) + Count({<id_2 = {7}>} distinct id_2)

But what if you would count <id_2 = {7,9}>? In your example with the single table, you would still get 11. In my modified example, you will get 12 as the result. Obviously because the distinct ID 9 is counted twice with my expression.

I'm kind of wondering why your set expression returns 11 in your first example. I see that it does, but I'm confused about why. Your set expression selects 'Match' in the super_qa_results_2 field, and then it adds a selection in the id field. If you would manually perform the first selection in your app, the id 7 is not available anymore (in a table it disappears, in a filter box it becomes greyed out). 

I don't have an answer as to why the expression behaves the way it does in your example.

Or
MVP
MVP

Honestly, this seems kind of unusual to me. Maybe @hic can shed some light here if he happens to be interested?

 

henrikalmen
Specialist II
Specialist II

If the table dbOwner is removed, the expression gives the same result as in the first example where you had only one table. So I figure this issue has something to do with the fact that the value 7 in the field id_2 doesn't exist in that table, but it exists in the other two tables.

But if you instead add id_2=7 to the dbOwner table, the result of the expression is: 2. That's kind of confusing as well.

Muthukumar_77
Contributor III
Contributor III

Try like this,

table_temp_2:
Load *
Inline [
id_2|name_2|name_qa_results_2|super_qa_results_2
1|name_1|Match|Match
2|name_2|Match|Match
3|name_3|Match|Match
4|name_4|Match|Match
5|name_5|Match|Match
6|name_6|Match|Match
7|name_7
8|name_8
9|name_9|Match|Match
10|name_10|Match|Match
11|name_11|Match|Match
12|name_12
13|name_13
14|name_14
15|name_15
16|name_16|Match|Match
](delimiter is '|');
 
table_2:
NoConcatenate
Load id_2 as id_2_Key
,id_2
,If(Len(Trim(name_qa_results_2)) = 0, Null(), name_qa_results_2) as name_qa_results_2
    , If(Len(Trim(super_qa_results_2)) = 0, Null(), super_qa_results_2) as super_qa_results_2
Resident table_temp_2;
 
link:
Load Distinct id_2 as id_2_Key
, name_2
Resident table_temp_2
;
 
Drop Table table_temp_2;
 
dbOwner:
Load id_2 as id_2_Key,db
Inline [
id_2|db
16|db_1
16|db_2
16|db_3
16|db_4
](delimiter is '|');

 

Test1.png

 

Test2.png

 

Thanks Regards,
Muthukumar P
Qlik Developer
lennart_mo
Creator II
Creator II

Hi @mbespartochnyy,

I'm confused about this happening as well.

From testing I got that the dbOwner table seems to cause your problem, explicitly the limited dataset for id. A workaround I found to work was populating the dbOwner table with the missing IDs. To do so I used this script:

dbOwner:
Load DISTINCT id_2
Resident table_2;

Left Join(dbOwner)
Load *
Inline [
id_2|db
16|db_1
16|db_2
16|db_3
16|db_4
](delimiter is '|');

 

mbespartochnyy
Creator III
Creator III
Author

Thanks for input everyone! I appreciate the ideas.

For additional context, what the set analysis in my example is meant to accomplish is an OR type of logic. More specifically, I need the expression to count number of IDs if a QA field (super_qa_results_2 in the post example) or an additional field equals to some specific value. So, number of IDs where super_qa_results_2 equals 'Match' or id_2 equals 7.

This OR logic seems to break, or at least produce unexpected results, when a dimension table with incomplete list of IDs is introduced.

 

Workarounds

I tried what @Muthukumar_77 and @lennart_mo suggested and can confirm that those workarounds worked. Thanks again for sharing your thoughts and suggestions on how to work around this issue!

Ultimately, I ended up moving the OR logic out of the set analysis and into the script by using the IF statement, something like this:

If(super_qa_results_2 = 'Match' or id_2 = 7, 1, 0) as audit_check

 

Then I rewrote the expression in the KPI object to be:

Count({<audit_flag = {1}>} distinct id_2)

 

And that does produce accurate count:

mbespartochnyy_0-1732198325631.png

But it would be nice if the OR logic was working fine within set analysis. Or at least if there was an explanation as to why set analysis produces unexpected results when an OR type of operator is used in a data model that contains a dimension table with incomplete key field values.

In my experience a dimension table with an incomplete set of (foreign) key values is a normal occurrence. I wouldn't expect there to be a need for every dimension table to have a complete list of key field values to have the OR logic in set analysis working.

P.S. If anyone will figure out why the set analysis behaves the way it does when a dimension table with incomplete key field values is introduced, would you please share your findings? I would love to get a better understanding of what's going on within the set analysis in this scenario.

P.P.S. Updated app with the workaround in place is attached.

marcus_sommer

By using a well-known data-set in a properly designed star-scheme data-model (it's officially recommended) it's rather unlikely to get in trouble by calculating with key-fields. In all other scenarios don't use key-fields in measures, selections and dimensions - they are just fields to associate tables. In the earlier days this was also documented in the help. 

marcus_sommer

Set analysis is more or less a selection and is evaluated on a column-level - each one for it's own and then the n ones are combined in the specified order and against the object-dimensionality in which they are applied.

By using if-loops the evaluation is performed on a row-level which is quite different, for example in regard to query against NULL which isn't possible on a column-level.

henrikalmen
Specialist II
Specialist II

Qlik Sense Enterprise on Windows version November 2024 has just been released, and it mentions a bug fix that sounds like the issue you encountered:

henrikalmen_0-1732620503107.png