Qlik Community

QlikView Documents

Documents for QlikView related information.

The exists issue

giakoum
Honored Contributor II

The exists issue

The exists issue

Summary : The exists function works as expected ONLY if the field it is applied to is a unique key. If it is not, only the first occurrence of the key is taken into account by the function.

Let me quickly give an example, before you start disagreeing . This is an issue I always fall for, so maybe this post will prove useful to other developers as well.

Let’s suppose that we have table test1 as follows :

test1:
LOAD * INLINE [
F1, F2
a, 21
b, 422
b, 412
c, 424
c, 48
c, 42
d, 24
e, 56
]
;

and we need to append table test2, but only the records where F1 does not exist already. An obvious way would be as follows (field names are the same in both tables, so auto concatenate will occur) :

test2:
LOAD * INLINE [
F1, F2
a, 210
a, 212
a, 223
b, 422
f, 424
h, 24
h, 242
g, 561
g, 562
g, 563
g, 564
g, 565
]

Where
not Exists(F1);

The expected result would be a table containing all rows from test1 and all rows from test2 where F1 does not exist :

F1, F2
a, 21
b, 422
b, 412
c, 424
c, 48
c, 42
d, 24
e, 56
f, 424
h, 24
h, 242
g, 561
g, 562
g, 563
g, 564
g, 565
]

Well, it is not. F1 is not a unique key. We have several rows with the same value for F1 in the second table (same in the first table, but exists is applied to the second one, so the first is not affected). So what QlikView does is, it adds the first non-existing F1 value it encounters, for example row (h, 24). The next row, (h, 242) is NOT added as QlikView now considers value h to exist. The exists function even checks the current load for F1, not only the data loaded so far by previous reload statements.
So the final result is 5 rows less than expected :

F1, F2
a, 21
b, 422
b, 412
c, 424
c, 48
c, 42
d, 24
e, 56
f, 424
h, 24
g, 561

Please see attached qvw to verify the result.

This issue is with us like since always, so obviously this is not a bug, it is a feature . So extra caution is needed when applying exists. A common case where this may cause unexpected behavior is in incremental reloads.

An easy workaround would be to create a 3rd field, F1 as F1_lookup, and then apply the exists function as follows :

Where
not Exists(F1_lookup, F1)

The disadvantage is that when loading from qvd files, the above syntax is not an optimized qvd load anymore. It is still faster than a resident load, but not as fast as an optimized one.

In the help, there is no mentioning of this, except of the last sentence (“This is equivalent to performing a distinct load on field A.”), which is however not clear or emphasized :

exists(field [ , expression ] )

Determines whether a specific field value exists in a specified field of the data loaded so far. Field is a name or a string expression evaluating to a field name. The field must exist in the data loaded so far by the script. Expr is an expression evaluating to the field value to look for in the specified field. If omitted, the current record’s value in the specified field will be assumed.

Examples:

exists(Month, 'Jan') returns -1 (true) if the field value 'Jan' is found in the current content of the field Month.

exists(IDnr, IDnr) returns -1 (true) if the value of the field IDnr in the current record already exists in any previously read record containing that field.

exists (IDnr) is identical to the previous example.

Load Employee, ID, Salary from Employees.csv;
Load FirstName& ' ' &LastName as Employee, Comment from Citizens.csv where exists (Employee, FirstName& ' ' &LastName);
Only comments regarding those citizens who are employees are read.

Load A, B, C, from Employees.csv where not exists (A);
This is equivalent to performing a distinct load on field A.

Thank you for your time.
BR, Ioannis

Tags (2)
Labels (2)
Attachments
Comments
richard_pearce6
Valued Contributor

Thanks for this. I think a lot of people are unaware this takes place.

Richard

jaimeaguilar
Valued Contributor II

Great post and I think many people (including me) are not aware of this issue. An alternative for this issue may be using keep function, which I think is a very underrated function regarding reduction of data,

regards

michael_gardner
Contributor III

Thanks.  QlikView can have some unexpected behavior that can lead to a really bad day. 

Another case is Distinct behavior. 

DISTINCT can be deceiving - The Qlik Fix! The Qlik Fix!

IAMDV
Honored Contributor II

Ioannis - Thank you for the pointer. Very valuable and I've seen this behavior in the past. I think it's "functionality as is" compared to a bug. However, I agree it's important to be aware. As a standard practice, I always use/create the primary key to use the Exists function. If it's not available then KEEP is better choice.


Cheers,

DV

www.QlikShare.com

Gabriel
Valued Contributor II

Nice post.

The EXISTS function from SQL background, is used in combination with a subquery, subquery in this case F1,

and is considered to be met if the subquery returns at least one row. However the workaround you suggested, creating 3rd field is good.

Thanks

simondachstr
Valued Contributor III

This cost me about 2hrs of debugging the other day until I figured out renaming the original field helps but I felt bad I didn't understood why it's behaving as it was. Thanks for the clarification Ioannis Giakoumakis.

giakoum
Honored Contributor II

Thank you all for your comments.

A primary key is definitely the best way to go. Keep is is also a good alternative wherever possible. Thank you Jaime and Deepak for mentioning.

juraj_misina
Valued Contributor

While I think this works as designed, it is always usefull to remind specifics of QlikView functions. Thank you for that.

benrig44
New Contributor III

Agree, the KEEP keyword is underutilized. I've found it to be much faster than alternative JOIN. I dont know why but it is

IAMDV
Honored Contributor II

Fully agree Steven. KEEP is under utilized by QV developers. It's because most of the QV developers don't come from traditional DB/SQL background. I'm sure most of the SQL developers will embrace KEEP as much as they do the JOIN(s).

Version history
Revision #:
1 of 1
Last update:
‎08-25-2014 04:36 AM
Updated by: