The exists issue

    The exists issue

    Summary : The exists function works as expected ONLY if the field it is applied to is a unique key. If it is not, only the first occurrence of the key is taken into account by the function.

    Let me quickly give an example, before you start disagreeing . This is an issue I always fall for, so maybe this post will prove useful to other developers as well.

    Let’s suppose that we have table test1 as follows :

    test1:
    LOAD * INLINE [
    F1, F2
    a, 21
    b, 422
    b, 412
    c, 424
    c, 48
    c, 42
    d, 24
    e, 56
    ]
    ;

    and we need to append table test2, but only the records where F1 does not exist already. An obvious way would be as follows (field names are the same in both tables, so auto concatenate will occur) :

    test2:
    LOAD * INLINE [
    F1, F2
    a, 210
    a, 212
    a, 223
    b, 422
    f, 424
    h, 24
    h, 242
    g, 561
    g, 562
    g, 563
    g, 564
    g, 565
    ]

    Where
    not Exists(F1);

    The expected result would be a table containing all rows from test1 and all rows from test2 where F1 does not exist :

    F1, F2
    a, 21
    b, 422
    b, 412
    c, 424
    c, 48
    c, 42
    d, 24
    e, 56
    f, 424
    h, 24
    h, 242
    g, 561
    g, 562
    g, 563
    g, 564
    g, 565
    ]

    Well, it is not. F1 is not a unique key. We have several rows with the same value for F1 in the second table (same in the first table, but exists is applied to the second one, so the first is not affected). So what QlikView does is, it adds the first non-existing F1 value it encounters, for example row (h, 24). The next row, (h, 242) is NOT added as QlikView now considers value h to exist. The exists function even checks the current load for F1, not only the data loaded so far by previous reload statements.
    So the final result is 5 rows less than expected :

    F1, F2
    a, 21
    b, 422
    b, 412
    c, 424
    c, 48
    c, 42
    d, 24
    e, 56
    f, 424
    h, 24
    g, 561

    Please see attached qvw to verify the result.

    This issue is with us like since always, so obviously this is not a bug, it is a feature . So extra caution is needed when applying exists. A common case where this may cause unexpected behavior is in incremental reloads.

    An easy workaround would be to create a 3rd field, F1 as F1_lookup, and then apply the exists function as follows :

    Where
    not Exists(F1_lookup, F1)

    The disadvantage is that when loading from qvd files, the above syntax is not an optimized qvd load anymore. It is still faster than a resident load, but not as fast as an optimized one.

    In the help, there is no mentioning of this, except of the last sentence (“This is equivalent to performing a distinct load on field A.”), which is however not clear or emphasized :

    exists(field [ , expression ] )

    Determines whether a specific field value exists in a specified field of the data loaded so far. Field is a name or a string expression evaluating to a field name. The field must exist in the data loaded so far by the script. Expr is an expression evaluating to the field value to look for in the specified field. If omitted, the current record’s value in the specified field will be assumed.

    Examples:

    exists(Month, 'Jan') returns -1 (true) if the field value 'Jan' is found in the current content of the field Month.

    exists(IDnr, IDnr) returns -1 (true) if the value of the field IDnr in the current record already exists in any previously read record containing that field.

    exists (IDnr) is identical to the previous example.

    Load Employee, ID, Salary from Employees.csv;
    Load FirstName& ' ' &LastName as Employee, Comment from Citizens.csv where exists (Employee, FirstName& ' ' &LastName);
    Only comments regarding those citizens who are employees are read.

    Load A, B, C, from Employees.csv where not exists (A);
    This is equivalent to performing a distinct load on field A.

    Thank you for your time.
    BR, Ioannis