Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
chiru_thota
Specialist
Specialist

Left keep Versus Where exists ( Which is good in terms of resource usage & execution time)

Hi All,

Can anyone help me out by explaining about the pros and cons of left keep and where exists.

(1) Which one better in terms of in terms of resource usage & execution time ?

Thanks,

Chiru.

1 Solution

Accepted Solutions
hic
Former Employee
Former Employee

My experience is that "Left Keep" is much faster. But a "Where Exists()" is much more versatile - you can do more with it.

HIC

View solution in original post

4 Replies
hic
Former Employee
Former Employee

My experience is that "Left Keep" is much faster. But a "Where Exists()" is much more versatile - you can do more with it.

HIC

mmarchese
Creator II
Creator II

@hic 

Could you explain why left keep is much faster than where exists?

And what about the case of loading a qvd?  Will it be an optimized load in either case, and will left keep still outperform?

hic
Former Employee
Former Employee

Left keep is a binary operation after the table has been loaded: The bitvector(s) of the key field(s) are compared, and then the extra records are purged. There are no calculations really.

In the "Where Exists()" case, you instead have a for-next loop where you for each record need to make a lookup in a table, and this takes time.

mmarchese
Creator II
Creator II

@hic Thanks.  Follow up:

1) I experimented a little and verified that there is definitely a case where keep is much slower: when you are only keeping a small subset of the original data.  This makes sense given your statement that with keep, Qlik loads the entire table first and purges the unnecessary rows second.  I suppose the difference here is that exists decides whether to keep each row as it first comes into contact with it, as opposed to bothering to find a semi-permanent home for it in memory first?

2) I'm having a hard time understanding how keep could function without looping or calculations, but I'm probably misconstruing your words.  I read your article Symbol Tables and Bit-Stuffed Pointers, and I will try to apply it here.  Say these are the two options:

 

A:
LOAD
    key,
    other1
FROM ...;

// Option 1 - KEEP
B:
LEFT KEEP(A)
LOAD
    key,
    other2
FROM ...;

// Option 2 - EXISTS
B:
LOAD
    key,
    other2
FROM ...
WHERE EXISTS(key);

 

I assume that in either case, Qlik must iterate through every row in B at some point in time, and during each of those iterations, it must check whether the current row's key value exists in that same field in A.  That first, outer operation is O(n), where n is the number of rows in B.  The second, inner operation is O(1).  Or, to be less formal but more specific, if a linear search of the symbol table is involved, the inner operation is O(k), where k is the number of rows in A's symbol table.  Correct?

So why does keep outperform exists?  Is it because in that inner operation, exists checks equality based on the real, looked-up data values, whereas keep checks equality based on the bit symbols themselves and skips the look-ups?  If that is the case, could exists be optimized so that in simple cases like this, it looks at the bit symbols instead of the original values they represent, just as keep does?

Or do I have it wrong entirely?  Maybe there are other data structures and algorithms in play here that facilitate faster join-like operations like this?