Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Consider the following Load script:
LOAD
key,
first_name
Inline [
key,first_name
1,Princess
2,Dread Pirate
3,Fezzik
];
LOAD
key,
last_name
Inline [
key,last_name
1,Buttercup
2,Roberts
4,Humperdinck
];
Suppose I want to count the total number of people who have either first_name='Princess' or 'Fezzik' or last_name='Humperdinck' based on the current selection.
I have tried to do this using the following set analysis:
Count({
$<first_name*={'Princess','Fezzik'}>
+ $<last_name*={'Humperdinck'}>
} DISTINCT key)
We first find the intersection of the current selection (*=) in first_name with the values {'Princess', 'Fezzik'} and the intersection of the current selection in last_name with the value {'Humperdinck'}. We then find the union (+) of these two sets. Finally, we count the distinct keys.
If there are no other selections, my instinct is that the result should be 3. But in fact, the result is 2.
If I make the selection last_name='Buttercup', 'Roberts', 'Humperdinck', I would expect the count above to yield 2. But instead I get 1.
However, if I go back to the load script and explicitly JOIN the two tables (replace the second LOAD with JOIN LOAD), the set analysis works exactly as I would expect.
Can anyone link me to documentation which explains why the set analysis works differently depending on whether the tables are implicitly joined or explicitly joined?
It has something to do with the nulls.
Scenario 1
Scenario 2
But why do the NULLs interact differently for the implicit join compared to the explicit join?
Is the implicit JOIN functioning more like a LEFT JOIN rather than an OUTER JOIN?
This is what it looks like if I don't include the Count in the table: