Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

EXISTS()


Hi,

I have been here some times before with similar issues and I remember that this function is in all cases not as easy as the integrated help_file suggests. I remember for sure that it cannot be written with just one parameter as the help suggests. I think it only works if one or b oth of the key_fields in the tables are unique.

=> Well, that >or< is my issue. The scenario is this:

- In table_A (which I load earlier) that key_field (which I want to base the EXISTS() clause on) is unique.

<=> In table_B (which I load afterwards) I have a finer grain and that same key is not unique anymore.

=> Will that work if I have, in the LOAD for table_B, a clause >> WHERE EXISTS([keyfield_from_table_A], [keyfield_from_table_B]) <<?

Thanks a lot!

Best regards,

DataNibbler

27 Replies
datanibbler
Champion
Champion
Author

Okay.

the keyfield is calculated (or concatenated) from several elements, so from table_A (earlier), I can already name it whereas the one from table_B (second parameter to the EXISTS()) I have to write the formula as I cannot yet use the Alias_name. So they are different in a way.

Anyway, I will try loading the keyfield from table_A a second time with a different name and see what nr. of records I get then.

And I will try the alternative, using LEFT KEEP - but that might also prove troublesome because there is no common keyfield that is unique in both tables ...

Best regards,

DataNibbler

hic
Former Employee
Former Employee

No it is too hot in here... You are of course right.

HIC

datanibbler
Champion
Champion
Author

Hi Henric,

I think this "many think" in your post means just what I have thought and others have stated - that EXISTS() (or NOT EXISTS()) does not work as expected when the keyfield is not unique in the current table (table_B, the one loaded later where that clause is applied).

The integrated help does not state (and I think many do not know) that EXISTS() checks the current LOAD also - as one of those that don't have your background_knowledge, that would match my definition of "does not work as is to be expected" then.

Well, let's see what I can make of this in practice.

Best regards,

DataNibbler

datanibbler
Champion
Champion
Author

Hi both of you,

look forward to the weekend 😉 It's supposed to be 37° ...

swuehl
MVP
MVP

Up to 40°C forecasted

marcus_sommer

I think it worked as designed but I agree that the documentation to exists() didn't covered the complexity and confused the user and to learn it is a bit painful.

- Marcus

hic
Former Employee
Former Employee

I think the documentation is clear on that it is previously loaded records and not previously loaded tables:

"These functions are used when a value from previously loaded records of data is needed for the evaluation of the current record."

But I am aware that there are misunderstandings in this area.

HIC

datanibbler
Champion
Champion
Author

Yep. I cannot argue further 😉

I have tried two ways now:

- I have to put together my table_B (the one loaded later) from three elements, thus I have three LOADs.

- At first I had an EXISTS() clause in every single one

- Now I have tried loading them all first and then using the EXISTS() clause - the nr. of records is exactly the same.
  From all I know, there should be the same nr. of records in both tables, with all the right filters applied, but I end up
  with fewer records in table_B - about 10.000 less, so that might be the duplicate thing (the field is not unique in
   table_B as that has a finer grain than table_A, so I cannot create a key that would be both common to those two
   tables AND unique in table_B ...

Now I will try the LEFT KEEP() option.

P.S.: I have also renamed the field in table_A so that my clause read like

>> ... WHERE EXISTS([key_v2], [key])  <<

datanibbler
Champion
Champion
Author

Hmmm ... the LEFT KEEP (with the table_name specified in brackets) produces the exact same nr. of records than the two options before - so I guess that must be correct, even if I don't wholly understand why there are fewer records than in my table_A ... but in companies such as this one, one doesn't have to understand everything ...

Many thanks to all!

Best regards,

DataNibbler

P.S.: OK, this is halfway plausible - it could be that the data in different tables is moved into an archive in different intervals. I will limit my time_filter on table_A to the last three months and see. If that assumption is correct, then the difference in the nr. of records should decrease.

datanibbler
Champion
Champion
Author

This is most strange ...

I have now done all the loading without the final LEFT KEEP (or EXISTS()) and written down four or five fields that I have found (using a straight_table) to not be unique in my table_B.

<=> Then I have activated that LEFT KEEP again - and now I can find NONE of those obviously duplicated values! How can that be?

I understand that the effect of using EXISTS() can be to return too few records because of all the duplicate values, only the first one is kept - but how can it remove ALL of the duplicated values (incl. the original)??

Can you shed some light on this?

Thanks a lot!

Best regards,

DataNibbler