datanibbler
Champion

Problem with WHERE EXISTS() - simple? All the more annoying ...

Hi,

I am trying to do something very simple: Load an existing qvd (which I saved yesterday) and then concatenate today's data and store it away again.

I do that using a WHERE NOT EXISTS() clause on the field with the date.

The issue is, since both the date_field in my history_table and the date_field in today's table actually have the same name, I could write the EXISTS function with just one instead of two parameters, right?

<=> Well, that didn't seem to work properly. It never does with me. Strange. I actually prefer the two-parameter method anyway; it is clearer and easier for others to understand.

=> So I use a RENAME FIELD statement, after loading the existing history_qvd, to give the date_field there a different name.

=> Then I want to concatenate today's data, so I have to give that same name as an alias to the date_field so that it will be properly concatenated, right?

    <=> In the WHERE clause, however, I can (must) still use the original field_name - thus I have two different field_names for my WHERE NOT EXISTS clause.

<=> Now I'm getting the error message "Field not found <[original fieldname]>"

Can anybody help me there?

Thanks a lot!

Best regards,

DataNibbler

P.S.: I am just running the script up to the point where I have both tables, just before I concatenate the records from the one to the other, so I can see what the issue is - I hope ...

1 Solution

Accepted Solutions
datanibbler
Champion
Author

Hi all,

thanks for the helpful answers!

It seems both EXISTS() and NOT EXISTS() can be tricky, and writing that function with just one parameter somehow always seems to go wrong. One has to do some renaming so that the function can be written with two parameters. I devised a workaround for this one, and it seems to work fine: in this case I just want to append every day's data, so I only have to check whether the present day is already in the history_table or not.

By extracting, with a RESIDENT load, a small list of only the dates already present in that table, I can quickly check whether the present day is there or not, and I don't need any further checks.
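In condensed form, the check looks roughly like this (a sketch only - the table and field names follow the full script posted further down in this thread):

Aux_Hist:
LOAD DISTINCT RtM_Hist_Datum
RESIDENT RtM_Hist_bisher;          // RtM_Hist_bisher holds the history loaded from the QVD

SET v_day_counter = 0;
// Today's date can only be the last distinct value, so only that one needs checking.
IF Date(Peek('RtM_Hist_Datum', NoOfRows('Aux_Hist') - 1, 'Aux_Hist'), 'DD.MM.YY') = Date(Today(), 'DD.MM.YY') THEN
    SET v_day_counter = 1;
ENDIF
DROP TABLE Aux_Hist;

// Only if today's date was not found are today's records appended (no EXISTS() needed).
IF $(v_day_counter) = 0 THEN
    Concatenate (RtM_Hist_bisher)
    LOAD * RESIDENT RtM_Historie;  // field list shortened here; RtM_Historie holds today's records
ENDIF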

This is solved then.

Best regards,

DataNibbler


11 Replies
datanibbler
Champion
Author

Hmm... strange.

When I stop at that point, of course I get a massive synthetic key, but there are two different dates in the date_fields, so the WHERE NOT EXISTS() clause should actually lead to the records being concatenated.

Let's see ...

Colin-Albert

You need to use two parameters. The date you are testing does not exist when the first record is added to the data set, but subsequent records for the same date in the new data no longer fulfil the "not exists" condition, so they are not added - the first record already added the missing date.

Using two parameters sets which field is being tested in the existing data, and which field is being tested in the new data.

If you use one parameter in the where not exists clause, only the first record per new date is added.

If you use two parameters, all records for the new dates are added.
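For illustration, here is a minimal sketch of that two-parameter pattern (the table and field names are made up, and Heute is assumed to be a table that already holds today's rows). The copy HistDatum exists only in the history table, so the set of dates being tested against does not change while today's rows are filtered:

Hist:
LOAD
    Datum,
    Datum as HistDatum,   // extra copy of the date, present only in the history table
    Wert
FROM Historie.qvd (qvd);

Concatenate (Hist)
LOAD
    Datum,
    Wert
RESIDENT Heute
WHERE NOT EXISTS (HistDatum, Datum);  // today's rows pass whenever their date is not yet in the history

DROP FIELD HistDatum;                 // optional clean-up before storing
STORE Hist INTO Historie.qvd (qvd);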

Hope this helps.

datanibbler
Champion
Author


Hi Colin,

well, that clarifies things - I think somebody told me so before, but I had forgotten - but unfortunately my problem persists.

I will post that particular piece of my code. Maybe you can spot the error.

LOAD
    RtM_Hist.Datum,
    Instanz__RtM,
    Dokument_RtM,
    Ablage_vs_email_RtM,
    Ausgabedateityp_RtM,
    Ausgabepfad_RtM,
    Ausgabedateiname_RtM,
    Task_gelaufen_RtM,
    Asgabedateiname_RtM
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

Concatenate (RtM_Hist_bisher)
LOAD
    Datum as RtM_Hist.Datum,
    Instanz_RtM,
    Dokument_RtM,
    Ablage_vs_email_RtM,
    Ausgabedateityp_RtM,
    Ausgabepfad_RtM,
    Ausgabedateiname_RtM,
    Task_gelaufen_RtM
RESIDENT RtM_Historie
WHERE NOT EXISTS (RtM_Hist.Datum, Datum);

DROP TABLE RtM_Historie;
STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

The code is actually simple; that's why I cannot see where the error is.

The message I am getting is "Field not found <Datum>" (on the second LOAD, the one to concatenate).

Colin-Albert

Try this thread for a more detailed explanation.  The help on "Exists" is not great!

The exists issue

datanibbler
Champion
Champion
Author

Hi Colin,

okay. So EXISTS() has to be used with caution as the date is the same in all 16 records of my history_qvd.

I would expect another 16 records - the number of tasks that are checked daily - to be appended, but it might be fewer because of this.

Currently, however, I am not getting any result at all; I am getting an error message which I don't understand.

Can you help me with that?

Then I'll see if it works or not and can think of a workaround if it doesn't.

Anonymous
Not applicable

I think you're looking in the wrong place. From this portion of the script, it's hard to tell what's going on. Here is what I see:

1. The first load here doesn't have an explicit table name:
LOAD
     RtM_Hist.Datum,
     ...
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

In this case, the script will assign this table the name of the data source - RtM_Historie.

2. The second load
Concatenate (RtM_Hist_bisher)
LOAD
     Datum as RtM_Hist.Datum,
     ...
RESIDENT RtM_Historie
WHERE NOT EXISTS (RtM_Hist.Datum, Datum)

It concatenates to a table RtM_Hist_bisher - we don't see this table here. It may or may not exist, and we don't know its structure.
Next, you read here from the table RtM_Historie that you just created in the previous step. It doesn't have a field Datum, hence the error. Besides, I don't understand why you are doing this.

Check again and again which tables you are reading data from - this has not much to do with the exists() function.
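To make the naming point concrete, here is a minimal sketch with the field lists shortened (not the actual script):

// No label: the table is auto-named after the QVD file, i.e. RtM_Historie
LOAD RtM_Hist.Datum /* , ... */
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

// This RESIDENT load therefore reads the table loaded just above, which has
// RtM_Hist.Datum but no field called Datum -> "Field not found <Datum>"
Concatenate (RtM_Hist_bisher)
LOAD Datum as RtM_Hist.Datum /* , ... */
RESIDENT RtM_Historie
WHERE NOT EXISTS (RtM_Hist.Datum, Datum);

// An explicit label on the first load avoids the collision (assuming a table
// called RtM_Historie with today's rows does exist elsewhere in the script):
RtM_Hist_bisher:
LOAD RtM_Hist.Datum /* , ... */
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);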

datanibbler
Champion
Author

Hi Michael,

sorry about that - the table_name is there all right. That's just because, every time I paste anything into a post here in the Community, the first word is cut off.

The name of the table is >> RtM_Hist_bisher <<.

I do it a bit differently now, to avoid that EXISTS() clause altogether. The code now looks like this (I am adding the comments just now; they are not in my script):

RtM_Historie:
FIRST ($(v_rows_RtM)-1) LOAD   // The point of this is to get rid of the very 1st record which is blank (remainder from a Dummy)
    Date(Today(), 'DD.MM.YYYY') as Datum,
    Instanz as Instanz_RtM,
    Dokument as Dokument_RtM,
    Outputtyp as Ablage_vs_email_RtM,
    Typ as Ausgabedateityp_RtM,
    Ziel as Ausgabepfad_RtM,
    Zielname as Asgabedateiname_RtM,
    gelaufen_jn as Task_gelaufen_RtM
RESIDENT RtM_gesamt
ORDER BY Instanz, Dokument;

DROP TABLE RtM_gesamt;

RtM_Hist_bisher:
LOAD
    Datum as RtM_Hist_Datum,
    Instanz_RtM,
    Dokument_RtM,
    Ablage_vs_email_RtM,
    Ausgabedateityp_RtM,
    Ausgabepfad_RtM,
    Asgabedateiname_RtM,
    Task_gelaufen_RtM
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

// From this table, I now extract only the dates that are already contained in there.
Aux_Hist:
LOAD DISTINCT
    RtM_Hist_Datum
RESIDENT RtM_Hist_bisher;

// Today's date must be the very last record in this table if it is there, so we need to check only that one.
SET v_day_counter = 0;
IF DATE(Peek('RtM_Hist_Datum', (NoOfRows('Aux_Hist')-1), 'Aux_Hist'), 'DD.MM.YY') = Date(Today(), 'DD.MM.YY') THEN
    SET v_day_counter = 1;
ENDIF

DROP TABLE Aux_Hist;

// If I have now identified that today's date is not yet contained in the history_table, today's data can just be appended without any further checks.
IF $(v_day_counter) = 0 THEN
    Concatenate (RtM_Hist_bisher)
    LOAD
        Datum as RtM_Hist.Datum,
        Instanz_RtM,
        Dokument_RtM,
        Ablage_vs_email_RtM,
        Ausgabedateityp_RtM,
        Ausgabepfad_RtM,
        Ausgabedateiname_RtM,
        Task_gelaufen_RtM
    RESIDENT RtM_Historie;
ENDIF

DROP TABLE RtM_Historie;
STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

P.S.: That way seems to work. I only have a few too many records where a lot of the fields seem to have NULL values or just be blank. I will look at how to get rid of those tomorrow.

evan_kurowski
Specialist

DataNibbler wrote:


Hi Colin,

well, that clarifies things - I think somebody told me so before, but I had forgotten - but unfortunately my problem persists.

I will post that particular piece of my code. Maybe you can spot the error.

LOAD
    RtM_Hist.Datum,
    Instanz__RtM,
    Dokument_RtM,
    Ablage_vs_email_RtM,
    Ausgabedateityp_RtM,
    Ausgabepfad_RtM,
    Ausgabedateiname_RtM,
    Task_gelaufen_RtM,
    Asgabedateiname_RtM
FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

Concatenate (RtM_Hist_bisher)
LOAD
    Datum as RtM_Hist.Datum,
    Instanz_RtM,
    Dokument_RtM,
    Ablage_vs_email_RtM,
    Ausgabedateityp_RtM,
    Ausgabepfad_RtM,
    Ausgabedateiname_RtM,
    Task_gelaufen_RtM
RESIDENT RtM_Historie
WHERE NOT EXISTS (RtM_Hist.Datum, Datum);

DROP TABLE RtM_Historie;
STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

The code is actually simple; that's why I cannot see where the error is.

The message I am getting is "Field not found <Datum>" (on the second LOAD, the one to concatenate).

The error message looks correct.   Datum isn't a field present in RtM_Historie.

LOAD
    Datum as RtM_Hist.Datum,  //<~ this field definition is the cause
    ...
RESIDENT RtM_Historie

evan_kurowski
Specialist

But this is being mentioned elsewhere in this post.  The fundamental difference in the sequencing between EXISTS and NOT EXISTS has driven me bats a few times.

If you're attempting to filter a large QVD, obviously the goal is to preserve the optimized load, which restricts the EXISTS clause to a single field reference (whether this is one field argument, or two arguments using the same field, the result is still optimized).
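For instance, based on the point above, the same-field two-argument variant would look like this (a sketch only, reusing the hypothetical DATA.QVD / ID names from the examples below):

[FACT]:
LOAD *
FROM DATA.QVD (qvd)
WHERE NOT EXISTS(ID, ID);   // two arguments, but the same field - per the note above this still loads optimized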

However, when filtering on a single field:

EXISTS, the sequence is:
take all the values from the filter field
=> reduce the target data set where there is a match.

NOT EXISTS, the sequence is:
take all the values from the filter field + combine them with the list of values from the target source
=> THEN apply this combined filter to reduce the target set (which means nothing gets filtered out, because the reduction list now contains every value in the target).

To be consistent with EXISTS, what I expected to work with NOT EXISTS was that you could define a filter and reduce based on non-matches:

[FILTER]:
LOAD * INLINE [
ID
A
];

[FACT]:
LOAD *
FROM DATA.QVD (qvd)   //<~ this QVD has a field called ID
WHERE NOT EXISTS(ID);

// the desired result would have been an optimized load where the result set omits rows where ID = 'A', but this doesn't occur
// the load is optimized, but nothing gets filtered out, because of the sequencing explained above

The workaround is to alias the filter field, so that the filter does not get commingled with the target before reduction. However, this switches the NOT EXISTS to the two-argument syntax, which then deactivates the optimized load.

Meaning it is very easy to extract a set of rows from a giant QVD when you want a positive match, but not quite as straightforward when you want to strip them out.

[FILTER]:
LOAD * INLINE [
ALIASED_ID
A
];

[FACT]:
LOAD *
FROM DATA.QVD (qvd)   //<~ this QVD has a field called ID
WHERE NOT EXISTS(ALIASED_ID, ID);

// this will strip out the ID = 'A' rows, but non-optimized - meaning if the QVD is in the GIGA size range, you are talking minutes instead of seconds