11 Replies Latest reply: May 11, 2015 3:15 AM by Friedrich Hofmann RSS

    Problem with WHERE EXISTS() - simple? All the more annoying ...

    Friedrich Hofmann

      Hi,

       

      I am trying to do something very simple: Load an existing qvd (which I saved yesterday) and then concatenate today's data and store it away again.

      I do that using a WHERE NOT EXISTS() clause on the field with the date.

       

      The issue is, since both the date_field in my history_table and the date_field in today's table actually have the same name, I could write the EXISTS function with just one instead of two parameters, right?

      <=> Well, that didn't seem to properly work. It never does with me. Strange. I actually prefer the two_parameter_method anyway, it is
              clearer and easier to understand for others.

      => So I use a RENAME FIELD statement, after loading the existing history_qvd, to give the date_field there a different name.

      => Then I want to concatenate today's data, so I have to give that same name as an alias to the date_field so that it will be properly
           concatenated, right?

          <=> In the WHERE clause, however, I can (must) still use the original field_name - thus I have two different field_names for my
                  WHERE NOT EXISTS clause.

      <=> Now I'm getting the error message "Field not found <[original fieldname]>"

       

      Can anybody help me there?

      Thanks a lot!

      Best regards,

       

      DataNibbler

       

      P.S.: I am just running the script up to the point where I have both tables, just before I concatenate the records from the one to the other, so I'll see what is the issue - I hope ...

        • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
          Friedrich Hofmann

          Hmm... strange.

          When I stop at that point, of course I get a massive synthetic key, but there are two different dates in the date_fields, so the WHERE NOT EXISTS() clause should actually lead to the records actually being concatenated.

          Let's see ...

          • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
            Colin Albert

            You need to use two parameters, as the date you are testing does not exist when the first record is added to the data set, but when subsequent records for the same date occur in the new data, they do not fulfil the "not exists" condition so are not added, as the first record added the missing date.

             

            Using two parameters sets which field is being tested in the existing data, and which field is being tested in the new data.

             

             

            If you use one parameter in the where exists clause, only the first record for the "where not exists" is added.

             

            If you use two parameters in the where exists clause, all  record for the "where not exists" are added.

             

            Hope this helps.

              • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                Friedrich Hofmann


                Hi Colin,

                 

                well, that clarifies - I think somebody told me so before, but I had forgotten - but unfortunately my problem persists.

                I will post that particular piece of my code. Maybe you can spot the error.

                 

                 

                LOAD
                    
                RtM_Hist.Datum,
                    
                Instanz__RtM,
                    
                Dokument_RtM,
                    
                Ablage_vs_email_RtM,
                    
                Ausgabedateityp_RtM,
                    
                Ausgabepfad_RtM,
                    
                Ausgabedateiname_RtM,
                    
                Task_gelaufen_RtM,
                    
                Asgabedateiname_RtM
                FROM
                [server]\01_QVD\QVD\RtM_Historie.qvd
                (
                qvd);

                Concatenate (RtM_Hist_bisher)
                LOAD
                    
                Datum as RtM_Hist.Datum,
                    
                Instanz_RtM,
                    
                Dokument_RtM,
                    
                Ablage_vs_email_RtM,
                    
                Ausgabedateityp_RtM,
                    
                Ausgabepfad_RtM,
                    
                Ausgabedateiname_RtM,
                    
                Task_gelaufen_RtM
                RESIDENT RtM_Historie
                WHERE NOT EXISTS (RtM_Hist.Datum, Datum)
                ;
                DROP TABLE RtM_Historie;
                STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

                 

                The code is actually simple, that's why I cannot imagine the error.

                The message I am getting is "Field not found <Datum>" (on the second LOAD, the one to concatenate).

                  • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                    Colin Albert

                    Try this thread for a more detailed explanation.  The help on "Exists" is not great!

                     

                    The exists issue

                      • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                        Friedrich Hofmann

                        Hi Colin,

                         

                        okay. So EXISTS() has to be used with caution as the date is the same in all 16 records of my history_qvd.

                        I would expect another 16 records - the nr. of tasks that are checked daily - to be appended, but it might be less because of this.

                        Currently, however, I am not getting any result, I am getting an error message which I don't understand.

                        Can you help me with that?

                        Then I'll see if it works or not and can think of a workaround if it doesn't.

                      • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                        Michael Solomovich

                        I think you're looking into a wrong place.  From this portion of the script, it's hard to tell what's going on.  Here is what I see:

                        1. The first load here doesn't have explicit table name
                        LOAD
                             RtM_Hist.Datum,
                             ...
                        FROM [server]\01_QVD\QVD\RtM_Historie.qvd (qvd);

                        In this case, the script will assgn this table a name of the data source - RtM_Historie

                        2. The second load
                        Concatenate (RtM_Hist_bisher)
                        LOAD
                             Datum as RtM_Hist.Datum,
                             ...
                        RESIDENT RtM_Historie
                        WHERE NOT EXISTS (RtM_Hist.Datum, Datum)

                        It concatenates to a table RtM_Hist_bisher - we don't see this table here.  It possibly exist or maybe not, and we don't know its structure.
                        Next, you read here from a table RtM_Historie that you just created in the previus step. It doesn't have field Datum, hence the error.  Besides, I don't understand why are you doing this.

                        Check again and again - what tables are you reading data from.  Not much to do with the exists() function.

                          • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                            Friedrich Hofmann

                            Hi Michael,

                             

                            sorry about that - the table_name is there all right. That's just because, every time I paste anything into a post here in the Community, the first word is cut off.

                            The name of the table is >> RtM_Hist_bisher <<.

                            I do it a bit differently now to avoid that EXISTS() clause altogether. The code now looks like this (I am adding the comments just now, they are not in my script)

                             

                            RtM_Historie:

                            FIRST ($(v_rows_RtM)-1) LOAD // The point of this is to get rid of the very 1st record which is blank (remainder from a Dummy)
                               
                            Date(Today(), 'DD.MM.YYYY') as Datum,
                               
                            Instanz as Instanz_RtM,
                               
                            Dokument as Dokument_RtM,
                               
                            Outputtyp as Ablage_vs_email_RtM,
                               
                            Typ as Ausgabedateityp_RtM,
                               
                            Ziel as Ausgabepfad_RtM,
                               
                            Zielname as Asgabedateiname_RtM,
                               
                            gelaufen_jn as Task_gelaufen_RtM
                            RESIDENT RtM_gesamt
                            ORDER BY Instanz, Dokument
                            ;

                            DROP TABLE RtM_gesamt;

                            RtM_Hist_bisher:
                            LOAD
                                
                            Datum as RtM_Hist_Datum,
                                
                            Instanz_RtM,
                                
                            Dokument_RtM,
                                
                            Ablage_vs_email_RtM,
                                
                            Ausgabedateityp_RtM,
                                
                            Ausgabepfad_RtM,
                                
                            Asgabedateiname_RtM,
                                
                            Task_gelaufen_RtM
                            FROM
                            [server]\01_QVD\QVD\RtM_Historie.qvd
                            (
                            qvd);

                            // From this table, I now extract only the dates that are already contained in there.

                            Aux_Hist:
                            LOAD DISTINCT
                               
                            RtM_Hist_Datum
                            RESIDENT RtM_Hist_bisher;
                            // Today's date must be the very last record in this table if it is there, so we need to check only that one.
                            SET v_day_counter = 0;
                            IF DATE(Peek('RtM_Hist_Datum', (NoOfRows('Aux_Hist')-1), 'Aux_Hist'), 'DD.MM.YY') = Date(Today(), 'DD.MM.YY') THEN
                             
                            SET v_day_counter = 1;

                            ENDIF

                            DROP TABLE Aux_Hist;
                            // If I have now identified that today's date is not yet contained in the history_table, today's data can just be appended without any further checks.
                            IF $(v_day_counter) = 0 THEN
                            Concatenate (RtM_Hist_bisher)
                            LOAD
                                
                            Datum as RtM_Hist.Datum,
                                
                            Instanz_RtM,
                                
                            Dokument_RtM,
                                
                            Ablage_vs_email_RtM,
                                
                            Ausgabedateityp_RtM,
                                
                            Ausgabepfad_RtM,
                                
                            Ausgabedateiname_RtM,
                                
                            Task_gelaufen_RtM
                            RESIDENT RtM_Historie

                            ;
                            ENDIF

                            DROP TABLE RtM_Historie;
                            STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

                             

                            P.S.: That way seems to work. I only have a few too many records where a lot of the fields seem to have NULL values or just be blank. I will look at how to get rid of those tomorrow.

                          • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                            Evan Kurowski

                            DataNibbler wrote:

                             


                            Hi Colin,

                             

                            well, that clarifies - I think somebody told me so before, but I had forgotten - but unfortunately my problem persists.

                            I will post that particular piece of my code. Maybe you can spot the error.

                             

                             

                            LOAD
                                
                            RtM_Hist.Datum,
                                
                            Instanz__RtM,
                                
                            Dokument_RtM,
                                
                            Ablage_vs_email_RtM,
                                
                            Ausgabedateityp_RtM,
                                
                            Ausgabepfad_RtM,
                                
                            Ausgabedateiname_RtM,
                                
                            Task_gelaufen_RtM,
                                
                            Asgabedateiname_RtM
                            FROM
                            [server]\01_QVD\QVD\RtM_Historie.qvd
                            (
                            qvd);

                            Concatenate (RtM_Hist_bisher)
                            LOAD
                                
                            Datum as RtM_Hist.Datum,
                                
                            Instanz_RtM,
                                
                            Dokument_RtM,
                                
                            Ablage_vs_email_RtM,
                                
                            Ausgabedateityp_RtM,
                                
                            Ausgabepfad_RtM,
                                
                            Ausgabedateiname_RtM,
                                
                            Task_gelaufen_RtM
                            RESIDENT RtM_Historie
                            WHERE NOT EXISTS (RtM_Hist.Datum, Datum)
                            ;
                            DROP TABLE RtM_Historie;
                            STORE RtM_Hist_bisher INTO [server]\01_QVD\QVD\RtM_Historie.qvd;

                             

                            The code is actually simple, that's why I cannot imagine the error.

                            The message I am getting is "Field not found <Datum>" (on the second LOAD, the one to concatenate).

                            The error message looks correct.   Datum isn't a field present in RtM_Historie.

                            LOAD
                                
                            Datum as RtM_Hist.Datum,  //<~ this field definition is the cause

                            RESIDENT RtM_Historie

                            • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                              Ignacio Cifuentes

                              Have you tried insted of "WHERE NOT EXISTS (....)" WHERE LEN(Datum)=0;

                          • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                            Evan Kurowski

                            But this is being mentioned elsewhere in this post.  The fundamental difference in the sequencing between EXISTS and NOT EXISTS has driven me bats a few times.

                             

                            If you're attempting to filter a large QVD, obviously the goal is to preserve optimized load, which restricts the number of fields allowed in the EXISTS clause to a single field reference (whether this is one field argument, or two arguments using the same field, the result is still optimized)

                             

                            However, when filtering on a single field

                             

                            EXISTS, the sequence is:

                            Take all the values from the filter field

                            =

                            Reduce the target data set where there is a match

                             

                            NOT EXISTS, the sequence is:

                            Take all the values from the filter field + combine them with a list of values from the target source

                            =

                            THEN apply this combined filter to reduce the target set (which means nothing gets filtered out, because the reduction list now contains every value in the target)

                            To be consistent with EXISTS, what I expected to work on NOT EXISTS was you could define a filter and reduce based on non-matches:

                             

                            [FILTER]:
                            LOAD * INLINE [
                            ID
                            A
                            ]
                            ;

                            [FACT]: 
                            LOAD *
                            FROM DATA.QVD (qvd//<~ this QVD has a field called ID
                            WHERE NOT EXISTS(ID);
                            //the desired result would have been an optimized load where the result set omits rows where ID = 'A' , but this doesn't occur

                            //the load is optimized, but nothing gets filtered out because of the sequencing explained above

                             

                             

                             

                            The workaround is to alias the filter field, so that the filter does not get comingled with the target before reduction.  However this switches the NOT EXISTS to two-argument syntax, which then deactivates optimized load.

                             

                            Meaning it is very easy to extract a set of rows from a giant QVD where you want a positive match, but not quite as straightforward when you want to strip them out.

                             

                            [FILTER]:
                            LOAD * INLINE [
                            ALIASED_ID
                            A
                            ]
                            ;

                            [FACT]: 
                            LOAD *
                            FROM DATA.QVD (qvd//<~ this QVD has a field called ID
                            WHERE NOT EXISTS(ALIASED_ID, ID);

                            //this will strip out the ID = 'A' rows, but non-optimized, meaning if the QVD is in the GIGA size range, you are talking minutes instead of seconds

                              • Re: Problem with WHERE EXISTS() - simple? All the more annoying ...
                                Friedrich Hofmann

                                Hi all,

                                 

                                thanks for the helpful answers!

                                It seems both EXISTS() and NOT EXISTS() can be tricky and writing that function with just one parameter somehow seems to always go wrong. One has to do some renaming so that this function can be written with two parameters. I devised a workaround for this one and that seems to work fine since in this case I just want to append every day's data, so I only have to check whether the present day is already in the history_table or now.

                                By extracting in a RESIDENT a small list with only the dates already present in that table, I can quickly decide whether the present day is there or not and I don't need any further checks.

                                This is solved then.

                                 

                                Best regards,

                                 

                                DataNibbler