3 Replies Latest reply: Aug 3, 2010 10:23 AM by Jonathan Dienst RSS

    Incremental load: Using parameters from a QVD in a SQL where clause

      Hi,

      I am having problems with an incremental load. My aim is to add only new or modified records to my model, where the record date is greater than the previous record date, and the value of the record has changed. The data volumes are very large and I am trying to cut down on the records returned by the SQL select, so the problem results from me trying to pass QVD data to a SQL where clause.

      Each record is made of three parts:
      A - Key
      B - Subset key
      C - data value
      DATE - changed for every record associated with Key whenever any data value is changed.

      So if each key has 20 subset keys(attributes of the main key), and once per week one data value is changed, every record with the same key will have their date updated. This will mean that if I insert records into my model based on date, we will get a lot of duplicate values. So I only want to keep a record if the data is different (a straight 'insert' incremental load), hence the use of a composite key containing A,B and C as the unique identifier below.

      essentially:

      first load (only ever to be run once, on first load of the model, to store all values from SQL db)

      OLDDATA:
      SELECT TO_CHAR(A||B||C) AS COMBOKEY, A, B, C, DATE
      FROM TABLE1;

      STORE OLDDATA INTO OLD.QVD;

      Second Load (and successive - the incremental load)

      OLDDATA:
      LOAD * AS LATESTDATE FROM OLD.QVD;

      LOAD MAX(DATE) AS LATESTDATE FROM OLD.QVD;
      LET vLATESTDATE = LATESTDATE;
      //not strictly correct syntax but to cut a long story short - to store the latest date to allow the record load to be cut down to only new records

      NEW:
      SELECT TO_CHAR(A||B||C) AS COMBOKEY, A, B, C, DATE
      FROM TABLE1
      WHERE DATE < $(vLATESTDATE)
      AND NOT EXISTS (COMBOKEY FROM OLD)

      the last line causes the error as this is incorrect SQL - I am unable to pass the value of 'COMBOKEY' from the in-memory table to SQL. I do not want to have to load all the records in the SQL select and do a preceding load with a where clause:

      LOAD * WHERE NOT EXISTS (COMBOKEY);

      As this cripples performance, resulting in all records being pulled from the SQL Database.

      Is there any way around this? Am I able to pass values from a qvd or in memory table back to my SQL command? Is there another way...?

      Kind Regards

      Tom

        • Incremental load: Using parameters from a QVD in a SQL where clause
          Jonathan Dienst

          Tom

          What I do in this sort of case is the following

          • read the QVD to get the latest date in the QVD. (Much as you have done)
          • Copied the date into a variable (Peek())
          • Read the database using the date as a filter. This includes constructing the COMBOKEY
          • THEN concatenate load from the QVD using a LOAD * FROM myQVD (qvd) WHERE NOT EXISTS (COMBOKEY);
          • Store the resulting table back into the QVD

          I think this approach willl work for you as there will be no duplicate COMBOKEYs and the database records will replace the ones that were in the QVD earlier.

          Jonathan

            • Incremental load: Using parameters from a QVD in a SQL where clause

              Thanks Jonathan,

              Is there no way to avoid the full database load? The problem is that filtering by the date only reduces my dataset by about 30%, and the 'where not exists' reduces by as much as 90%.

              Tom

                • Incremental load: Using parameters from a QVD in a SQL where clause
                  Jonathan Dienst

                  These two options produce very different results in your QVD file. Lets look at this first before we consider the number of records to be processed.

                  1. Filtering by "where not exists" in the database would mean that the data that has already loaded into QV would not be updated with the new date. This is the equivalent to an "Add only" approach.
                  2. Filtering by date as described means that the records with new dates will get updated in QV as well - an "Add and Update" approach.

                   

                  If option 1 is what you need for your solution, then you are going to have to think outside the box. If you have control over the database, and the necessary SQL skills, you could do osomething along these lines:

                  • add a bit field to the table
                  • and a script a stored procedure that returns the data for QV.
                  • The script or procedure would select the records where the bit field = 0, and update the bit field to 1.
                  • QV reads the records returned by the script/procedure

                   

                  If option 2 is what you require - then you have the solution - but you will have read more records.

                  But before you go that far - how many records are we talking about and how frequently do you need to read them? A i million record read does not take very long and may not tax you DB server overmuch (or maybe it will?)

                  Jonathan