12 Replies Latest reply: Jul 14, 2015 9:52 AM by Jeremy Schmitt

    Incremental Load with an Update Date

    Jeremy Schmitt

      We have a number of QVDs set up with incremental loads at our company. Most of our QVDs that load from SQL have a WHERE clause grabbing any record where the CREATE_DATE is greater than or equal to a variable date that is created and stored on each reload. The script then concatenates the existing QVD records onto the new rows where the primary key does not already exist (I believe this is a standard incremental load).

       

      The issue I am seeing is that some of the SQL tables have an UPDATE_DATE where records can be updated if need be. This is written into some of the qvd loads, but because the primary key from this updated record exists in the qvd already, it does not get added. We are not adding updated records to the qvd, only those with new CREATE_DATE.

       

      Does anyone have an incremental script that works with two different date fields?

       

      Here is the current script (I removed the fields to shorten it):

       

      ////////////////////////////////////////////////////////////////////////////////////////

      MaxDate:
      load * from max_source_record.qvd (qvd);
      let maxdate = text(date(peek('temp1', 0, 'MaxDate'), 'YYYYMMDD'));
      drop table MaxDate;

      //////////////////////////////////////////////////////////////////////////////////////////

      SOURCE_RECORD:
      Load
         fields
      ;
      SQL SELECT 
          fields
      FROM dbo.table with (nolock)

      /*comment for full reload*/ where CONVERT(VARCHAR,CREATE_DATE,112) >= '$(maxdate)' or CONVERT(VARCHAR,CHANGE_DATE,112) >= '$(maxdate)';
      /*comment for full reload*/ CONCATENATE LOAD * FROM SOURCE_RECORD.qvd (qvd) where not exists(PRIMARY_KEY);
      STORE * FROM SOURCE_RECORD INTO SOURCE_RECORD.qvd (qvd);

      DROP TABLE SOURCE_RECORD;

      //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

      currentmax:
      load *;
      sql select max(case when CHANGE_DATE > CREATE_DATE then CHANGE_DATE else CREATE_DATE end) as temp1
      from dbo.table;
      STORE * FROM currentmax INTO max_source_record.qvd (qvd);
      DROP TABLE currentmax;

        • Re: Incremental Load with an Update Date
          Bill Markham

          Does each row have a unique key?

           

          If so then have a look at the section Free Form Data With Modification of Existing Rows of this blog post http://www.quickintelligence.co.uk/qlikview-incremental-load/

           

          The important bit is WHERE NOT EXISTS (IDField)
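
          For reference, that pattern boils down to something like the following (a minimal sketch; `IDField`, `Value` and `History.qvd` are placeholder names, not Jeremy's actual fields):

          ```qlik
          // Pull only rows created OR updated since the last run.
          Changes:
          LOAD IDField, Value;
          SQL SELECT IDField, Value
          FROM dbo.table
          WHERE CREATE_DATE >= '$(maxdate)' OR UPDATE_DATE >= '$(maxdate)';

          // Bring in historic rows, but skip any ID that was just (re)loaded
          // above, so updated rows replace their old versions in the final QVD.
          CONCATENATE (Changes)
          LOAD * FROM History.qvd (qvd)
          WHERE NOT EXISTS (IDField);

          STORE Changes INTO History.qvd (qvd);
          ```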

            • Re: Incremental Load with an Update Date
              Jeremy Schmitt

              Yes, the PRIMARY_KEY is the unique field for each row, and I agree, the WHERE NOT EXISTS is what is preventing the updated records from being stored, because the unique key already exists.

               

              I will look at your link, thanks for a starting place, I wasn't sure how to word this question or search for this online.


              I read through it, and though it was a really good article, it still does not solve my problem. I have a unique ID, the PRIMARY_KEY, but each record has both a CREATE_DATE and an UPDATE_DATE. The example below shows what I would already have in the QVD, and what SQL pulls the next day. You can see I now have a 4th record with a new CREATE_DATE that will concatenate to the QVD, but record 1 has also been updated; when it goes to concatenate, its PRIMARY_KEY already exists, so it will not be updated. (At least that is how my script has it.)

               

              does anyone know of a workaround in this situation that I could adapt?

               

                 

              In current QVD on 7/13:

              PRIMARY_KEY | field changed example | CREATE_DATE | UPDATE_DATE
              ABC1        | NO                    | 7/13/2015   | null
              ABC2        | NO                    | 7/13/2015   | null
              ABC3        | NO                    | 7/13/2015   | null

              SQL pulled 7/14 and needs to concatenate into QVD:

              PRIMARY_KEY | field changed example | CREATE_DATE | UPDATE_DATE
              ABC1        | YES                   | 7/13/2015   | 7/14/2015
              ABC2        | NO                    | 7/13/2015   | null
              ABC3        | NO                    | 7/13/2015   | null
              ABC4        | NO                    | 7/14/2015   | null
                • Re: Incremental Load with an Update Date
                  Bill Markham

                  In essence :

                   

                  • Load from the database if UPDATE_DATE is after the last load; if not, load if CREATE_DATE is after the last load; otherwise discard

                  • Concatenate onto table from above by loading qvd from last load using  WHERE NOT EXISTS ( [PrimaryKey] )
                    • [You should notice that it is still an optimised load, which is one of the great things about WHERE NOT EXISTS]
                  • Re: Incremental Load with an Update Date
                    Steve Dark

                    Jeremy,

                     

                    What you have there looks like it should work.  As long as you pull the SQL first and then append the data from the QVD you should always have the latest rows.

                     

                    The only thing I can see is that you reference CHANGE_DATE in your code, but UPDATE_DATE in your example.  Have you got the wrong field?

                     

                    Other observations; I would get the max date from the SQL data before loading the rest of the data.  As it stands it is possible for rows to be written / updated between the start of the main SQL load and the pull of the max date.  These could fall between the cracks.  Putting the get date first could cause the same row to come from SQL twice, but the WHERE EXISTS would stop duplicates finding their way into the final QVD.
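
                    The reordering described above could look roughly like this (a sketch using the names from Jeremy's script, with the STORE of the new high-water mark deferred until after the merge):

                    ```qlik
                    // 1. Capture the new high-water mark BEFORE pulling the main
                    //    data, so rows written during the load are picked up next
                    //    time instead of falling between the cracks.
                    currentmax:
                    load *;
                    sql select max(case when CHANGE_DATE > CREATE_DATE then CHANGE_DATE else CREATE_DATE end) as temp1
                    from dbo.table;

                    // 2. Run the main incremental pull and the QVD concatenate
                    //    here, still filtering on the PREVIOUS max date.

                    // 3. Only once the merged QVD has been stored successfully,
                    //    persist the new high-water mark for the next run.
                    STORE currentmax INTO max_source_record.qvd (qvd);
                    DROP TABLE currentmax;
                    ```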

                     

                    Also, be careful of data types, in both SQL and QlikView.  In the SQL code you are converting dates to VARCHAR before comparing them.  This may not always pull the right values, because a string compare is not the same as a date compare - you should instead convert the variable (which will be a string) to a date.  It may be that the format of having year, month, then day means that this isn't actually an issue.
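
                    For example, the comparison could be done on the date side instead (a sketch; style 112 matches the 'YYYYMMDD' format the variable is stored in):

                    ```qlik
                    // Convert the string variable to a date once, rather than
                    // converting every row's date column to VARCHAR.
                    SQL SELECT fields
                    FROM dbo.table with (nolock)
                    WHERE CREATE_DATE >= CONVERT(DATETIME, '$(maxdate)', 112)
                       OR CHANGE_DATE >= CONVERT(DATETIME, '$(maxdate)', 112);
                    ```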

                     

                    Hope that helps.

                     

                    Steve

                      • Re: Incremental Load with an Update Date
                        Jeremy Schmitt

                        I modified the script to read better, so I changed to Update Date.

                         

                        This does not work because the PRIMARY_KEY on the record which was updated already exists in the QVD, so it will not concatenate, therefore it still has the original created record, not the updated. The updated record does not qualify to concatenate to the qvd because the primary key would already exist.

                         

                        As far as your observation about the stored max date being captured before the SQL load, I agree - I did not write any of these. The next time the QVD is reloaded, it should pick up those that fell between the cracks the day before, as those records do not exist in the QVD and their CREATE_DATE is greater than or equal to the max date.

                        • Re: Incremental Load with an Update Date
                            Steve Dark

                            Hi Jeremy,

                             

                            The primary key that is already in the QVD should not stop the row being brought in from SQL.  If the query is correct then the updated SQL row should be brought into memory.  The concatenate from the existing QVD should then bring in old rows, but not ones that have been updated (i.e. are already in memory).

                             

                            If the original ABC1 is still there after the merge, then I think it most likely that the SQL statement failed to retrieve the modified record.

                             

                            You can test for this in a couple of different ways.  One would be to put an EXIT SCRIPT statement after the SQL load, and take a look at what is in the table at that point.  You would need to ensure that you don't store the latest date file if you do this (as the new QVD will not have those rows due to the script being aborted).

                             

                            Another way you can test is to write the content of the SOURCE_RECORD to a separate QVD in between the SQL load and the QVD concatenate.  This way you can see what the SQL turned up.
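
                            That debugging step might look like this (a sketch; `Debug_SQL_Pull.qvd` is just an assumed file name):

                            ```qlik
                            SOURCE_RECORD:
                            Load fields;
                            SQL SELECT fields
                            FROM dbo.table with (nolock)
                            where CONVERT(VARCHAR,CREATE_DATE,112) >= '$(maxdate)';

                            // Snapshot exactly what SQL returned, before the old
                            // QVD is concatenated on top of it.
                            STORE SOURCE_RECORD INTO Debug_SQL_Pull.qvd (qvd);

                            // Optionally stop here so the main QVD and the
                            // max-date file are left untouched:
                            // EXIT SCRIPT;
                            ```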

                             

                            Alternatively, you can check your SQL code by issuing it directly against the database using the database IDE.  The best way to grab the code (with the variables inserted) is to go into Debug mode in the load script and step through the code a line at a time.

                             

                            Hope that helps,
                            Steve

                          • Re: Incremental Load with an Update Date
                                Jeremy Schmitt

                                The primary key that is already in the QVD should not stop the row being brought in from SQL. (yep I realize this)  If the query is correct then the updated SQL row should be brought into memory.(Correct)  The concatenate from the existing QVD should then bring in old rows, but not ones that have been updated (i.e. are already in memory).

                                 

                                If the original ABC1 is still there after the merge, then I think it most likely that the SQL statement failed to retrieve the modified record

                                 

                                ----------------------------------------------------------------------------------------------------------------------------------------------------------

                                 

                                I think maybe I am just misunderstanding what the WHERE NOT EXISTS is doing. So you are telling me the WHERE NOT EXISTS excludes the record from the QVD during the concatenation because it exists in the SQL pull? As I understood it, the records that are new in the SQL pull are concatenated to the existing QVD. So, if the record (because of the primary key) existed in the QVD, the record from the SQL pull is not appended into the QVD. (i.e. the SQL code pulls a record that has been updated, but it exists in the QVD, so it is not concatenated to the QVD because that primary key exists... I could be wrong.)

                            • Re: Incremental Load with an Update Date
                                    Steve Dark

                                    Hi Jeremy - the WHERE NOT EXISTS constraint is on the load from the QVD, and it is rows from the old QVD that are not loaded.  When the combined QVD is then written the latest version of updated records is kept, and the previous version is effectively overwritten.
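
                                    Jeremy's ABC example can be replayed with inline tables to see this (a self-contained sketch; FLAG stands in for his "field changed example" column):

                                    ```qlik
                                    // Rows the SQL pull returned on 7/14: the
                                    // updated ABC1, the re-pulled ABC2/ABC3, and
                                    // the brand-new ABC4.
                                    New:
                                    LOAD * INLINE [
                                    PRIMARY_KEY, FLAG, CREATE_DATE, UPDATE_DATE
                                    ABC1, YES, 7/13/2015, 7/14/2015
                                    ABC2, NO, 7/13/2015,
                                    ABC3, NO, 7/13/2015,
                                    ABC4, NO, 7/14/2015,
                                    ];

                                    // Old QVD contents: only keys NOT already in
                                    // memory survive this load, so the stale ABC1
                                    // row is dropped and the updated one is kept.
                                    CONCATENATE (New)
                                    LOAD * INLINE [
                                    PRIMARY_KEY, FLAG, CREATE_DATE, UPDATE_DATE
                                    ABC1, NO, 7/13/2015,
                                    ABC2, NO, 7/13/2015,
                                    ABC3, NO, 7/13/2015,
                                    ] WHERE NOT EXISTS (PRIMARY_KEY);
                                    ```

                                    The final table has one row per key, with ABC1 carrying UPDATE_DATE 7/14/2015.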

                                     

                                    The logic in your original script was largely correct.

                              • Re: Incremental Load with an Update Date
                                        Jeremy Schmitt

                                        Thank you Steve for finally clearing this up for me. This is very helpful, as I was under the wrong impression. This is the only thing I can think of that would be causing some of the missing records we are experiencing. I cannot determine which records they are; I just know I have to do a full reload every month or two in order to ensure all records are accounted for.

                                • Re: Incremental Load with an Update Date
                                            Steve Dark

                                            What I would look at doing is archiving the QVD before you do a full refresh and create a new QVD.  You can then compare the IDs and content of the two QVDs to see exactly which rows were omitted.  This should then point you in the direction of the issue.

                                             

                                            If the missing ones all seem to have been updated around the time of your loads then the getting of the last date is a likely culprit.  Time zones and system clocks being out of sync could be another.

                                             

                                            Depending on how long your load takes and what your refresh rate and window of opportunity for loading are you may want to consider putting in an overlap with the times.  Perhaps always go an hour, or even twenty four hours, back on the date you know you last pulled records from.  Just as the WHERE NOT EXISTS removes records that have been updated it will also remove records that have been pulled more than once because of your overlap in last load date.
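
                                            A sketch of such an overlap, assuming the 'YYYYMMDD' variable format from the original script:

                                            ```qlik
                                            // Wind the cut-off back 24 hours before using it in
                                            // the WHERE clause. Duplicates pulled because of the
                                            // overlap are later removed by the WHERE NOT EXISTS
                                            // on the concatenate from the old QVD.
                                            LET maxdate = Text(Date(Date#('$(maxdate)', 'YYYYMMDD') - 1, 'YYYYMMDD'));
                                            ```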

                                             

                                            You have the luxury of being able to do this because you have a unique ID.  Where there is no unique ID it is usually a good idea to create one, in order that you can then have this kind of incremental routine.

                                             

                                            Steve