5 Replies Latest reply: Jul 28, 2011 1:05 PM by Donald Posthuma RSS

    Performance enhancement using QVD incremental load

    Donald Posthuma

      The advantage of QVD is obviously the speed at which data can be loaded into the QVW. In the script I'm building data is loaded overnight and only data elements which are new need to be loaded from the SQL database in the script and added to the QVD. I thought this would lead to a considerable performance enhancement however I still need to check wether the field info is new or old. Right now I'm using an IF statement based on the value of a Timestamp. Do you have any idea how to make a construction to speed up this process as checking all elements and combining the new ones to the QVD doesn't seem faster then loading all elements from the SQL database. Thanks!

        • Performance enhancement using QVD incremental load
          Daniel Rozental

          Post your script so we can look into improving it.

          • Performance enhancement using QVD incremental load

            I think using WHERE NOT exists is a lot faster than if statements or distinct.  For example if you have the following fields:

             

            customer_number,

            contract_id,

            customer_number & '|' & contract_id as 'key',

            start_date,

            end_date,

            timestamp

             

            You could use something like:

            WHERE NOT exists(key,customer_number & '|' & contract_id)

             

            The point being that if whatever "key" you use is already loaded, don't load it, otherwise load it.  Here's an example I used before:

             

            Table1:

            customer_number,

            status,

            status_update_date,

            customer_number & '|' & status & '|' & status_update_date as 'key',

            other_stuff,

            other_stuff2

            FROM wherever

            WHERE NOT exists(key, customer_number & '|' & status & '|' & status_update_date);

             

            I had a file for each day and I only wanted to load rows where the combination of customer number, status, and status update date had never been loaded.  When I used LOAD DISTINCT, it took over 5 hours but when I used WHERE NOT eixsts( it took like an hour and a half.  Hopefully that helps.

            • Performance enhancement using QVD incremental load

              Hi there, if your only criteria for loading new information is timestamp, you may try a similar code to the following:

               

              let vToday = today() - 1;

               

              NewRows:

              Load A, B, C, DateTimeField;

              select A, B, C DateTimeField

              from Table

              where DateTimefield >= $(vToday);

               

              OldRows:

              Load *

              from respository.qvd (qvd)

              where not exists(DateTimeField);

               

              Notice that several things are happening here:

               

              In the first load we are loading yesterday's and today's information, so only fresh data is being pulled from the database. Notice that you can easily store and load your last successful reload date if the today() -1 approach does not work for you. It is also necessary that you know the exact way in which your date time field in the database is formatted, so you can format the today´s date string propperly in QLikView. For example:

               

              Load *;

              sql select *

              from Table

              where  DateField >= date '2011-07-27';

               

              In the second load, where are loading the historical data storage in qvd, but only does rows that had not been loaded by the first load statement.

               

              Regards

              • Performance enhancement using QVD incremental load
                Donald Posthuma

                First of all thanks for your help!

                 

                @Daniel: I'm still figuring out how i'm gonna script it so I don't have more then small pieces

                @Trent&Ivan: I understand the performance improvement for certain cases but it still requires going trough all records. It doesn't actually limit the number of records you load to determine wether a record is new/old/existing

                 

                Right now I'm looking into the SQL select to limit the amount of data which is actually pulled into the script. In the normal configuration I would have something like below where a record is loaded depending on a certain treshold.

                 

                LOAD BookInventoryID

                    WHERE BookInventoryID < 50000;

                SQL SELECT BookInventoryID

                FROM DPAR.dbo.BookInventory

                 

                What I'm looking for is a limitation of the data set in the SQL select. If I order my records like shown below. I want to go trough the records until the BookInventoryID reaches the treshold and then terminate the load. However I don't no if it is possible to incorporate the while statement (It's not working this way ) Any thoughts?

                 

                LOAD BookInventoryID;

                 

                SQL SELECT BookInventoryID

                WHILE BookInventoryID < 50000

                FROM DPAR.dbo.BookInventory

                ORDER BY BookInventoryID ASC;