Skip to main content
Announcements
Global Transformation Awards submissions are open! SUBMIT YOUR STORY
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Doubt in Rob's Incremental Load Document!

Hi All,

I was trying to do the incremental load with the help of Rob's cookbook. I am having doubts in that. I suppose the basic purpose of doing the incremental load is to load only the Inserted,Updated and Deleted rows. But when I tried his document with the data of mine what I noticed is that thought the data has been loaded completely in the 1st load and very soon after the first load when I tried to load it for the 2nd time without inserting,updating or deleting any rows it is loading all the data. It is not doing what exactly I wanted to do! I mean I don't want to load all the data again if now no rows are inserted,update and deleted. It should load only those records which is inserted,updated and deleted after loading the complete data.

Find the below screen shots for better clarity on my doubt and attachment of Rob's document with my data.

1st Reload:

error loading image

2nd Reload:

error loading image

1 Solution

Accepted Solutions
Not applicable
Author

Hi Rikab,

I will try to explain the process followed in Rob's incremental reload document.

Incremental Setup Tab:

1. The variables are set to be used in the scripts. There is no need to explain the purpose of each variable as it is very clearly explained in the script comment.

2. In the same tab, the 'max date' is obtained if the QVD already exists and has some data. Then this date is used to define the 'Incremental exp' variable which the where class.

Data Load Tab:

This is very straight forward as it loads the data from the source based on the given SQL.

Incremental Update tab:

The update of any changed data occurs in this step. Let us consider that we are loading this for third time and there is already some data in the QVD (60,000 rows). We should check if any data available in the QVD (which is already loaded) is updated, if so, then we have to replace the data in QVD with the new data.

Lets consider that the current run has loaded 800 rows and this should be concatenated to the QVD which already has 60000 rows. We should check if any of the 'PK' in new 800 list is already available in QVD and these updated records should be replaced in QVD. So, in order to do that we are loading the entire data from QVD and concatenating with the new data with the condition "WHERE NOT exists(PK, $(PKexp))". So, this will load only the records from QVD which are not present in the new list. This ensures that all updated records are also loaded and finally stored in the QVD.

So, every time when we run this document, all the rows (60000 from QVD + 800 from DB will be loaded). This is because we have to do an update. Hence, loading all rows means that it loads only the latest record from DB and remaining from QVD. A load from QVD will be always faster when compared to DB load and hence this incremental load will be much faster than full reload.

We cannot skip loading all rows from existing QVD, as the update cannot happen without this. Hope this clears your doubt of loading all the records again.

Regards,

Haneesh

View solution in original post

2 Replies
Not applicable
Author

Hi Rikab,

I will try to explain the process followed in Rob's incremental reload document.

Incremental Setup Tab:

1. The variables are set to be used in the scripts. There is no need to explain the purpose of each variable as it is very clearly explained in the script comment.

2. In the same tab, the 'max date' is obtained if the QVD already exists and has some data. Then this date is used to define the 'Incremental exp' variable which the where class.

Data Load Tab:

This is very straight forward as it loads the data from the source based on the given SQL.

Incremental Update tab:

The update of any changed data occurs in this step. Let us consider that we are loading this for third time and there is already some data in the QVD (60,000 rows). We should check if any data available in the QVD (which is already loaded) is updated, if so, then we have to replace the data in QVD with the new data.

Lets consider that the current run has loaded 800 rows and this should be concatenated to the QVD which already has 60000 rows. We should check if any of the 'PK' in new 800 list is already available in QVD and these updated records should be replaced in QVD. So, in order to do that we are loading the entire data from QVD and concatenating with the new data with the condition "WHERE NOT exists(PK, $(PKexp))". So, this will load only the records from QVD which are not present in the new list. This ensures that all updated records are also loaded and finally stored in the QVD.

So, every time when we run this document, all the rows (60000 from QVD + 800 from DB will be loaded). This is because we have to do an update. Hence, loading all rows means that it loads only the latest record from DB and remaining from QVD. A load from QVD will be always faster when compared to DB load and hence this incremental load will be much faster than full reload.

We cannot skip loading all rows from existing QVD, as the update cannot happen without this. Hope this clears your doubt of loading all the records again.

Regards,

Haneesh

Not applicable
Author

Hi Haneesh!

Many thanks for explaining me the process. I have understood the process not but able to under how the script works(though it is commented). Anyway I will go through the script once again and will let you know which part is not understandable.