Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
rileymd88
Contributor III
Contributor III

Overwriting Rows and Incremental Load

Hi,

I have the following table which will be refreshed daily and will contain years of daily data which means ~30 million records:

table1:

SKUDayKeyQty
A01/01/2016A01/01/201610
A02/01/2016A02/01/20165
A03/01/2016A03/01/20163

I want to load all of the historical data from my database into a QVD and then everyday take the last 3 days of data from my database and add any new records (which I can determine by the Key field) and then if a record comes in during the last 3 days load that already has an existing key in the historical QVD I would like to overwrite that record. I need to do this as the data in the column Qty can change historically over the last 3 days. Can anyone help with this?

Thanks,

R

1 Solution

Accepted Solutions
Nicole-Smith

Here is some "pseudo" code to hopefully get you started...

/* Check if QVD exists, if not load all history from DB */

IF FileTime(YourQVD.qvd)>0 THEN

     /* Load last 3 days from DB */

     LET vMinDate = date(today()-3);

     YourTable:

     SELECT *

     FROM YourDatabaseTable

     WHERE YourDateField >= $(vMinDate);

     /* Load only things not updated from QVD */

     CONCATENATE (YourTable)

     LOAD *

     FROM YourQVD.qvd

     WHERE not exists(YourKeyField);

   

ELSE

     /* Load everything from database */

     YourTable:

     SELECT *

     FROM YourDatabaseTable;

ENDIF

STORE * FROM YourTable INTO YourQVD.qvd (qvd);

View solution in original post

8 Replies
Nicole-Smith

Here is some "pseudo" code to hopefully get you started...

/* Check if QVD exists, if not load all history from DB */

IF FileTime(YourQVD.qvd)>0 THEN

     /* Load last 3 days from DB */

     LET vMinDate = date(today()-3);

     YourTable:

     SELECT *

     FROM YourDatabaseTable

     WHERE YourDateField >= $(vMinDate);

     /* Load only things not updated from QVD */

     CONCATENATE (YourTable)

     LOAD *

     FROM YourQVD.qvd

     WHERE not exists(YourKeyField);

   

ELSE

     /* Load everything from database */

     YourTable:

     SELECT *

     FROM YourDatabaseTable;

ENDIF

STORE * FROM YourTable INTO YourQVD.qvd (qvd);

rileymd88
Contributor III
Contributor III
Author

This works fine for my first case of only adding the new records, however how do I deal with records that need to be overwritten? In this case you could imagine my key of SKU&Day would exist in the stored QVD already but the Qty would be different and therefore I would need to take this record and add it into my stored QVD and ensure the old record is overwritten with the new one.

Nicole-Smith

The old record will be "rewritten" because of the "WHERE not exists" clause.  It will not load the old record from the QVD, so it will be "rewritten" by what was pulled from the database.

rileymd88
Contributor III
Contributor III
Author

Thank you I read too quickly the first time around this is perfect. Thanks!

jhstrange
Contributor II
Contributor II

Hey Nicole,really nice solution.But,I need your help in implementing your solution for our case.

In our case,the historical data would be loaded into a QVD as a one-time activity and we will get "last 30 days" feed every day.

For example,Let us assume our historical data contains data from Jan 1st 2016 to Dec 27th,2018.

On Jan 27th,2019 the API feed would give us data for the time period Dec 28th,2018 to Jan 27th,2019.On Jan 28th,it would be Dec 29th to Jan 28th data.How do we make sure we are not missing out on Dec 28th data and that we are permanently storing the preceding day's data(28th Dec in this case) along with the historical data at one place?

The problem is,if we store x in abc.qvd and then perform store y in abc.qvd,it will only contain y in the abc.qvd and not both x and y.

Your help would be greatly appreciated!

 

Nicole-Smith

You load your new data from wherever you're getting it from, load the old data from the QVD (concatenating it to the new data), and then store in the QVD.

Something like this:

New_Data:
SELECT Date, Field1, Field2, Field3...
FROM YourSource;

Min_Date:
LOAD MIN(Date) AS Min_Date
RESIDENT New_Data;
LET vMinDate = PEEK('Min_Date', 0, 'Min_Date');
DROP TABLE Min_Date;

CONCATENATE (New_Data)
LOAD Date, Field1, Field2, Field3...
FROM YourQVD
WHERE Date < $(vMinDate);

STORE * FROM New_DATA INTO YourQVD;

If the concatenate piece is taking too long to load, you can either:
1) Use an exists clause instead of the < comparison in order to optimize the load
2) Load the entire thing  without the < comparison into a temp table and then have a second query concatenate it using the < comparison

jhstrange
Contributor II
Contributor II

 Thanks a lot Nicole!!

An example scenario:Assuming 'YourQVD' contains data until 25th Jan,2019.On the 26th Feb,the 30-day pull('YourSource') would retrieve 28th Jan to 26th Feb records.So,we would be missing out on 26th Jan and 27th Jan records.

Maybe the only thing with this implementation is that the historical data needs to be uploaded to 'YourQVD' once every 30 days as 'YourSource' retrieves only the last 30 days data.This way entries for any dates are not missed.

Have I understood it right Nicole?

Nicole-Smith

You could run the code daily.  Added comments inline below to explain what's happening.  You shouldn't be missing any data using this.

//Grab all of the new data out of the database or wherever it's coming from
New_Data:
SELECT Date, Field1, Field2, Field3...
FROM YourSource;

//Find the earliest date in the new data
Min_Date:
LOAD MIN(Date) AS Min_Date
RESIDENT New_Data;
LET vMinDate = PEEK('Min_Date', 0, 'Min_Date');
DROP TABLE Min_Date;

//Pull anything before the earliest date in the new data from the QVD (so if it's not in the new data, then pull it from the QVD)
CONCATENATE (New_Data)
LOAD Date, Field1, Field2, Field3...
FROM YourQVD
WHERE Date < $(vMinDate);

//Store all old and new data (because they are all in one table now) into a QVD
STORE * FROM New_Data INTO YourQVD;