Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Help with Incremental Reload

We've got a table of about 23.5 million rows which represent 18 months (rolling) of data. There are about 80k new or updated rows every day. Using Rob Wunderlich's example, I have created the following script. But it runs really slow and in the end it fails to update the QVD. I'd appreciate it if someone could take a quick look and let me know if anything looks wrong.

LET SCRIPT_START=now(1);

LET PKexp=autonumberhash128(Branch_Plant,SKU,Lot_Code,Location,Transaction_Date,Transaction_Time);

IF filesize('QVD/Detail_Inventory_Transactions_Last2_FY.qvd') > 0 THEN

SET QVD_EXISTS=1;

LET QVD_ROWS_BEFORE=QvdNoOfRecords('QVD/Detail_Inventory_Transactions_Last2_FY.qvd');

ELSE

SET QVD_EXISTS=1; // Always make True becasue this part isn't working for some reason.

LET QVD_ROWS_BEFORE=QvdNoOfRecords('QVD/Detail_Inventory_Transactions_Last2_FY.qvd');

END IF

IF $(QVD_EXISTS) THEN

GetLastDateTemp:

Directory;

LOAD Record_Update_Date FROM QVD/Detail_Inventory_Transactions_Last2_FY.qvd (qvd);

LOAD max(Record_Update_Date) as MaxLastUpdateDate RESIDENT GetLastDateTemp;

LET LastUpdateDate = MaxLastUpdateDate;

LET MaxLastDate = text(date(fieldValue('MaxLastUpdateDate', 1),'MM/DD/YY hh:mm tt'));

DROP table GetLastDateTemp;

ELSE

LET MaxLastDate = '4/1/2009';

END IF

DIT:

SQL SELECT *

FROM Detail_Inventory_Transactions

WHERE "Record_Update_Date" >= '$(MaxLastDate)'

;

IF $(QVD_EXISTS) THEN

RIGHT JOIN (DIT) LOAD

*,

$(PKexp) as PK

RESIDENT DIT;

CONCATENATE (DIT) LOAD * FROM QVD/Detail_Inventory_Transactions_Last2_FY.qvd (qvd)

WHERE NOT exists(PK, $(PKexp));

DROP FIELD PK;

END IF

STORE DIT INTO QVD/Detail_Inventory_Transactions_Last2_FY.qvd;

DROP TABLE DIT;

LET LOAD_DURATION=now(1) - SCRIPT_START;

LET QVD_ROWS_AFTER=QvdNoOfRecords('QVD/Detail_Inventory_Transactions_Last2_FY.qvd');

5 Replies
Anonymous
Not applicable
Author

I forgot to add:

The primary problem I'm having is that the new QVD being written only contians the new/updated records. It's not merging in the existing records from the QVD. So my file starts at 1.5GB in size but ends up as only 9MB.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Briain,

I don't see anything obvious that would explain why you're not merging. Can you tell from the log if the IF block that contains the RIGHT JOIN and the CONCATENATE is being executed?

It would be helpful if you posted the document log. Can you post it here?

-Rob

Anonymous
Not applicable
Author

Thanks for the reply Rob!

I think I've got it working properly now (log contains no errors), but I'm including the revised script if you have a few minutes to look it over. The record delete portion is still commented out - does it look correct to you? It creates a 1.5GB QVD file that contains about 22 million records. It takes about 3 hours to run (25 minutes just to get the Max Date from the QVD).

- Brian

// The following script is based on the Incremental Reload script by Rob Wunderlich

CONNECT TO [Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=Fg_DataMart;Data Source=occ01db055r;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=BOI03LT073;Use Encryption for Data=False;Tag with column collation when possible=False];

Directory QVD/;

LET SCRIPT_START=now(1);

SET QVDFILE=Detail_Inventory_Transactions_Last2_FY.qvd;

SET PKexp=autonumberhash128(Branch_Plant,SKU,Lot_Code,Location,Transaction_Date,Transaction_Time);

IF FileSize('$(QVDFILE)') > 0 THEN

SET QVD_EXISTS=1;

Let QVD_ROWS_BEFORE=QvdNoOfRecords ('$(QVDFILE)');

ELSE

SET QVD_EXISTS=0;

END IF

IF $(QVD_EXISTS) THEN

LET GETMAXDATE_START=now(1);

maxdate:

LOAD max(Record_Update_Date) as maxdate

FROM $(QVDFILE) (qvd);

LET GETMAXDATE_END=now(1)-GETMAXDATE_START;

LET vMaxDate = date(fieldValue('maxdate', 1),'MM/DD/YYYY hh:mm:ss');

DROP table maxdate;

END IF

IF $(QVD_EXISTS) THEN

SET vSQLQuery = SELECT * FROM Detail_Inventory_Transactions WHERE Record_Update_Date >= '$(vMaxDate)';

ELSE

SET vSQLQuery = SELECT * FROM Detail_Inventory_Transactions WHERE Gl_Date >= '09/01/2007 00:00:00';

END IF

LET GETNEWDATA_START=now(1);

TempTable:

SQL $(vSQLQuery);

LET GETNEWDATA_END=now(1)-GETNEWDATA_START;

IF $(QVD_EXISTS) THEN

LET GETALLDATA_START=now(1);

RIGHT JOIN (TempTable)

LOAD

*,

$(PKexp) as PK

RESIDENT TempTable;

CONCATENATE (TempTable)

LOAD

*

FROM $(QVDFILE) (qvd)

WHERE NOT exists(PK, $(PKexp));

//AND Record_Update_Date >= AddMonths(today(),-24);

DROP FIELD PK;

LET GETALLDATA_END=now(1)-GETALLDATA_START;

END IF

LET STOREDATA_START=now(1);

STORE TempTable INTO $(QVDFILE);

LET STOREDATA_END=now(1)-STOREDATA_START;

DROP TABLE TempTable;

LET LOAD_DURATION=now(1) - SCRIPT_START;

LET QVD_ROWS_AFTER=QvdNoOfRecords ('$(QVDFILE)');

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Brian,

Glad you got it working as far as getting correct results. On performance, I'll offer a couple suggestions.

- You are probably spending a lot of time generating the PK. Also the PKexp in the WHERE NOT exists() is preventing an optimized load of the QVD. It may be better if you generate the PK only on the new rows and save the PK in the QVD. If you save PK, you can't use autonumber(). You'll have to use something like hash160() which increase the size of your QVD but may be worth the tradeoff. With a saved PK, the WHERE expression is just "WHERE NOT exists(PK)". Exists() with a single expression allows for optimized load.

- If you are only running the update once or a few times a day, you might consider another approach to the maxdate problem. I think sweeping the data to get maxdate is the most accurate technique, but it comes at a performance cost. Perhaps you could use QvdCreateTIme() with a day resolution. That is, select based on date only, don't worry about time. You may duplicate some selects day to day, but the PK will keep them out of the QVD.

- If possible, don't use CONCATENATE in the QVD merge. It's not necessary unless you are adding fields. I've seen cases where CONCATENATE seemed to prevent an optimized load.

- The AddMonths() to roll off older data looks correct, but leave it out for the moment because it will prevent an optimized load. If saving the PKs as suggested gives you an optimized load, it may be better to do the rolloff in a subsequent load against the RESIDENT file.

Good luck. Let us know how it turns out. (I'm taking off for a week but will check when I return).

-Rob

Not applicable
Author

I have implemented your code it is working fine my side.

thanks to you.

can you put some light on -

what is the purpose of this -

Directory QVD/;

is it required?