We've got a table of about 23.5 million rows representing 18 months (rolling) of data, with about 80k new or updated rows every day. Using Rob Wunderlich's example, I have created the following script. But it runs really slowly, and in the end it fails to update the QVD. I'd appreciate it if someone could take a quick look and let me know if anything looks wrong.
The primary problem I'm having is that the new QVD being written only contains the new/updated records. It's not merging in the existing records from the QVD. So my file starts at 1.5GB in size but ends up as only 9MB.
I think I've got it working properly now (log contains no errors), but I'm including the revised script if you have a few minutes to look it over. The record delete portion is still commented out - does it look correct to you? It creates a 1.5GB QVD file that contains about 22 million records. It takes about 3 hours to run (25 minutes just to get the Max Date from the QVD).
// The following script is based on the Incremental Reload script by Rob Wunderlich
OLEDB CONNECT TO [Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=Fg_DataMart;Data Source=occ01db055r;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=BOI03LT073;Use Encryption for Data=False;Tag with column collation when possible=False];
Glad you got it working as far as getting correct results. On performance, I'll offer a couple suggestions.
- You are probably spending a lot of time generating the PK. Also, the PK expression inside the WHERE NOT exists() is preventing an optimized load of the QVD. It may be better to generate the PK only on the new rows and save the PK in the QVD. If you save the PK, you can't use autonumber(). You'll have to use something like hash160(), which will increase the size of your QVD but may be worth the tradeoff. With a saved PK, the WHERE expression is just "WHERE NOT exists(PK)". Exists() with a single field argument allows an optimized load.
- If you are only running the update once or a few times a day, you might consider another approach to the maxdate problem. I think sweeping the data to get maxdate is the most accurate technique, but it comes at a performance cost. Perhaps you could use QvdCreateTime() with a day resolution. That is, select based on date only and don't worry about time. You may duplicate some selects day to day, but the PK will keep the duplicates out of the QVD.
- If possible, don't use CONCATENATE in the QVD merge. It's not necessary unless you are adding fields. I've seen cases where CONCATENATE seemed to prevent an optimized load.
- The AddMonths() to roll off older data looks correct, but leave it out for the moment because it will prevent an optimized load. If saving the PKs as suggested gives you an optimized load, it may be better to do the rolloff in a subsequent load against the resident table.
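Putting the PK, exists(), and rolloff suggestions together, the merge could look something like the sketch below. Table names, field names (OrderID, LineNo, ModDate, TransDate), and Transactions.qvd are placeholders I made up for illustration, not names from your script:

```
// Pull only new/changed rows from the database (date variable set earlier).
Incremental:
SQL SELECT * FROM Transactions WHERE ModDate >= '$(vLastRun)';

// Generate the PK only on the new rows. hash160() can be stored in the QVD;
// autonumber() cannot, because its values are session-specific.
NewRows:
LOAD *, hash160(OrderID & '|' & LineNo) as PK
RESIDENT Incremental;
DROP TABLE Incremental;

// Historical load. A single-field exists(PK) keeps this load optimized,
// and identical field lists let QlikView auto-concatenate -- no CONCATENATE needed.
LOAD * FROM Transactions.qvd (qvd)
WHERE NOT exists(PK);

// Do the 18-month rolloff in a separate resident pass so the QVD load
// above stays optimized.
Final:
NOCONCATENATE LOAD * RESIDENT NewRows
WHERE TransDate >= AddMonths(Today(), -18);
DROP TABLE NewRows;

STORE Final INTO Transactions.qvd (qvd);
DROP TABLE Final;
```

The key point is that the exists() argument is a bare field name, which is what preserves the optimized load; any expression there forces a standard load.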
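And for the maxdate suggestion, a minimal sketch of the QvdCreateTime() approach (again, Transactions.qvd and ModDate are hypothetical names). Note that QvdCreateTime() returns NULL if the file doesn't exist, so you'd want a full-load branch for the first run:

```
// Use the QVD's creation time at day resolution instead of spending
// 25 minutes scanning 23M rows for Max(Date).
LET vQvdDate = Date(Floor(QvdCreateTime('Transactions.qvd')), 'YYYY-MM-DD');

// Selecting by date only may re-fetch some rows from the boundary day,
// but the saved PK keeps those duplicates out of the merge.
Incremental:
SQL SELECT * FROM Transactions WHERE ModDate >= '$(vQvdDate)';
```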
Good luck. Let us know how it turns out. (I'm taking off for a week but will check when I return).