Daily Account Extract (about 500,000 records, growing daily). I replace this daily with a NEW file.
Daily Transaction Extracts. I load all historical extracts. Currently about 3 million records. Older extracts tored as QVD, to which I concatenate new daily files. Then I update the QVD file every month or so to keep my script load time short.
Daily ApplicationExtracts. I load all historical extracts. Currently about 500,000records. Older extracts tored as QVD, to which I concatenate new daily files. Then I update the QVD file every month or so to keep my script load time short.
The QVD's run quickly - the most of the script run time is spent on the account extract.
Can I somehow store a QVD of the account extract, then replace the extract daily, but when I run the script, it does an insert and replace type thing.
So it will add all new records (new accounts) that are NOT in the QVD, and it will replace all fields that are different to that of the QVD file.
I don't load directly from a database. I am still trying to get the entire company on board with QlikView.
Our system does a batch run every morning and generates reports, which are mostly text files.
The Account Extract, Transaction and Application extracts are text files, which I then copy to my local drive and my script loads from there.
So the account extract is basically a daily snapshot of our client base. For old exisiting accounts, the data might change daily (obviously the account number and most of the customer detail will remain constant, but a lot of fields will change daily, like "last payment date" and "amount in arrears" etc.). And accounts opened the previous day will be new records in the extract.
I replace the account extract every day and load the newest one - so I don't really have a "last update" or "last modified" to look at.
Maybe I'm misunderstanding you though - I'm still learning all this.
Since the Account details change you need to run it every day.
Maybe, if possible, you can load only the changed Accounts and load them into QV first.
LOAD AllAccounts Where NOT Exists (AccountID);
Or maybe it's just quicker to do a total load every day, you'll have to verify that.
And also, my understanding is that you're doing a proof of concept kind of application, so maybe it's not cruical if the load takes a bit longer. In the real world, when loading from database, it is very convenient to implement an incremental load just as Jason pointed out.
I have worked with this type of situation before - where records change but I have no way of knowing which ones have. So, I built a concatenated string of every field that is likely to change, wrapped it in Hash128() to keep it smaller (you may need to use Hash192() as your recordset is quite large) and used this is my marker for the EXISTS() function:
//This is your INITIAL, FULL load - just done once to create your initial QVD file