Hi there,
I load the following files into a QV Master File:
Daily Account Extract (about 500,000 records, growing daily). I replace this daily with a NEW file.
Daily Transaction Extracts. I load all historical extracts, currently about 3 million records. Older extracts are stored as a QVD, to which I concatenate the new daily files. I then update the QVD file every month or so to keep my script load time short.
Daily Application Extracts. I load all historical extracts, currently about 500,000 records. Older extracts are stored as a QVD, to which I concatenate the new daily files. I then update the QVD file every month or so to keep my script load time short.
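The monthly consolidation described above can be sketched in script form (file names like TranHistory.qvd and Tran_*.txt, and the text-file format spec, are placeholder assumptions, not from the original post):

// Roll the new daily text files into the historical QVD.
// All file names and the delimiter here are illustrative.
Tran:
LOAD * FROM TranHistory.qvd (qvd);      // existing history, fast QVD load

Concatenate (Tran)
LOAD * FROM Tran_*.txt                  // daily extracts not yet in the QVD
(txt, utf8, embedded labels, delimiter is '\t');

STORE Tran INTO TranHistory.qvd (qvd);  // refresh the history file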
The QVDs load quickly - most of the script run time is spent on the account extract.
My question:
Can I somehow store a QVD of the account extract, then replace the extract daily, but have the script do an insert-and-update type thing when it runs?
So it would add all new records (new accounts) that are NOT in the QVD, and update all fields that differ from those in the QVD file.
Possible?
Thanks.
Gerhard
Hi,
This is normally done with a LastModifiedDate and ID in your data:
//first load modified records from the DB - this will get new and updated rows
Data:
LOAD
ID,
Blah,
Blah2,
Blah3;
SQL SELECT ...
WHERE LastModifiedDate > $(vLastUpdated);
//now load data from your QVD file that isn't in the above load
Concatenate (Data)
LOAD * FROM Data.qvd(qvd) WHERE NOT EXISTS(ID);
//write back to the qvd file
STORE Data INTO Data.qvd;
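For completeness, one way vLastUpdated could be maintained between reloads (a sketch; the table and file names here are assumptions, not part of Jason's post):

// After the loads above, remember the newest timestamp we have seen,
// so the next reload only asks the DB for rows modified after it.
MaxDate:
LOAD Max(LastModifiedDate) AS MaxModified RESIDENT Data;

LET vLastUpdated = Peek('MaxModified', 0, 'MaxDate');
DROP TABLE MaxDate;

// Persist the watermark, e.g. in a tiny QVD, so it survives between runs:
Watermark:
LOAD '$(vLastUpdated)' AS LastUpdated AutoGenerate 1;
STORE Watermark INTO LastUpdated.qvd (qvd);
DROP TABLE Watermark;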
Hope this helps,
Jason
Hi Jason, thanks for the response.
Don't think I explained properly though:
I don't load directly from a database. I am still trying to get the entire company on board with QlikView.
Our system does a batch run every morning and generates reports, which are mostly text files.
The Account Extract, Transaction and Application extracts are text files, which I then copy to my local drive and my script loads from there.
So the account extract is basically a daily snapshot of our client base. For old existing accounts, the data might change daily (obviously the account number and most of the customer detail will remain constant, but a lot of fields will change daily, like "last payment date" and "amount in arrears" etc.). And accounts opened the previous day will be new records in the extract.
I replace the account extract every day and load the newest one - so I don't really have a "last update" or "last modified" to look at.
Maybe I'm misunderstanding you though - I'm still learning all this.
Hi gerhardl.
Then I think you need to do as Jason pointed out.
Since the Account details change you need to run it every day.
Maybe, if possible, you can load only the changed Accounts and load them into QV first.
i.e.
Accounts:
LOAD * FROM ChangedAccounts;
Concatenate (Accounts)
LOAD * FROM AllAccounts WHERE NOT Exists(AccountID);
Or maybe it's just quicker to do a total load every day, you'll have to verify that.
And also, my understanding is that you're doing a proof-of-concept kind of application, so maybe it's not crucial if the load takes a bit longer. In the real world, when loading from a database, it is very convenient to implement an incremental load just as Jason pointed out.
Kind regards
BI Consultant
Hi Magnus,
I have no way of identifying the changed accounts myself.
Okay - I think I will stick to loading the full, replaced file every day for now.
Thank you both
There's always a way....
I have worked with this type of situation before - where records change but I have no way of knowing which ones have. So, I built a concatenated string of every field that is likely to change, wrapped it in Hash128() to keep it smaller (you may need to use Hash192() as your recordset is quite large) and used this as my marker for the EXISTS() function:
//This is your INITIAL, FULL load - just done once to create your initial QVD file
Data:
LOAD
Hash128(ChangingField1 & '/' & ChangingField2 & '/' & ChangingField3 & '/'...etc) AS KEY,
ID,
Blah,
Blah2,
Blah3
FROM your text file;
//write to the qvd file
STORE Data INTO Data.qvd;
Now onto the daily work...
//first, load the QVD file into memory
Data:
LOAD * FROM Data.qvd(qvd);
//Then, load NEW and UPDATED records from the text file
Concatenate (Data)
LOAD
Hash128(ChangingField1 & '/' & ChangingField2 & '/' & ChangingField3 & '/'...etc) AS KEY,
ID,
Blah,
Blah2,
Blah3,
1 AS NewOrUpdatedFlag
FROM your text file
WHERE NOT EXISTS(KEY, Hash128(ChangingField1 & '/' & ChangingField2 & '/' & ChangingField3 & '/'...etc));
//so, now you have all yesterday's records, plus any new and changed records.
//HOWEVER, changed records now exist in both their old AND new incarnations.
//So, we must now delete the old records
<This bit I haven't worked out yet and have run out of time! Will complete later if you haven't worked it out before me! Trying to avoid another RESIDENT load with ORDER BY as that will be slow...>
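One way the missing step could be closed (a sketch, not Jason's worked-out answer): reverse the order of the loads. Load only the QVD's hash keys first so Exists(KEY) has something to test against, take the new and changed rows from the text file, then concatenate the QVD rows whose ID was not just reloaded. The old incarnations of changed records then never enter the table, so nothing has to be deleted afterwards:

// 1) Keys only, so the two-argument EXISTS() below has data to test against
OldKeys:
LOAD KEY FROM Data.qvd (qvd);

// 2) New and changed rows from today's snapshot
Data:
LOAD
Hash128(ChangingField1 & '/' & ChangingField2 & '/'...etc) AS KEY,
ID,
Blah
FROM your text file
WHERE NOT EXISTS(KEY, Hash128(ChangingField1 & '/' & ChangingField2 & '/'...etc));

DROP TABLE OldKeys;

// 3) Unchanged rows: QVD records whose ID was not reloaded in step 2
Concatenate (Data)
LOAD * FROM Data.qvd (qvd) WHERE NOT EXISTS(ID);

STORE Data INTO Data.qvd;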
//write back to the qvd file
STORE Data INTO Data.qvd;
Hope this helps,
Jason
I like the optimism!
That looks insanely complicated though... especially with the size of my script as it is.
It looks like this:
//Account Extract:
Load <about 150 fields>
From <my text file which I replace daily with the newest file. It is possible that ANY field can change, except for the Account No> ;
//Transaction Extracts:
TranQVD:
Load * From TranQVD.qvd(qvd);
Concatenate(TranQVD)
LOAD <about 30 fields>
FROM <a folder full of text files>
WHERE Exists([Account No]);
//Funeral Policy Holder Extracts:
LOAD <about 20 fields>
FROM <a text file I also replace every day - small file so I'm happy to do full load daily>;
//Application Extracts:
AppsQVD:
Load * From AppsQVD.qvd (qvd);
Concatenate (AppsQVD)
LOAD <about 90 fields>
FROM <a folder full of text files>;
If you figure it out, it would really help me if you could edit the above to show what it should look like - I'm going to see how far I get with your comments above.
In the Account Extract, Transaction Extract and FPH Extract, [Account No] is the key field.
In the Account Extract and Application Extract, [Appl No] is a key field.
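Putting the pieces together for the Account Extract specifically, a hedged sketch (file names, the text-file format spec and the hashed fields are placeholders to be swapped for the real ones; [Account No] is the key as stated above, and Hash192() is used per Jason's note about the recordset size):

// Keys from the last run, for the Exists() test
OldKeys:
LOAD KEY FROM Accounts.qvd (qvd);

// New and changed accounts from today's snapshot
Accounts:
LOAD
Hash192([Last Payment Date] & '|' & [Amount in Arrears]) AS KEY,  // extend with every field that can change
[Account No]
// ...the rest of the ~150 fields
FROM AccountExtract.txt (txt, utf8, embedded labels, delimiter is '\t')
WHERE NOT Exists(KEY, Hash192([Last Payment Date] & '|' & [Amount in Arrears]));

DROP TABLE OldKeys;

// Unchanged accounts carried over from yesterday's QVD
Concatenate (Accounts)
LOAD * FROM Accounts.qvd (qvd) WHERE NOT Exists([Account No]);

STORE Accounts INTO Accounts.qvd (qvd);

Note the last QVD load uses the single-argument Exists([Account No]), which keeps the carry-over load fast; the two-argument form is only needed where the tested value differs from the loaded field.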