Today the QlikTech sales rep came by, and an agreement will be reached.
But I still have a question he didn't directly answer. In our data mart we have a table that contains over 400,000,000 records and is growing fast. In our traditional SSAS cube I keep that data in several partitions, and on a daily basis we only rebuild the last month. Not really the last month, but everything after a date that is known in the system as the recentDate. When that date moves with the calendar, we also have to rebuild an older partition, because otherwise we would have a gap in the data.
But how is this handled with QV? Can we also load by a kind of partition: a first load that loads the historical data, and a second load with a similar select that loads the recent data?
And then, on a daily basis, we drop the second (recent) table and reload it?
Am I guessing in the right direction?
I think you can create a QVD for each partition and load them when and where required. This way you have control over the data.
That's just my understanding.
Talha
It sounds like you want a variation of incremental reloads. You want to keep the older data as it is, then reload the newer data.
In a typical incremental reload (at least in our shop), this is done using a combination of unique keys and a record update timestamp. If the QVD doesn't exist, we load the entire table and store the QVD. If the QVD already exists, we look for the maximum record update timestamp in the QVD and load all records updated since that point from the database. We then load the existing QVD where the unique key doesn't already exist in our table. Finally, if the database allows deletes, we inner join back to a load of all keys from the database. Hopefully I got those details right.
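To make that concrete, here is a minimal sketch of that pattern in load script. The connection, table, and field names (MyDSN, Transactions, TransactionID, LastUpdated, Amount) are hypothetical, and the timestamp format in the WHERE clause depends on your database, so treat it as an outline rather than a drop-in script.

ODBC CONNECT TO MyDSN;   // hypothetical connection

IF NOT IsNull(FileTime('Transactions.qvd')) THEN

    // QVD exists: find the newest record update timestamp we already have
    MaxStamp:
    LOAD Max(LastUpdated) AS MaxLastUpdated
    FROM Transactions.qvd (qvd);
    LET vMaxLastUpdated = Peek('MaxLastUpdated', 0, 'MaxStamp');
    DROP TABLE MaxStamp;

    // Pull only records inserted or updated since that point
    Transactions:
    LOAD TransactionID, LastUpdated, Amount;
    SQL SELECT TransactionID, LastUpdated, Amount
    FROM dbo.Transactions
    WHERE LastUpdated >= '$(vMaxLastUpdated)';

    // Add the historical rows whose keys were not just reloaded
    CONCATENATE (Transactions)
    LOAD * FROM Transactions.qvd (qvd)
    WHERE NOT Exists(TransactionID);

    // If the source allows deletes, keep only keys still in the database
    INNER JOIN (Transactions)
    LOAD TransactionID;
    SQL SELECT TransactionID FROM dbo.Transactions;

ELSE

    // First run: load the entire table
    Transactions:
    LOAD TransactionID, LastUpdated, Amount;
    SQL SELECT TransactionID, LastUpdated, Amount
    FROM dbo.Transactions;

END IF

STORE Transactions INTO Transactions.qvd (qvd);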
It sounds like you want something a bit different: you want to partition your data into new and old. The old partition should come directly from the QVD (except when you're rebuilding it), the new partition should come directly from the database, and these should be combined into a single QVD.

I would have two QVDs, an old-partition QVD and a complete QVD. If the old-partition QVD doesn't exist, load all data from the database that is before the recentDate, and dump it to the old-partition QVD. If the old-partition QVD did exist, just load all the data from it. Next, concatenate all the data from the database that is on or after the recentDate. Now store the whole thing as the complete QVD. A typical load, then, will do an optimized load from the old-partition QVD and only have to read the recent data from the database. When your recentDate moves, delete the old-partition QVD manually, and the reload will automatically rebuild it.
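A rough sketch of that two-QVD pattern, again with hypothetical names (MyDSN, Facts, FactID, FactDate, Amount, Facts_Old.qvd, Facts_Full.qvd) and assuming recentDate is available to the script as a variable; the date format in the WHERE clauses depends on your database.

ODBC CONNECT TO MyDSN;            // hypothetical connection
LET vRecentDate = '2024-01-01';   // however your system exposes recentDate

IF IsNull(FileTime('Facts_Old.qvd')) THEN

    // Old-partition QVD is missing (first run, or deleted because
    // recentDate moved): rebuild it from the database
    Facts:
    LOAD FactID, FactDate, Amount;
    SQL SELECT FactID, FactDate, Amount
    FROM dbo.Facts
    WHERE FactDate < '$(vRecentDate)';

    STORE Facts INTO Facts_Old.qvd (qvd);

ELSE

    // Old partition exists: optimized load straight from the QVD
    Facts:
    LOAD * FROM Facts_Old.qvd (qvd);

END IF

// Append the recent partition directly from the database
CONCATENATE (Facts)
LOAD FactID, FactDate, Amount;
SQL SELECT FactID, FactDate, Amount
FROM dbo.Facts
WHERE FactDate >= '$(vRecentDate)';

// Store the combined result as the complete QVD the documents read
STORE Facts INTO Facts_Full.qvd (qvd);

When recentDate moves, deleting Facts_Old.qvd (manually or from the script) forces the first branch to run and rebuild the old partition on the next reload.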
There might be better or other approaches depending on your exact situation. You wouldn't want to do the second option if you reload daily and your recentDate changes daily, as it would take the same effort as just reloading all the data every day.
For an example of an incremental reload, download Rob Wunderlich's QlikView cookbook from robwunderlich.com.