Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
tanelry
Partner - Creator II
Partner - Creator II

Autonumber Handling for QVD Optimized Load (via Key Storage)

As we know, autonumber keys are not suitable when QVD optimized (superfast) load is needed. Because autonumber keys are not reusable - the counter restarts from 1 at each script run and previously assigned keys won't necessarily match current ones. And adding autonumber function to QVD load statement will kill optimized load.

I would like to present a workaround. Based on my tests the concept is solid, but I may have missed some weakness. You're welcome to test it further.

In short, our goal is to store qvd fact table including autonumber keys and on next load reuse the number keys without recalculating them. In order to do so we store all native key fields (like "ProductID" or "CompanyID & '|' & ProductID") into separate qvd-s (I call it "key storage"). Each of those stored keys is a compact, single-field table of distinct values in load order of last script run. On next load we start with loading the stored keys one by one and autonumber them (this performs very fast on such compact tables). This initializes autonumber sequence(s) to exactly the same state as on previous load, so after that we don't need to perform autonumber on qvd fact table load - we just use the numbers stored last time. Added records (by incremental load) will have autonumber keys continued from last time. Record delete is also okay, while we never drop any value from the key storage fields.

Here is a basic example:
We will start with full load from original datasource and assign autonumber keys where necessary. If more than one autonumbered key field is used, each autonumber should have it's own id. Note that we always load native (not numbered) key together with autonumber, this is temporarily required for key storage and will be dropped in the end of script.

SalesFact:
LOAD
CompanyID,
SalesOrderID,
AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
CompanyID & '|' & ProductID as ProductKeyStr,
AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
CompanyID & '|' & CustomerID as CustomerKeyStr,
OrderQty,
Amount
;
SQL SELECT ...
;
// store fact table with autonumbered key fields
store SalesFact into SalesFact.qvd (qvd);
Product:
LOAD
CompanyID,
ProductID,
AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
CompanyID & '|' & ProductID as ProductKeyStr,
ProductName
;
SQL SELECT ...
;

Customer:
LOAD
CompanyID,
CustomerID,
AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
CompanyID & '|' & CustomerID as CustomerKeyStr,
CustomerName
;
SQL SELECT ...
;
// after every (key related) table loaded:
// generate key tables: load distinct field values in load order from all tables
ProductKey:
LOAD
fieldvalue('ProductKeyStr',iterno()) as ProductKeyStr
AUTOGENERATE 1
WHILE len(fieldvalue('ProductKeyStr',iterno()))
;
store ProductKey into ProductKey.qvd (qvd);

CustomerKey:
LOAD
fieldvalue('CustomerKeyStr',iterno()) as CustomerKeyStr
AUTOGENERATE 1
WHILE len(fieldvalue('CustomerKeyStr',iterno()))
;
store CustomerKey into CustomerKey.qvd (qvd);

// clean up
drop table ProductKey;
drop table CustomerKey;
drop fields ProductKeyStr, CustomerKeyStr; // native keys no longer needed


Now we have stored QVD fact table (with autonumbered keys) and compact key fields to represent autonumber sequence (load order = autonumber integer).
Next load we begin with key tables, so we need to add following load statements:


// Load all stored key values from QVD and initialize autonumber sequence
// Original order is preserved (order number = autonumber)
// "KeyInit" table(s) must not be dropped until end of script
ProductKeyInit:
load
ProductKeyStr,
autonumber(ProductKeyStr,'Product') as ProductKeyInit
from ProductKey.qvd (qvd)
;
CustomerKeyInit:
load
CustomerKeyStr,
autonumber(CustomerKeyStr,'Customer') as CustomerKeyInit
from CustomerKey.qvd (qvd)
;

At this point we have initialized autonumber sequence(s) and can proceed with normal load process. We add incremental load + qvd load logic, but rest of the script remain the same.
SalesFact:
LOAD
CompanyID,
SalesOrderID,
AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
CompanyID & '|' & ProductID as ProductKeyStr,
AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
CompanyID & '|' & CustomerID as CustomerKeyStr,
OrderQty,
Amount
;
SQL SELECT ...
;
Concatenate load * from SalesFact.qvd (qvd); // optimized load
// store fact table with autonumbered key fields
store SalesFact into SalesFact.qvd (qvd);
Product:
LOAD
CompanyID,
ProductID,
AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
CompanyID & '|' & ProductID as ProductKeyStr,
ProductName
;
SQL SELECT ...
;
Customer:
LOAD
CompanyID,
CustomerID,
AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
CompanyID & '|' & CustomerID as CustomerKeyStr,
CustomerName
;
SQL SELECT ...
;
// after every (key related) table loaded:
// generate key tables: load distinct field values in load order from all tables
ProductKey:
LOAD
fieldvalue('ProductKeyStr',iterno()) as ProductKeyStr
AUTOGENERATE 1
WHILE len(fieldvalue('ProductKeyStr',iterno()))
;
store ProductKey into ProductKey.qvd (qvd);

CustomerKey:
LOAD
fieldvalue('CustomerKeyStr',iterno()) as CustomerKeyStr
AUTOGENERATE 1
WHILE len(fieldvalue('CustomerKeyStr',iterno()))
;
store CustomerKey into CustomerKey.qvd (qvd);

// clean up
drop table ProductKey;
drop table CustomerKey;
drop fields ProductKeyStr, CustomerKeyStr; // native keys no longer needed;


Finally drop the init tables loaded in the beginning of script
;

drop table ProductKeyInit;
drop table CustomerKeyInit;


So we have built a "key storage" layer around the normal script. It adds complexity but could be worth the effort where memory and performance is critical.

Few important points:
* Never drop any value from stored key table, unless redoing full (non-qvd) load from datasource
* Avoid NULL values and extra spaces in native keys
* Each application that stores QVD fact tables should have it's own key storage. For example you can't use keys generated by sales qvw in finance qvw.
* Keys that are created during join load where order is not well defined - I'm not yet sure how it affects but I *guess* it is okay. Needs attention, though.

Feedback?

7 Replies
Not applicable

Hello Tanel,

this sounds good and is worth to be looked at more closer. Give us some time for testing.

Regards, Roland

Not applicable

Sounds good. Have you had any problem using that?

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It's very creative and I like the idea. However, it sound very fragile. If the key storage gets out of synch, like for example when moving a qvd from prod to test, or restoring from a backup, the model breaks.

I generally do optimized loads and then create the autonumbered keys using a join resident for the entire model at load time. Have you tried the timing of that technique?

-Rob

Not applicable

Hi Rob,

I didnt get how you do the autonumbered keys using a join resident. Can you post some code snnipet? However, this should be the last load, without storing the autonumbered key in a qvd file, right?

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Yes, the autonumbering is done last, after the qvds are loaded. The join looks like this:

SalesFact:

LOAD * FROM Sales.qvd (qvd);

JOIN (SalesFact)

LOAD DISTINCT

          CustomerID,

          CompanyID,

          AutoNumber(CustomerID & '|' & CompanyID, 'Customer') as %SalesKey

RESIDENT SalesFact

;

Important note -- the DISTINCT keyword will cause the SalesFact table to be DISTINCT -- no duplicate rows. So this technique only works if the rows have guaranteed uniqueness, like a promary key.

-Rob

http://robwunderlich.com

Michael_Reese
Employee
Employee

I have found that it is faster to load the qvd into a temp table first, then do a noncatenate load * into the final table with the autonumber field.  Then drop the temp table.

tmpSalesFact:

LOAD <list fields explicitly>

FROM Sales.qvd (qvd);

SalesFact:

LOAD

     AutoNumber(CustomerID & '|' & CompanyID, 'Customer') as %SalesKey,

     *

RESIDENT tmpSalesFact

;


Drop table tmpSalesFact;

Anonymous
Not applicable

 

You can do a sequencial load:

SalesFact:
load
AutoNumber(CustomerID & '|' & CompanyID, 'Customer') as %SalesKey,
*
; LOAD <list fields explicitly> FROM Sales.qvd (qvd);

In my test, it's the fastest way (greater than left join or load temp table), even it's not a optimezed load.