6 Replies Latest reply: May 7, 2014 1:23 AM by Mike Reese RSS

    Autonumber Handling for QVD Optimized Load (via Key Storage)

    Tanel Rüütli

      As we know, autonumber keys are not suitable when QVD optimized (superfast) load is needed. Because autonumber keys are not reusable - the counter restarts from 1 at each script run and previously assigned keys won't necessarily match current ones. And adding autonumber function to QVD load statement will kill optimized load.

      I would like to present a workaround. Based on my tests the concept is solid, but I may have missed some weakness. You're welcome to test it further.

      In short, our goal is to store qvd fact table including autonumber keys and on next load reuse the number keys without recalculating them. In order to do so we store all native key fields (like "ProductID" or "CompanyID & '|' & ProductID") into separate qvd-s (I call it "key storage"). Each of those stored keys is a compact, single-field table of distinct values in load order of last script run. On next load we start with loading the stored keys one by one and autonumber them (this performs very fast on such compact tables). This initializes autonumber sequence(s) to exactly the same state as on previous load, so after that we don't need to perform autonumber on qvd fact table load - we just use the numbers stored last time. Added records (by incremental load) will have autonumber keys continued from last time. Record delete is also okay, while we never drop any value from the key storage fields.

      Here is a basic example:
      We will start with full load from original datasource and assign autonumber keys where necessary. If more than one autonumbered key field is used, each autonumber should have it's own id. Note that we always load native (not numbered) key together with autonumber, this is temporarily required for key storage and will be dropped in the end of script.

      SalesFact:
      LOAD
      CompanyID,
      SalesOrderID,
      AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
      CompanyID & '|' & ProductID as ProductKeyStr,
      AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
      CompanyID & '|' & CustomerID as CustomerKeyStr,
      OrderQty,
      Amount
      ;
      SQL SELECT ...
      ;
      // store fact table with autonumbered key fields
      store SalesFact into SalesFact.qvd (qvd);
      Product:
      LOAD
      CompanyID,
      ProductID,
      AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
      CompanyID & '|' & ProductID as ProductKeyStr,
      ProductName
      ;
      SQL SELECT ...
      ;

      Customer:
      LOAD
      CompanyID,
      CustomerID,
      AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
      CompanyID & '|' & CustomerID as CustomerKeyStr,
      CustomerName
      ;
      SQL SELECT ...
      ;
      // after every (key related) table loaded:
      // generate key tables: load distinct field values in load order from all tables
      ProductKey:
      LOAD
      fieldvalue('ProductKeyStr',iterno()) as ProductKeyStr
      AUTOGENERATE 1
      WHILE len(fieldvalue('ProductKeyStr',iterno()))
      ;
      store ProductKey into ProductKey.qvd (qvd);

      CustomerKey:
      LOAD
      fieldvalue('CustomerKeyStr',iterno()) as CustomerKeyStr
      AUTOGENERATE 1
      WHILE len(fieldvalue('CustomerKeyStr',iterno()))
      ;
      store CustomerKey into CustomerKey.qvd (qvd);

      // clean up
      drop table ProductKey;
      drop table CustomerKey;
      drop fields ProductKeyStr, CustomerKeyStr; // native keys no longer needed


      Now we have stored QVD fact table (with autonumbered keys) and compact key fields to represent autonumber sequence (load order = autonumber integer).
      Next load we begin with key tables, so we need to add following load statements:


      // Load all stored key values from QVD and initialize autonumber sequence
      // Original order is preserved (order number = autonumber)
      // "KeyInit" table(s) must not be dropped until end of script
      ProductKeyInit:
      load
      ProductKeyStr,
      autonumber(ProductKeyStr,'Product') as ProductKeyInit
      from ProductKey.qvd (qvd)
      ;
      CustomerKeyInit:
      load
      CustomerKeyStr,
      autonumber(CustomerKeyStr,'Customer') as CustomerKeyInit
      from CustomerKey.qvd (qvd)
      ;

      At this point we have initialized autonumber sequence(s) and can proceed with normal load process. We add incremental load + qvd load logic, but rest of the script remain the same.
      SalesFact:
      LOAD
      CompanyID,
      SalesOrderID,
      AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
      CompanyID & '|' & ProductID as ProductKeyStr,
      AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
      CompanyID & '|' & CustomerID as CustomerKeyStr,
      OrderQty,
      Amount
      ;
      SQL SELECT ...
      ;
      Concatenate load * from SalesFact.qvd (qvd); // optimized load
      // store fact table with autonumbered key fields
      store SalesFact into SalesFact.qvd (qvd);
      Product:
      LOAD
      CompanyID,
      ProductID,
      AutoNumber(CompanyID & '|' & ProductID,'Product') as ProductKey,
      CompanyID & '|' & ProductID as ProductKeyStr,
      ProductName
      ;
      SQL SELECT ...
      ;
      Customer:
      LOAD
      CompanyID,
      CustomerID,
      AutoNumber(CompanyID & '|' & CustomerID,'Customer') as CustomerKey,
      CompanyID & '|' & CustomerID as CustomerKeyStr,
      CustomerName
      ;
      SQL SELECT ...
      ;
      // after every (key related) table loaded:
      // generate key tables: load distinct field values in load order from all tables
      ProductKey:
      LOAD
      fieldvalue('ProductKeyStr',iterno()) as ProductKeyStr
      AUTOGENERATE 1
      WHILE len(fieldvalue('ProductKeyStr',iterno()))
      ;
      store ProductKey into ProductKey.qvd (qvd);

      CustomerKey:
      LOAD
      fieldvalue('CustomerKeyStr',iterno()) as CustomerKeyStr
      AUTOGENERATE 1
      WHILE len(fieldvalue('CustomerKeyStr',iterno()))
      ;
      store CustomerKey into CustomerKey.qvd (qvd);

      // clean up
      drop table ProductKey;
      drop table CustomerKey;
      drop fields ProductKeyStr, CustomerKeyStr; // native keys no longer needed;


      Finally drop the init tables loaded in the beginning of script
      ;

      drop table ProductKeyInit;
      drop table CustomerKeyInit;


      So we have built a "key storage" layer around the normal script. It adds complexity but could be worth the effort where memory and performance is critical.

      Few important points:
      * Never drop any value from stored key table, unless redoing full (non-qvd) load from datasource
      * Avoid NULL values and extra spaces in native keys
      * Each application that stores QVD fact tables should have it's own key storage. For example you can't use keys generated by sales qvw in finance qvw.
      * Keys that are created during join load where order is not well defined - I'm not yet sure how it affects but I *guess* it is okay. Needs attention, though.

      Feedback?