8 Replies Latest reply: Mar 5, 2010 5:46 PM by tylerwaterfall RSS

    CONCATENATE: Load strategy for concatenating several large tables together with shared and unshared fields?

      I'm currently creating a Master_Table including Sales and Inventory metrics by concatenating 9 tables which share 5 key fields and have 12 other different fields. The source tables are either Sales or Inventory, with each set of tables sharing their metrics (e.g. Sales metrics). The resultant table is 350 million records and the Data Model that concatenates this table (and does other loads and joins) takes 80 minutes to load and is 1.7GB highly compressed.

      Our challenge is to meet at 8am deadline every morning. Creating this Master Table is the slow part.

      Is there a strategy on the order that I should be loading these tables? Below is a sample of what we currently do (load/concat Sales first, then Inventory). I'm wondering if I should change the load order (largest QVD loaded first, or Inventory first?) or if I could load an empty table with all fields first to get the 'skeleton' out there. I've noticed that QV slows way down when, after loading the first 5 tables (all Sales with common fields), it tries to concat load the Inventory table (different fields).

      Any insights would be appreciated! (We use QV 8.5, Server: 128 GB, four quad core processor x64).
      Thanks! Gracias!

      Master_Table:
      LOAD
      ApplicationDate,
      _ApplicationDate,
      _ApplicationSkuNumber,
      [_Business Unit Number],
      [_Price Type Code],
      Sales_Amt,
      Sales_Qty,
      AMU,
      Cost,
      Return_Amt,
      ...
      FROM Sales_Table1.qvd (qvd);

      CONCATENATE (Master_Table)

      LOAD
      ApplicationDate,
      _ApplicationDate,
      _ApplicationSkuNumber,
      [_Business Unit Number],
      [_Price Type Code],
      OnHand_Amt,
      OnHand_Qty,
      ATS_Amt,
      ATS_Qty,
      ...
      FROM INVT_Table1.qvd (qvd);

      <<repeat for all Inventory tables and sales tables>>

        • CONCATENATE: Load strategy for concatenating several large tables together with shared and unshared fields?
          John Witherspoon

          I'd be guessing and testing just like you. I wouldn't think the order of concatenation would make a difference, but maybe it does. Maybe it takes extra work to "go back and append null fields" to the table you have so far. Or it might not take any work whatsoever, depending on how QlikView behaves internally. It does seem at least worth testing loading an empty skeleton table with all of the possible fields, just to see if there IS a slowdown when new fields are added. It looks like you already have optimized QVD loads with no conditions. Anything you can do to eliminate or shrink fields might help. I just don't have many ideas.

          • CONCATENATE: Load strategy for concatenating several large tables together with shared and unshared fields?
            Rob Wunderlich

            The slowdown you are seeing from the common Sales to Inventory is the switch between an optimized load (indicated by "(optimized") message in the status window). The INVT_Table1.qvd will load unoptimized which is considerably slower.

            Here's my empirical understanding of optimized loads when concatenating QVDs. The rule is: A concatenated qvd load will be optimized if it loads all the same fields loaded in all previous loads of the same table. It may add fields as well.

            Consider these examples.

            QvdA.qvd -- Fields A, X
            QvdB.qvd -- Fields A, B

            --Optimized load --
            LOAD A FROM QvdA.qvd;
            LOAD A, B FROM QvdB.qvd;

            --UnOptimized load -- because X does not appear in the second load.
            LOAD A, X FROM QvdA.qvd;
            LOAD A, B FROM QvdB.qvd;

            Here's some possible workarounds:

            1. Change to a linked table model instead of a concatenated model -- which looks appropriate given your data sample. You'll have to generate compound keys to eliminate the synthetic keys.


            2. Populate null() fields into the QVDs so the field lists are symmetrical.

            -Rob

              • CONCATENATE: Load strategy for concatenating several large tables together with shared and unshared fields?

                Rob,

                I can validate your response -- here is a smaller version of the load (in progress -- still has 45 minutes to go): you can see the first Sales tables load optimized and then the inventory (Invt) loads are not optimized. There's a good 2 minute pause in the transition and the data load slower afterwards. Based on your observation, looks like I lucked out in loading the larger Sales tables first so they would load 'optimized'.

                Concerning the linked table, I have tried that for another project. Though it does link things together, the performance is significantly worse with that data model. As our applications have increased in size and complexity, more and more we find better performance (front-end) with one Master Table that is a concatenation of disparate tables.

                I will see if I can populate with null values so all tables load with the same data.

                Thank you for sharing your insights!

                • CONCATENATE: Load strategy for concatenating several large tables together with shared and unshared fields?
                  John Witherspoon

                   


                  Rob Wunderlich wrote:1. Change to a linked table model instead of a concatenated model -- which looks appropriate given your data sample. You'll have to generate compound keys to eliminate the synthetic keys.


                  Or just allow QlikView to build the synthetic key, which I suspect here would perform the exact same function as a compound key, but with less effort and higher performance.

                  Edit: Posted that before I saw your response to Rob. I'd at least check the performance using synthetic keys, but based on your experience, you'll probably get your best performance out of your current plan to add null fields to the various QVDs.