I'm currently creating a Master_Table of Sales and Inventory metrics by concatenating 9 tables which share 5 key fields and have 12 other, differing fields. Each source table is either Sales or Inventory, and each set of tables shares its own metric fields (e.g. all Sales tables share the Sales metrics). The resultant table is 350 million records, and the Data Model that concatenates it (and does other loads and joins) takes 80 minutes to load and is 1.7 GB highly compressed.
Our challenge is to meet an 8am deadline every morning, and creating this Master Table is the slow part.
Is there a strategy on the order that I should be loading these tables? Below is a sample of what we currently do (load/concat Sales first, then Inventory). I'm wondering if I should change the load order (largest QVD loaded first, or Inventory first?) or if I could load an empty table with all fields first to get the 'skeleton' out there. I've noticed that QV slows way down when, after loading the first 5 tables (all Sales with common fields), it tries to concat load the Inventory table (different fields).
Any insights would be appreciated! (We use QV 8.5; the server has 128 GB RAM and four quad-core x64 processors.)
Thanks! Gracias!
Master_Table:
LOAD
ApplicationDate,
_ApplicationDate,
_ApplicationSkuNumber,
[_Business Unit Number],
[_Price Type Code],
Sales_Amt,
Sales_Qty,
AMU,
Cost,
Return_Amt,
...
FROM Sales_Table1.qvd (qvd);
CONCATENATE (Master_Table)
LOAD
ApplicationDate,
_ApplicationDate,
_ApplicationSkuNumber,
[_Business Unit Number],
[_Price Type Code],
OnHand_Amt,
OnHand_Qty,
ATS_Amt,
ATS_Qty,
...
FROM INVT_Table1.qvd (qvd);
<<repeat for all Inventory and Sales tables>>
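For reference, the 'skeleton' idea mentioned above could be sketched like this: a zero-row table that declares every field up front, so later concatenations add no new columns. The field list here is abbreviated from the sample and would need to cover all 17 fields:

Master_Table:
LOAD
    Null() as ApplicationDate,
    Null() as _ApplicationDate,
    Null() as _ApplicationSkuNumber,
    Null() as [_Business Unit Number],
    Null() as [_Price Type Code],
    Null() as Sales_Amt,
    Null() as OnHand_Amt
    // ...one Null() per remaining Sales and Inventory field...
AutoGenerate 0;

Note that, per the optimized-load rule discussed below, this alone does not make the subsequent QVD loads optimized, since no single QVD contains all of the skeleton's fields.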
The slowdown you are seeing at the switch from Sales to Inventory is the change from an optimized load (indicated by the "(optimized)" message in the status window) to an unoptimized one. INVT_Table1.qvd will load unoptimized, which is considerably slower.
Here's my empirical understanding of optimized loads when concatenating QVDs. The rule: a concatenated QVD load will be optimized if it loads all of the fields loaded in all previous loads of the same table. It may add new fields as well.
Consider these examples.
QvdA.qvd -- Fields A, X
QvdB.qvd -- Fields A, B
--Optimized load --
LOAD A FROM QvdA.qvd;
LOAD A, B FROM QvdB.qvd;
--Unoptimized load -- because X does not appear in the second load.
LOAD A, X FROM QvdA.qvd;
LOAD A, B FROM QvdB.qvd;
Here are some possible workarounds:
1. Change to a linked table model instead of a concatenated model -- which looks appropriate given your data sample. You'll have to generate compound keys to eliminate the synthetic keys.
2. Populate null() fields into the QVDs so the field lists are symmetrical.
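A rough sketch of workaround 2, applied to an Inventory QVD (field names taken from the sample above; the exact lists are an assumption). The QVD is regenerated once with Null() placeholders for the Sales-only fields, so that every QVD ends up with an identical field list:

INVT_Padded:
LOAD
    *,
    Null() as Sales_Amt,   // Sales-only fields padded with nulls
    Null() as Sales_Qty,
    Null() as AMU,
    Null() as Cost,
    Null() as Return_Amt
FROM INVT_Table1.qvd (qvd);
STORE INVT_Padded INTO INVT_Table1_Padded.qvd (qvd);
DROP TABLE INVT_Padded;

The Sales QVDs would get the mirror-image padding with the Inventory fields. The padding load itself is unoptimized (it adds computed fields), so it belongs in a separate QVD-generation step; the nightly concatenation of the padded QVDs can then stay optimized throughout.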
-Rob
I'd be guessing and testing just like you. I wouldn't think the order of concatenation would make a difference, but maybe it does. Maybe it takes extra work to "go back and append null fields" to the table you have so far. Or it might not take any work whatsoever, depending on how QlikView behaves internally. It does seem at least worth testing loading an empty skeleton table with all of the possible fields, just to see if there IS a slowdown when new fields are added. It looks like you already have optimized QVD loads with no conditions. Anything you can do to eliminate or shrink fields might help. I just don't have many ideas.
I am testing the skeleton load approach, and it behaves just as Rob described -- subsequent loads are not optimized, because no one table I'm loading has all of the 'final' fields.
Thanks for your response.
Rob,
I can validate your response -- here is a smaller version of the load (in progress -- still 45 minutes to go): the first Sales tables load optimized, and then the Inventory (Invt) loads are not optimized. There's a good 2-minute pause at the transition, and the data loads slower afterwards. Based on your observation, it looks like I lucked out by loading the larger Sales tables first, so they load 'optimized'.
Concerning the linked table, I have tried that for another project. Though it does link things together, the performance is significantly worse with that data model. As our applications have increased in size and complexity, more and more we find better performance (front-end) with one Master Table that is a concatenation of disparate tables.
I will see if I can populate with null values so all tables load with the same data.
Thank you for sharing your insights!
Rob Wunderlich wrote: "1. Change to a linked table model instead of a concatenated model -- which looks appropriate given your data sample. You'll have to generate compound keys to eliminate the synthetic keys."
Or just allow QlikView to build the synthetic key, which I suspect here would perform the exact same function as a compound key, but with less effort and higher performance.
Edit: Posted that before I saw your response to Rob. I'd at least check the performance using synthetic keys, but based on your experience, you'll probably get your best performance out of your current plan to add null fields to the various QVDs.
I was under the impression that synthetic keys are bad, both to performance and to keeping the data straight. Is that a true statement?
I'm curious now to know how synthetic keys would affect data integrity (and performance.) I've always avoided them.
Tyler Waterfall wrote: "I was under the impression that synthetic keys are bad, both to performance and to keeping the data straight. Is that a true statement?"
While a very good rule of thumb, particularly for new developers, that is not a true statement.
Most of the time, synthetic keys arise through mistakes or inefficiencies in a data model. As a result, synthetic keys usually occur in concert with poor performance and other problems. Clearing up the synthetic keys often simultaneously clears up the data model mistakes and inefficiencies. The poor performance and other problems go away. This can lead to mistaken conclusions, that either synthetic keys CAUSE problems, or that they ONLY occur when there are problems, and that they should therefore always be avoided.
But synthetic keys are not, in and of themselves, a bad thing. While the situations are likely rare, in some situations, they are exactly what you want, and the most efficient way to accomplish a task. I give an example with performance testing results in the following thread:
http://community.qlik.com/forums/p/23510/89770.aspx#89770
For your particular case, I'm guessing (I could certainly be wrong) that a synthetic key linking together all of your tables would be faster than building a concatenated key yourself, both for load speed and to a lesser extent for chart processing speed.
However, for your particular case, the fastest and I would think best solution seems likely to be Rob's second suggestion - to add null fields to your QVDs.
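For context, the explicit compound key being contrasted with a synthetic key here might be built like this (the five key fields are taken from the sample above; AutoNumber() keeps the key compact, though computing it makes the load unoptimized):

Sales:
LOAD
    // One explicit key from the five shared fields; dropping the
    // originals from one side prevents a synthetic key from forming.
    AutoNumber(ApplicationDate & '|' & _ApplicationDate & '|'
        & _ApplicationSkuNumber & '|' & [_Business Unit Number] & '|'
        & [_Price Type Code]) as %LinkKey,
    Sales_Amt,
    Sales_Qty
FROM Sales_Table1.qvd (qvd);

The synthetic-key alternative is simply to load the five shared fields as-is in both tables and let QlikView build the link itself.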
Thank you John for the clarification.