Hi,
I have a problem where I need to link two huge QVD files. The issue breaks down as:
1. I need to analyze inventory over time, combining an inventory QVD covering 15 warehouses with the purchases QVD, so I need to link them through itemId, warehouseId and date. Each QVD has between 50 and 100 million rows.
2. If I link them through a composite key, I lose the QVD optimized load.
3. I've tried concatenating them, which loads super fast, but it gets complicated because I have to add an extra field identifying each row's source table, so I can filter on it and know what's what.
I'm thinking that the probable solutions are:
a. Use synthetic tables, but this carries a lot of performance issues.
b. Create a composite key, do one complete reload, and then every day do a partial load that only adds the most recent rows.
c. Continue concatenating and investigate some other way of identifying the source table.
What do you guys suggest?
I am not sure that I fully understand your question/problem, but how about a combination of alternatives "b" and "c"?
Load all records every Sunday night or something like that, and let those QVDs include the composite keys so you can load them optimized. Then do partial reloads where you create the same composite keys, and just concatenate the four tables (two history QVDs and two "new transaction" tables), as in the sketch below.
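A minimal sketch of what that could look like for one of the two data sets (the field names, file names and text-file source are just my assumptions for illustration):

// Weekly full reload: build the composite key once, when the QVD is created
Inventory:
LOAD ItemId,
     WarehouseId,
     Date,
     Quantity,
     ItemId & '|' & WarehouseId & '|' & Num(Date) AS LinkKey
FROM inventory_full.txt (txt, utf8, embedded labels, delimiter is '\t');

STORE Inventory INTO inventory_history.qvd (qvd);

// Daily partial reload: a plain field list from the history QVD stays optimized,
// then concatenate the new rows built with the same key expression
Inventory:
LOAD ItemId, WarehouseId, Date, Quantity, LinkKey
FROM inventory_history.qvd (qvd);

CONCATENATE (Inventory)
LOAD ItemId,
     WarehouseId,
     Date,
     Quantity,
     ItemId & '|' & WarehouseId & '|' & Num(Date) AS LinkKey
FROM inventory_new.txt (txt, utf8, embedded labels, delimiter is '\t');

The same pattern repeated for the purchases data gives the four tables mentioned above.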
When you initially create and store the QVDs, why don't you create the composite key at the same time? Then when you load from them, you can load optimized.
I'd recommend concatenation - any other approach is more complex and consumes more memory. It's easy enough to add a qualifier or a flag to determine "what is what". Besides, if you keep your quantities and amounts separate, it will be easy enough to distinguish...
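For instance (a sketch, with a flag name and field names I'm assuming rather than taking from the post): if every concatenated row carries a SourceTable flag, chart expressions can keep the facts apart with set analysis:

Sum({<SourceTable = {'Inventory'}>} Quantity)   // inventory rows only
Sum({<SourceTable = {'Purchases'}>} Amount)     // purchase rows only

And if the quantity and amount fields have different names per source, a plain Sum(Quantity) already ignores the purchase rows anyway, which is the "keep them separate" point.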
cheers!
Dealing with complex structures and large data sets, I find myself avoiding composite keys lately... Keeping in mind that memory is consumed by the fields with the largest number of distinct values, imagine how much more memory is needed for a composite key that combines Item Number, Warehouse and Date!
Another specific challenge in this particular example is linking the Inventory and Purchasing data, including the date... The problem is that if those two data sets are linked through a composite key, selecting a date range can distort the inventory picture: when a user selects a month, the inventory data is limited to those items that happened to be purchased in that specific month (which is obviously wrong...)
Considering all the advantages and disadvantages, I typically prefer concatenation...
Thanks for your response, Oleg,
Do you have any examples of how you handle this issue? And how do you add identifiers to distinguish what's what without slowing down the optimized load?
Because you are dealing with QVDs, I would create the "what's what" field during QVD creation. With this approach you create the flag in the SQL SELECT statement (I'll guess you use SQL SELECT to retrieve the data), and you can later use QVD optimized mode when loading the QVD files into the application.
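A sketch of that approach (table and field names are assumed): the flag is just a constant column in the SQL SELECT, so it ends up in the stored QVD like any other field and the later load stays optimized:

// The flag is created already at the source, not in a QlikView expression
Purchases:
SQL SELECT ItemId, WarehouseId, PurchaseDate, Amount,
    'Purchases' AS SourceTable
FROM dbo.Purchases;

STORE Purchases INTO purchases.qvd (qvd);

// Later, in the analysis app: a plain field list with no
// transformations, so the QVD load runs in optimized mode
Purchases:
LOAD ItemId, WarehouseId, PurchaseDate, Amount, SourceTable
FROM purchases.qvd (qvd);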
I once did a test with concatenated keys where in test #1 I made the keys in the SQL statement and in test #2 I made them in the load statement (from QVDs). SQL is just much faster at concatenating fields than QlikView, so you might keep this in mind if you go for a join/keep solution instead of a file concatenation.
Also, I would guess that a Num(Date) instead of the Date field alone is better for the concatenated keys (in terms of memory).
Edit:
And of course, use an AutoNumberHash function if you go for field concatenation.
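Something like this (field names assumed), with two trade-offs worth noting in this thread: AutoNumber values are only stable within a single reload, so both tables must be keyed in the same script run, and the key expression makes this particular load non-optimized:

Inventory:
LOAD ItemId,
     WarehouseId,
     Date,
     Quantity,
     // maps each distinct (ItemId, WarehouseId, Num(Date)) combination
     // to a compact sequential integer
     AutoNumberHash128(ItemId, WarehouseId, Num(Date)) AS LinkKey
FROM inventory.qvd (qvd);

The purchases table would get the same AutoNumberHash128 expression so the integer keys line up.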
Unfortunately I'm not reading from SQL but from a LOT of text files, so the load isn't that fast. But anyway, I agree that the best solution is to concatenate and to set a data-origin flag in the QVD, so today I've left the first QVD re-creating itself from zero all night. Thanks to everyone for the input.
Concatenation does make the loads super fast. However, when I use the QVDs and create charts on non-key fields I get weird results. If I use composite keys and aggregate data in a chart object, I get the correct values for the aggregation with any field used as a dimension. However, when I use concatenation, most often my dimension value shows up as NULL and the aggregates are incorrect. I read on Rob Wunderlich's blog (http://qlikviewnotes.blogspot.com/2009/11/understanding-join-and-concatenate.html) that concatenation removes the associations when we try to use a non-key field. Is there any way to work around this?
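A toy illustration of that effect (made-up inline data, not from the original post):

Facts:
LOAD * INLINE [
ItemId, Quantity
A, 10
B, 20
];

CONCATENATE (Facts)
LOAD * INLINE [
ItemId, Supplier, Amount
A, Acme, 5
];

// Supplier exists only on the purchase rows, so a chart with Supplier
// as the dimension and Sum(Quantity) as the expression puts all the
// inventory quantity under a NULL Supplier value.

Restricting each expression to its source with the SourceTable flag (as in the set-analysis example earlier in the thread) is one way to keep the aggregates correct.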