Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a question regarding the most optimal design for a fact table in a data model.
What should be the goal, regarding rows vs columns? Which works best with big/medium data amounts? How does QV handle the compression of the information in RAM and HDD?
For example, which model would have the best performance:
Key | Value | Amount |
1 | Field 1 | 100 |
2 | Field 2 | 150 |
3 | Field 3 | 50 |
4 | Field 4 | 123 |
5 | Field 5 | 100 |
6 | Field 6 | 232 |
7 | Field 7 | 151 |
8 | Field 8 | 75 |
9 | Field 9 | 23 |
10 | Field 10 | 200 |
etc..
or
Key | Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7 | Field 8 | Field 9 | Field 10 |
1 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
2 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
3 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
4 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
etc. ??
Thanks,
My opinion is that unless you have a strong reason to treat field names as data values, you'd want to use the second approach. It's much simpler and clearer to do this:
sum(Field9)
Than this:
sum({<Value={'Field9'}>} Amount)
But of course it all depends. Like let's say what you really wanted was a chart for the user that looked like the second table you show. It would be easier to build that chart like this with the first model:
Dimension 1 = Key
Dimension 2 = Value // moved to the top, pivot table
Expression = sum(Amount)
Than like this with the second model:
Dimension = Key
Expression 1 = sum(Field1)
Expression 2 = sum(Field2)
Expression 3 = sum(Field3)
etc.
You can also, of course, keep both models of the data at the same time.
I've done all versions - the first model, the second model, and both at the same time. It just depends on what the application requires.
I don't know how the compression of these two versions of the data would compare. If QlikView is using some sort of perfect compression algorithm, they should compress to the same size because they contain the same information. But I'm going to guess that in practice, the first table will take more memory, and be slower for what I'd consider typical data analysis tasks. These are just guesses, though. You're best off doing some testing if a significant factor in your choice is memory and performance.
To me, though, the choice is about how you want the users to interact with the data, how you want your charts to behave. Use the data model that supports your needs, then try to tune it as much as possible if memory and performance are critical.
Thanks for your input.
On 30/03/2012, at 04:44 p.m., John Witherspoon <
If the fields were completely different and should be viewed separately (e.g. Field1=CustomerName, Field2=Address1, Field3=Address2...Field10=ProductName) then I would certainly agree with John and go with option 2. It is likely that any performance gained from the Level1 storage facility (QlikView only stores distinct values once, per field. See attached white paper (page 8)) would be cancelled out by overly-complicated expressions to get back to way things should be.
However, if the fields were related and should really be in one field (e.g. Field1=Jan, Field2=Feb, Field3=Mar...Field12=Dec) then I would always Crosstable() it to get Month and Value. Three main advantages to doing it this way (that I can think of):
Use some common sense - if the data seems like it should be loaded/presented in a certain way then go with your gut and present it that way. I always try to go with the lowest-maintenance option i.e. try not to lock yourself into having to change the qvw every time the data changes slightly.
Hope this helps,
Jason