
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Row oriented vs Column oriented data model approach
Hi,
I have a question regarding the most optimal design for a fact table in a data model.
What should be the goal, regarding rows vs columns? Which works best with big/medium data amounts? How does QV handle the compression of the information in RAM and HDD?
For example, which model would have the best performance:
Key | Value | Amount |
1 | Field 1 | 100 |
2 | Field 2 | 150 |
3 | Field 3 | 50 |
4 | Field 4 | 123 |
5 | Field 5 | 100 |
6 | Field 6 | 232 |
7 | Field 7 | 151 |
8 | Field 8 | 75 |
9 | Field 9 | 23 |
10 | Field 10 | 200 |
etc..
or
Key | Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7 | Field 8 | Field 9 | Field 10 |
1 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
2 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
3 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
4 | 100 | 150 | 50 | 200 | 250 | 50 | 75 | 100 | 25 | 55 |
etc. ??
Thanks,


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
My opinion is that unless you have a strong reason to treat field names as data values, you'd want to use the second approach. It's much simpler and clearer to do this:
sum(Field9)
Than this:
sum({<Value={'Field9'}>} Amount)
But of course it all depends. Like let's say what you really wanted was a chart for the user that looked like the second table you show. It would be easier to build that chart like this with the first model:
Dimension 1 = Key
Dimension 2 = Value // moved to the top, pivot table
Expression = sum(Amount)
Than like this with the second model:
Dimension = Key
Expression 1 = sum(Field1)
Expression 2 = sum(Field2)
Expression 3 = sum(Field3)
etc.
You can also, of course, keep both models of the data at the same time.
I've done all versions - the first model, the second model, and both at the same time. It just depends on what the application requires.
I don't know how the compression of these two versions of the data would compare. If QlikView is using some sort of perfect compression algorithm, they should compress to the same size because they contain the same information. But I'm going to guess that in practice, the first table will take more memory, and be slower for what I'd consider typical data analysis tasks. These are just guesses, though. You're best off doing some testing if a significant factor in your choice is memory and performance.
To me, though, the choice is about how you want the users to interact with the data, how you want your charts to behave. Use the data model that supports your needs, then try to tune it as much as possible if memory and performance are critical.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your input.
On 30/03/2012, at 04:44 p.m., John Witherspoon <

.png)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If the fields were completely different and should be viewed separately (e.g. Field1=CustomerName, Field2=Address1, Field3=Address2...Field10=ProductName) then I would certainly agree with John and go with option 2. It is likely that any performance gained from the Level1 storage facility (QlikView only stores distinct values once, per field. See attached white paper (page 8)) would be cancelled out by overly-complicated expressions to get back to way things should be.
However, if the fields were related and should really be in one field (e.g. Field1=Jan, Field2=Feb, Field3=Mar...Field12=Dec) then I would always Crosstable() it to get Month and Value. Three main advantages to doing it this way (that I can think of):
- More efficient (see doc attached)
- Expandable. Let's say the column names weren't months (which is obviously finite at 1-12) but product category (potentially infinite). If you LOAD * from the field-based layout and crosstable() it to Category and Value then if your data suddenly has an extra category/column added to it you will not need to make any changes to your script or your chart objects as all values would be picked up. If you kept it with the field layout you would need to add every new product category into the charts etc.
- Much better UI for users. All values could be in a single listbox, or in a single dimension on a chart. Much easier to select than having to go through multiple fields
Use some common sense - if the data seems like it should be loaded/presented in a certain way then go with your gut and present it that way. I always try to go with the lowest-maintenance option i.e. try not to lock yourself into having to change the qvw every time the data changes slightly.
Hope this helps,
Jason
