- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Best Data Model - Thoughts?
So like a lot of businesses, we have millions of rows, (circa 20). At present, all of the data gets put into one Fact Table, not keyed, has 170 columns and is slow. My question is this - is it better to have one fact table with 170 columns or split it out into a star schema?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Though the NULL itself would not cause a big problem, but the other factors caused it's creation would. More nulls means information density is less and more number of refrences to a single value in terms of memory pointer. Memory pointers would require to point and store the same value numerous times and that becomes a performance killer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That depends. If the table is mostly populated (information density is high, i.e - less number of nulls) the single table would be a better choice. If the table is actually derived from multiple tables and has got lots of nulls (because of joins or concatenations), might be you have to look back and think of breaking it down.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Any chance you could expand on why this is please?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Perhaps this blog post helps: A Myth about the Number of Hops.
Where do you experience slow performance? Possible the data model isn't the problem but a chart is. Or one or more expressions that are not optimal for their purpose.
talk is cheap, supply exceeds demand
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you go with many-table way, it gives you normalized model which would have advantages of:
- Memory Efficient - (single table would consume much memory)
- More meaningful when look at the model - you can understand and manage easily
- Risk of wrong calculation reduces - that could appear in the process of joins(improper)
But this is not a such straight-forward case of winning over. There would always be a trade-off between CPU efficiency and Memory efficiency, and de-normalized approach(single table) wins by a small margin there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
but more specifically where NULL values are concerned? a normalised model would win over a one table fact model because ...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Though the NULL itself would not cause a big problem, but the other factors caused it's creation would. More nulls means information density is less and more number of refrences to a single value in terms of memory pointer. Memory pointers would require to point and store the same value numerous times and that becomes a performance killer.