Skip to main content
Announcements
Global Transformation Awards submissions are open! SUBMIT YOUR STORY
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Best Data Model - Thoughts?

So like a lot of businesses, we have millions of rows, (circa 20). At present, all of the data gets put into one Fact Table, not keyed, has 170 columns and is slow. My question is this - is it better to have one fact table with 170 columns or split it out into a star schema?

1 Solution

Accepted Solutions
tresesco
MVP

Though the NULL itself would not cause a big problem, but the other factors caused it's creation would. More nulls means information density is less and more number of refrences to a single value in terms of memory pointer. Memory pointers would require to point and store the same value numerous times and that becomes a performance killer.

View solution in original post

6 Replies
tresesco
MVP

That depends. If the table is mostly populated (information density is high, i.e - less number of nulls) the single table would be a better choice. If the table is actually derived from multiple tables and has got lots of nulls (because of joins or concatenations), might be you have to look back and think of breaking it down.

Not applicable
Author

Any chance you could expand on why this is please?

Gysbert_Wassenaar

Perhaps this blog post helps: A Myth about the Number of Hops.

Where do you experience slow performance? Possible the data model isn't the problem but a chart is. Or one or more expressions that are not optimal for their purpose.


talk is cheap, supply exceeds demand
tresesco
MVP

If you go with many-table way, it gives you normalized model which would have advantages of:

  • Memory Efficient  - (single table would consume much memory)
  • More meaningful when look at the model - you can understand and manage easily
  • Risk of wrong calculation reduces - that could appear in the process of joins(improper)

But this is not a such straight-forward case of winning over. There would always be a trade-off between CPU efficiency and Memory efficiency, and de-normalized approach(single table) wins by a small margin there.

Not applicable
Author

but more specifically where NULL values are concerned? a normalised model would win over a one table fact model because ...

tresesco
MVP

Though the NULL itself would not cause a big problem, but the other factors caused it's creation would. More nulls means information density is less and more number of refrences to a single value in terms of memory pointer. Memory pointers would require to point and store the same value numerous times and that becomes a performance killer.