Smallest performance footprint between duplicated rows vs extra relation
Hi!
I have two solutions for a model and am trying to decide on which would have the smallest performance footprint:
[Order Fact (10M): %ItemKey] > [Item Dim (1M): %ProductGroupKey] > [Product Group Dim (100k)]
[Order Fact (10M): %ItemKey] > [Item Dim (1M): %ProductGroupKey] + [Order Fact (10M): %ItemKey] > [Product Group Dim (1M)]
Basically it stands between having a snowflake or a star schema, an extra relation, vs duplicating data in [Product Group Dim] so I can join it directly to [Order Fact].
My gut tells my that having that option 1 has the smallest footprint on performance.
You are correct that having option 1 with a snowflake schema is likely to have a smaller performance footprint than option 2 with a star schema.
In a snowflake schema, the fact table is connected to dimension tables through intermediate tables, resulting in a more normalized data structure. This can help to reduce data redundancy and improve data consistency, which can lead to faster query performance.