5 Replies Latest reply: Jan 23, 2014 4:17 AM by Fabrice Aunez RSS

    Performance w/ multiple fact tables - star schema vs. master link

    Michael Hassinger

      I am looking at a data model and am trying to help optimize it for performance.

       

      What it looks like now is one dimension table (Geography) in the center, with 6 fact tables spoking out from it. Then connected to those fact tables are other dimension tables. So kind of an odd snowflake schema (dim > fact > dim, instead of fact > dim > dim). Hope that makes sense.

       

      Also, the outer dimension tables are duplicated to avoid circular references. For example, the Product dimension table associated with Fact table 1 has the same fields (renamed) and values as the Product dimension table associated with Fact table 2.

       

      Lastly, 3 of the 6 fact tables are simply pre-aggregated records of the other 3. So for example, if Fact table 1 has data monthly, then Fact table 2 might have the same data quarterly, or as year over year.

       

      Performance is the primary concern here, although a simplfiied data model has it's own development and maintenance benefits.

       

      So my recommendations have been to follow QlkView best practices as I understand them, which are to try and pull the fact tables together to generate a star schema to reduce the number of "hops" to tables QlikView has to make when a user makes a selection.

       

      There's been a little pushback on the concatenation of fact tables, so I offered up the use of generic keys and a master link table as an alternative, which would remove the need to have, for example, the product dimension table repeated several times in the model.

       

      My concern is that the performance increase might not be as dramatic as using a single fact table.

       

      Unfortunately, there has already been some development here, so any of these would be changes and would have to be justified from an application performance (specifically, user experience, not load script execution time).

       

      So would you expect changing to a master link table (keeping 6 fact tables) and removing redundant dimension tables to noticeably improve things? Or would there only be significant improvements from actually reducing the number of tables by concatenating fact tables together into a star schema?