Performance w/ multiple fact tables - star schema vs. master link
I am looking at a data model and am trying to help optimize it for performance.
What it looks like now is one dimension table (Geography) in the center, with 6 fact tables spoking out from it. Then connected to those fact tables are other dimension tables. So kind of an odd snowflake schema (dim > fact > dim, instead of fact > dim > dim). Hope that makes sense.
Also, the outer dimension tables are duplicated to avoid circular references. For example, the Product dimension table associated with Fact table 1 has the same fields (renamed) and values as the Product dimension table associated with Fact table 2.
Lastly, 3 of the 6 fact tables are simply pre-aggregated records of the other 3. So for example, if Fact table 1 has data monthly, then Fact table 2 might have the same data quarterly, or as year over year.
Performance is the primary concern here, although a simplfiied data model has it's own development and maintenance benefits.
So my recommendations have been to follow QlkView best practices as I understand them, which are to try and pull the fact tables together to generate a star schema to reduce the number of "hops" to tables QlikView has to make when a user makes a selection.
There's been a little pushback on the concatenation of fact tables, so I offered up the use of generic keys and a master link table as an alternative, which would remove the need to have, for example, the product dimension table repeated several times in the model.
My concern is that the performance increase might not be as dramatic as using a single fact table.
Unfortunately, there has already been some development here, so any of these would be changes and would have to be justified from an application performance (specifically, user experience, not load script execution time).
So would you expect changing to a master link table (keeping 6 fact tables) and removing redundant dimension tables to noticeably improve things? Or would there only be significant improvements from actually reducing the number of tables by concatenating fact tables together into a star schema?
Re: Performance w/ multiple fact tables - star schema vs. master link
if you do a very complex model, it is certainly because you need. If there is no need to do, avoid it. QlikView is very performant, so be sure that the "basic" first model is not the good one.
I would avoid the pre-agregated tables, especially if the computation is just a sum. One again: is your model not good enough ? Did you perform a test with the amount of data you will get (or with random data) to point out that the time response will be too long ?
Last, but not least, think about the interface. If you have two dimensions Product for example, it will be more complicated to handle that fact. The user choose the values in Product1, and you want to display the values by Product2, and all values related to Product2 (QV is an associative tool). The different expressions in the graphs/tables may be far more complicated, difficult to test etc.