Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

mhassinger
Contributor

Performance w/ multiple fact tables - star schema vs. master link

I am looking at a data model and am trying to help optimize it for performance.

What it looks like now is one dimension table (Geography) in the center, with 6 fact tables spoking out from it. Then connected to those fact tables are other dimension tables. So kind of an odd snowflake schema (dim > fact > dim, instead of fact > dim > dim). Hope that makes sense.

Also, the outer dimension tables are duplicated to avoid circular references. For example, the Product dimension table associated with Fact table 1 has the same fields (renamed) and values as the Product dimension table associated with Fact table 2.

Lastly, 3 of the 6 fact tables are simply pre-aggregated records of the other 3. So for example, if Fact table 1 has data monthly, then Fact table 2 might have the same data quarterly, or as year over year.

Performance is the primary concern here, although a simplfiied data model has it's own development and maintenance benefits.

So my recommendations have been to follow QlkView best practices as I understand them, which are to try and pull the fact tables together to generate a star schema to reduce the number of "hops" to tables QlikView has to make when a user makes a selection.

There's been a little pushback on the concatenation of fact tables, so I offered up the use of generic keys and a master link table as an alternative, which would remove the need to have, for example, the product dimension table repeated several times in the model.

My concern is that the performance increase might not be as dramatic as using a single fact table.

Unfortunately, there has already been some development here, so any of these would be changes and would have to be justified from an application performance (specifically, user experience, not load script execution time).

So would you expect changing to a master link table (keeping 6 fact tables) and removing redundant dimension tables to noticeably improve things? Or would there only be significant improvements from actually reducing the number of tables by concatenating fact tables together into a star schema?

5 Replies

Re: Performance w/ multiple fact tables - star schema vs. master link

I recommend concatenating the fact tables. See this blog post if you haven't already: Fact Table with Mixed Granularity

Having a product dimension duplicated several times doesn't sound like a good idea if it doesn't represent distinctly different roles.


talk is cheap, supply exceeds demand
mhassinger
Contributor

Re: Performance w/ multiple fact tables - star schema vs. master link

So your thought is that the performance gained from one fact table is worth it over the path of a master link table with generic keys?

Not applicable

Re: Performance w/ multiple fact tables - star schema vs. master link

I have a post at:

I see no need to ever use concatenate, or to use link tables..

I do not create multiple Dimension tables - I just alias column names, similar to other BI tools like OBIEE, or BO.  I see no need to concatenate at all. 

Employee
Employee

Re: Performance w/ multiple fact tables - star schema vs. master link

I agree with Gysbert. My experience is that the performance of a concatenation always beats that of a link table. With or without generic keys.

Also, concatenation of the fact tables will allow you to use the same dimension tables for all facts.

HIC

Not applicable

Re: Performance w/ multiple fact tables - star schema vs. master link

Michael,

if you do a very complex model, it is certainly because you need. If there is no need to do, avoid it. QlikView is very performant, so be sure that the "basic" first model is not the good one.

I would avoid the pre-agregated tables, especially if the computation is just a sum. One again: is your model not good enough ? Did you perform a test with the amount of data you will get (or with random data) to point out that the time response will be too long ?

Last, but not least, think about the interface. If you have two dimensions Product for example, it will be more complicated to handle that fact. The user choose the values in Product1, and you want to display the values by Product2, and all values related to Product2 (QV is an associative tool). The different expressions in the graphs/tables may be far more complicated, difficult to test etc.

Fabrice

Community Browser