DarkArchonNL
Contributor

Best Practices Data Modeling

Good afternoon,

I would like to ask the community for input on something.

I work at a company that preps all the data in a data warehousing tool. This goes as far as constructing the models for Qlik (Sense, in this case) to use.

Some models have multiple fact tables.

From my previous experience and understanding, a model like this can be constructed in one of two ways: either concatenate (append) the fact tables, or use a link table.

In this case, the latter method is applied. However, it was applied in a way whose purpose I can't understand, and I'd like someone to either tell me I'm missing something or confirm that my reasoning is correct.

Example data:

Let us, for simplicity, reason about the following three fact tables, each containing the columns amount and productname:
(mind you, concatenating the tables is not an option in the actual scenario)
- BackOrders
- Sales
- Inventory

I would normally construct a link table as follows:

BackOrders
- BackOrder_Amount
- BackOrders_productname

Sales
- Sales_Amount
- Sales_productname

Inventory
- Inventory_Amount
- Inventory_productname

LinkTable
- Inventory_productname
- Sales_productname
- BackOrders_productname
- productname_PK

Dim_productname
- productname_PK
- productname (actual field exposed for filtering)
......
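In load-script terms, that layout could be built roughly like this. This is only a sketch: it assumes the three fact tables are already loaded with the qualified field names shown above, and it uses the product name itself as the surrogate key for brevity.

// Collect the distinct product names appearing in any fact
TmpProducts:
LOAD DISTINCT BackOrders_productname as productname RESIDENT BackOrders;
CONCATENATE (TmpProducts)
LOAD DISTINCT Sales_productname as productname RESIDENT Sales;
CONCATENATE (TmpProducts)
LOAD DISTINCT Inventory_productname as productname RESIDENT Inventory;

// One row per product; the same value feeds every qualified key plus the surrogate key
LinkTable:
LOAD DISTINCT
    productname as productname_PK,
    productname as BackOrders_productname,
    productname as Sales_productname,
    productname as Inventory_productname
RESIDENT TmpProducts;

DROP TABLE TmpProducts;

// Dimension table exposing the clean field used for filtering
Dim_productname:
LOAD DISTINCT
    productname_PK,
    productname_PK as productname
RESIDENT LinkTable;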

 

Now, the engineers working on these models at my company have little to no experience with Qlik, and they turned this into an 8-table model: everything above, plus a bespoke dim table per fact table used exclusively to filter productname for that fact.

The possible added benefit of being able to customize these dim tables was not used either: they are all exactly the same, except, of course, for the gross overuse of fully qualified naming to avoid synthetic keys.

Conceptually, that is what they did. In reality, though, there are up to 12 of these "bespoke dimension tables" present in the model.
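For anyone unfamiliar with the pattern: "fully qualified naming" means giving every field a table prefix so that no two tables share a field name and Qlik never builds a synthetic key. That can be done with explicit aliases, as in the layout above, or with the QUALIFY statement. A simplified sketch with a made-up QVD path:

// QUALIFY prefixes every subsequently loaded field with its table name,
// e.g. Sales.amount and Sales.productname - no shared field names, no synthetic keys
QUALIFY *;
UNQUALIFY productname_PK;   // only the deliberate key field stays unqualified

Sales:
LOAD
    amount,
    productname,
    productname as productname_PK
FROM [lib://Data/Sales.qvd] (qvd);

UNQUALIFY *;   // switch qualification off again for later loads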

Can anyone please enlighten me, before I start questioning something that is actually common practice? 


Accepted Solutions
marcus_sommer

In nearly all scenarios, a star-schema data model with a single (horizontally + vertically) merged fact table and n surrounding dimension tables is the most suitable approach. This is down to the simplicity of the data model, the effort needed to develop and maintain it, the performance (script and UI), and the usability for developers and end users.

You will need much more effort and a lot of experience to create any other kind of data model, and it will only ever benefit one or two of the aspects mentioned above - never all of them.

In the data model you describe, all facts are loaded at least twice - once as single fact tables and once concatenated to create the link table. Beside the effort of creating appropriate key fields, these keys and the link table can become much bigger than all the facts together. Further, the heaviest UI calculation is creating the virtual tables behind all the tables/charts/KPIs to get the dimensional context for the applied aggregations: the more tables are included, and the longer and more complex the keys, the worse the performance.

Even worse is the UI handling with multiple fields for the same content, which requires multiple list boxes for the selections, as well as synchronizing those selections with button/action logic and/or extra set analysis relations and/or (nested) if-constructs.
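A hypothetical illustration of that last point, using the example field names from the question: a KPI that should respect a selection made in Sales_productname has to propagate the selection by hand, for instance

Sum({< BackOrders_productname = P(Sales_productname) >} BackOrder_Amount)

whereas in a star schema the same KPI is simply Sum(BackOrder_Amount), because the single shared productname field carries the selection.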

Do any of the consequences mentioned in the last two paragraphs sound desirable compared to just skipping this whole detour and applying the officially recommended star schema with the least possible effort? Hence my suggestion to discard the existing data model and build a new one from scratch in parallel.
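To make that concrete, here is a minimal sketch of the recommended model for the simplified example from the question (assumed QVD names, not the actual script): the three facts are concatenated into one fact table with a flag marking their origin, and a single product dimension sits next to it.

// Single merged fact table - horizontal merge via concatenation, one flag per source
Facts:
LOAD productname, amount, 'Sales' as RecordType
FROM [lib://Data/Sales.qvd] (qvd);

CONCATENATE (Facts)
LOAD productname, amount, 'BackOrders' as RecordType
FROM [lib://Data/BackOrders.qvd] (qvd);

CONCATENATE (Facts)
LOAD productname, amount, 'Inventory' as RecordType
FROM [lib://Data/Inventory.qvd] (qvd);

// One shared dimension, associated on the single productname field
Dim_Product:
LOAD DISTINCT productname   // further product attributes would go here
FROM [lib://Data/Products.qvd] (qvd);

Measures then separate the sources with the flag, e.g. Sum({< RecordType = {'Sales'} >} amount), and one productname list box filters everything.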


2 Replies

DarkArchonNL
Contributor
Author

Hi marcus_sommer,

Yes, that's pretty much what I thought. Thanks!