Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Bobi
Partner - Contributor II
Partner - Contributor II

Link table or canonical dates in heavy fact-table, performance considerations?

Hi. I have a heavy fact table with 12 M unique records. Otherwise it is a star schema and everything is loaded with allmost all logic in db-views and then qvds are created. I have orderline-data in one table and then I joined in subscription data from another fact-table. This is also done in the db views.

Now I want to be able to slice and dice based on 3 different dates (orderline created date, orderline completed date and subscription date).

I have tried 2 approaches.

Approach 1: Huge fact table, no link table.
Canonical date (will be made in db through union): 

fact:


Load*,

order_created_date as %Date,

'Crtd' as Date_Type

from X_table;


concatenate(fact):


Load*,

order_completed_date as %Date,

'Cmpltd' as Date_Type

from X_table;

 

concatenate(fact):


Load*,

subscription_date as %Date,

'Sub' as Date_Type

from X_table;


This is linked to a master calendar.

2nd approach: Link table.

Create link table. Then it is just 12 M rows in fact table instead of (12 M*3= 36 M). But we have a link table with 36 M rows. 

I use incremental load in both approaches.


I felt that the frontend-objects were slower when using the link-table. But there will be higher cost and pressure on the db by using 1st approach and have 36 M rows fact table. I guess I need to choose between higher db-cost and longer reload times (approach 1) or slower end-user experience (approach 2).


What would be your approach, do I miss any factor that is worth considering here regarding performance or anything else I can do to improve this?

Regards /

 

 

Labels (2)
2 Replies
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hmm, it's a tough "would you rather" choice...

This might be a long shot, but what if you extracted all different combinations of the three dates - I bet there would be less than 12 millions of those, and created a unique key for each combination of the three? Then, you could load your "calendar" table for each date within each distinct combination.

So, in your Fact table, you'd have a Date ID that's pointing to a particular combination of the three dates, and in your calendar table you would have three separate rows with the same Date ID and the three dates, qualified the same way you did in your option 1?

This way, you avoid tripling the size of your Fact table, or generating a unique key with 12 millions of distinct records.

For more performance tuning techniques, join me at the Masters Summit for Qlik on November 14-16 in New Orleans! 

Cheers,

marcus_sommer

Personally, I would create the data-model with the concatenated fact-table because it's the simplest approach to develop and maintain such data-model and it will have most likely the best performance within the UI. At least much better as a link-table approach.

In regard to the needed storage only the db will need around three-times on space because the classical sql data-storage is mainly linear - means the twice number of records leads to a doubled storage-requirments.

But in Qlik only distinct field-values are stored within the system-tables and then bit-stuffed linked to the data-tables which means the relationship between records and needed storage/RAM isn't linear else rather exponentially. Therefore, the bigger the dataset is the lesser is the needed storage per record - at least within the most common BI scenarios. Very important is here the number of distinct field-values, see: The Importance Of Being Distinct - Qlik Community - 1466796.

Going further with this thought it could mean that it might be more suitable to transfer the most ETL work from the db to Qlik. There would be not a real technically limitation for the move else more administrative considerations how to divide the sources, administration and technically/human resources for the reporting-tasks.

- Marcus