Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
valpassos
Creator III
Creator III

Performance of multiple master calendars

Hi community,

I have the need in my model to create muitple master calendars, as I have many dates. I'm talking about 7 master calendars.

I'm loading them and LEFT KEEPing them with my FACTS table. Like this:

LEFT KEEP(FACTS)

CreationCalendar:
LOAD 
&ID_Date AS &ID_Creation_Calendar
...
FROM Calendar;

LEFT KEEP(FACTS)

AssignmentCalendar:
LOAD 
%ID_Date AS %ID_Assignment_Calendar
...
FROM Calendar;

 

But this is taking forever to load, like almost 20 minutes, just to load these tables. The application is very slow and I'm wondering if it has anything to do with this?

Do master calendars hinder performance?

Thanks in advance!

Labels (2)
1 Solution

Accepted Solutions
marcus_sommer

My suggestion was not to limit the calendar (regardless in which way it's applied) to the available dates of the fact-tables which a keep/join/map or where-clause would do. Like mentioned have fact-tables often gaps in regard to a consecutive date - and this might not only happens to the "classical" missing dates like weekend and holidays else there might be also "normal" dates without a sale/production/billing/shipping caused through any reason.

Even if all dates are available within the calendars it could be quite difficult to show these gaps but without it's impossible. I don't know any real case in which you could get disadvantages through dimension-dates which aren't also fact-dates which couldn't handled - a classical case would be the count of dates but here you could use set analysis to consider the source, for example with something: count({< Sales = {'*'}>} date) which would only count dates which have a sale.

The more different date-fields you want to include the more likely would it be that they would have a different dataset within their calendars.

In the end it means that no real dependencies between the dimension- and fact-tables must be implemented. Just loading the calendar-dates between a manually set min/max-date. Usually the min-date of the fact is known and mustn't be read in any way and the max-date might be also known respectively it could be easily calculated with something like yearend(today()) or similar approaches.

With it you would also save the 2 minutes because the unoptimized load (because of the where-clause to the min/max-date - whereby even that could be bypassed with an appropriate pre-load which creates the needed dates to a where exists() clause) of 7 calendars would be finished in a few seconds.

- Marcus

View solution in original post

16 Replies
jcdatasax
Contributor III
Contributor III

Can you load a sample data set with the date fields? I currently use 3 canonical calendars on a data set over 49 million rows. It works great. You shouldn’t need to join the calendars but rather use a canonical calendar, using after mapping to join date fields into common date. 

valpassos
Creator III
Creator III
Author

Hi  @jcdatasax ,

My dataset is close to that as well, ~50 million rows, but it keeps growing day by day.

Can you share a sample of that solution of yours? How does that work for the dates in filters?

Thanks,

Lisa 

jcdatasax
Contributor III
Contributor III

thkarner
Partner - Creator III
Partner - Creator III

If you want to use derived Autocalendar fields on date fields in a large fact table it might be very slow. The solution is to create a new table with a single date field and a join field to the date in the fact table. The new table only contains the distinct values per date (without duplicates).

Then you can apply the autocalendar on the new date field in the new table. It will extremly speed up the loading time of the app.

My technical explanation is:
Whenever the derived calendar fields are used, a complete reload of the table happens. In case of a large fact table it could take time. Using a small compact date table joined with the fact table is very quick.

- Thomas

 

valpassos
Creator III
Creator III
Author

Hi @thkarner ,

Thanks for the input! I'm not really interested in using autocalendar derived fields, though. Why would I use that instead of master or canonical calendars? My fact table is very large, as I said.

My initial question was if there was a more efficient solution to create the calendars I need instead of creating 7 master calendars and left keeping them with my fact table. Do you know one?

Thanks,

Lisa

 

valpassos
Creator III
Creator III
Author

Hi @jcdatasax ,

But what does that have to do with the canonical calendars solution you mentioned? It seems to me they are two very different solutions.

Appreciate the clarification 🙂

Lisa

thkarner
Partner - Creator III
Partner - Creator III

Hi, not sure if I understood you fully. Please let me replicate:

a) You have a large fact table with 7 date fields.

b) You want to create a master calendar for each date field.

You have to options:

a) Create your own master calendar table with all necessary fields (Year, Quarter, Month, etc.)

b) Create your own master calendar table with date field only and derive other fields (Year, Quarter, Month, etc.). This is what I´d suggest (according to https://community.qlik.com/t5/Qlik-Sense-Documents-Videos/Derived-Calendar-Fields/ta-p/1485683)

In both cases you have to create a new table for each date field.

Instead of LEFT KEEP you can try like this

CalendarDate1:
LOAD DISTINCT Date1 RESIDENT Facts;

Since DISTINCT is not very quick you can improve by loading Min/Max values for Date1 in an intermediate table and autogenerate all dates between Min and Max in a new Calendar table.

If Min/Max is still too slow also the FieldValue Function I´ve already used to detect min and max.

Finally the goal is to create a new master calendar table with the distinct dates only to ensure performance of the app.

If you´re familiar with scripting I´d suggest using the min/max approach. If this is not quick enough try to find out how it could be done with FieldValue function.

 

 

 

marcus_sommer

I think your approach had two serious issues. One is limiting a calendar per keep respectively per join (the match-logic is the same and it differs only in the number of tables which remain) which means here that each date from the calendar must be checked against each record from the fact-table - and by 50 M of records and 7 calendars it will of course take some time.

Much more performant would be to use a where-clause with exists, like:

CreationCalendar:
LOAD * FROM Calendar where exists(&ID_Creation_Calendar, &ID_Date);

because here is the calendar-date compared against the distinct values of the system-table of the fact-table date and not against the whole fact-table.

But the second mentioned issue remained by this method - you won't probably have all calendar-dates between the min- and max-date of the fact-table because there are usually gaps from weekends, holidays and similar breaks. Often these missing dates are needed for various calculations and views within the UI.

Therefore I recommend to create a single master-calendar within a separate qvd which contains all possible calendar-fields of the last/future years/decades. Then in each application you could load the needed/specific fields with the needed time-span - which could mostly be set manually because the min- and max-date are usually known.

The min-date might start with yearstart(YourMinDate) and the max-date is often something as today() or yearend(today()). The means any read of the available dates from the fact-tables aren't really needed - and if it should be done against the system-tables. A good description could you find here: https://qlikviewcookbook.com/2013/09/fastest-method-to-read-maxfield-from-a-qvd/ 

If you have multiple date-fields you could just repeat it for all of them - you need only to rename all calendar-fields properly to avoid synthetic keys and circular loops. In this case would be qualifying quite useful. If your date-fields contain connected events like an order/billing/shipping-date you may create an extra canonical calendar for them. This means the canonical calendar didn't replaced the other calendars else it provides additionally possibilities.

- Marcus

valpassos
Creator III
Creator III
Author

I get it now. Thanks for the clarification, @thkarner! Will give that solution a try 🙂

Lisa