How normalized should the QlikView data model be? To what extent should you have the data in several tables so that you avoid having the same information expressed on multiple rows?
Usually as much as possible. The more normalized, the better. A normalized data model is easier to manage and minimizes the risk of incorrect calculations.
This said, there are occasions where you need to de-normalize. A common case is when the source database contains a generic master table, i.e. a master table that is used for several purposes. For example: you have a common lookup table for customers, suppliers, and shippers. Or you have a master calendar table that is used for several different date fields, e.g. order date and shipping date (see image below).
A typical sign for this situation is that the primary key of the master table links to several foreign keys, sometimes in different parts of the data model. The OrganizationID links to both CustomerID and ShipperID and the Date field links to both OrderDate and ShippingDate. The master table has several roles.
The necessary de-normalization in QlikView is easy. You should simply load the master table several times using different field names, once for every role. (See image below).
However, loading the same data twice is something many database professionals are reluctant to do; they think that it creates an unnecessary redundancy of data and hence is a bad solution. So they sometimes seek a solution where they can use a generic master table also in the QlikView data model. This is especially true for the master calendar table.
If you belong to this group, I can tell you that loading the same table several times is not a bad solution. Au contraire – in my opinion it is the best solution. Here's why:
- From the user’s perspective it is confusing to have an unspecified “Date” field if there are several dates in the data model. For example, the user will not understand whether the date refers to order date or shipping date.
- Without loading the master calendar several times, it will not be possible for the user to make selections that place simultaneous restrictions on several date fields, e.g. “show transactions where the order was placed in September and the items were shipped in November”.
In fact, loading the same table several times in QlikView is no stranger than doing it in SELECT statements using aliases, e.g.,
SELECT OrderID FROM Orders
INNER JOIN MasterCalendar AS OrderCalendar ON Orders.OrderDate=OrderCalendar.Date
INNER JOIN MasterCalendar AS ShippingCalendar ON Orders.ShippingDate=ShippingCalendar.Date
WHERE OrderCalendar.Month=9 AND ShippingCalendar.Month=11
In SQL you would never try to solve such a problem without joining the master table twice. And you should do the same in QlikView.
So, if you have several dates in your data model – load the master calendar several times!
PS. But if you still want one common date field, you should create a Canonical Date.