There are two Swedish car brands, Volvo and SAAB. Or, at least, there used to be... SAAB was made in Trollhättan and Volvo was – and still is – made in Gothenburg.
Two fictive friends – Albert and Herbert – live in Trollhättan and Gothenburg, respectively. Albert drives a Volvo and Herbert drives a SAAB.
If the above information is stored in a tabular form, you get the following three tables:
Logically, these tables form a circular reference: The first two tables are linked through City; the next two through Person; the last and the first through Car.
Further, the data forms an anomaly: Volvo implies Gothenburg; Gothenburg implies Herbert; and Herbert implies SAAB. Hence, Volvo implies SAAB – which doesn’t make sense. This means that you have ambiguous results from the logical inference - different results depending on whether you evaluate clockwise or counterclockwise.
If you load these tables into QlikView, the circular reference will be identified and you will get the following data model:
To avoid ambiguous results, QlikView marks one of the tables as “loosely coupled”, which means that the logical inference cannot propagate through this table. In the document properties you can decide which table to use as the loosely coupled table. You will get different results from the logical inference depending on which you choose.
So what did I do wrong? Why did I get a circular reference?
It is not always obvious why they occur, but when I encounter circular references I always look for fields that are used in several different roles at the same time. One obvious example is if you have a table listing external organizations and this table is used in several roles: as Customers, as Suppliers and as Shippers. If you load the table only once and link to all three foreign keys, you will most likely get a circular reference. You need to break the circular reference and the solution is of course to load the table several times, once for each role.
In the above data model you have a similar case. You can think of Car as “Car produced in the city” or “Car that our friend drives”. And you can think of City as “City where car is produced” or “City where our friend lives”. Again, you should break the circular reference by loading a table twice. One possible solution is the following:
In real life circular references are not as obvious as this one. I once encountered a data model with many tables where I at first could not figure out what to do, but after some analyzing, the problem boiled down to the interaction between three fields: Customers, Machines and Devices. A customer had bought one or several machines; a device could be connected to some of the machine types – but not to all; and a customer had bought some devices. Hence, the device field could have two roles: Devices that the customer actually had bought; and devices that would fit the machine that the customer had bought, i.e. devices that the customer potentially could buy. Two roles. The solution was to load the device table twice using different names.
Bottom line: Avoid circular references. But you probably already knew that…
Great post, but please may I ask for clarification on one bit of it.
You say in your blog post:
To avoid ambiguous results, QlikView marks one of the tables as “loosely coupled”, which means that the logical inference cannot propagate through this table.
Sorry if I am being stupid, but could you explain what you mean by the the logical inference cannot propagate through this table bit ?
No, you're not being stupid. The post assumes that the reader knows what the Logical Inference engine does.
The Logical Inference engine is the core of QlikView. It evaluates which field values are possible, given the selection. Basically it first evaluates which records are possible, and then the result of the evaluation "propagates" into the next table via the possible values of the key field, and then the next table is evaluated. It is this propagation that is disabled by the loosely coupled table.
as usually your post are more then IMPRESSIVE!!! but for this one I would like to propose different approach... lets keep to my BI Bible written by two persons: Ralph Kimball for Realtional world... and Henric Cronstrom for Acosiative world... and let's stick to idea of modeling world that always aim to star schema for later efficiency of analitics. This is my proposition of code to remodel loop situation into start schema design
///$tab source
[Person-Car_source]: LOAD * INLINE [ Person, Car Albert, Volvo Herbert, SAAB ];
[City-Car_source]: LOAD * INLINE [ Car, City Volvo, Gotheburg SAAB, Trollhattan ]; ///$tab persons tmp1: load distinct Person Resident [Person-Car_source];
Concatenate(tmp1) load distinct Person Resident [Person-City_source];
Dim_Person: load Distinct Person, RowNo() as PersonID Resident tmp1;
map_Person: mapping load Distinct Person, RowNo() as PersonID Resident tmp1;
drop Table tmp1; ///$tab cars tmp1: load distinct Car Resident [Person-Car_source];
Concatenate(tmp1) load distinct Car Resident [City-Car_source];
Dim_Car: load Distinct Car, RowNo() as CarID Resident tmp1;
map_Car: mapping load Distinct Car, RowNo() as CarID Resident tmp1;
drop Table tmp1;
///$tab cities tmp1: load distinct City Resident [Person-City_source];
Concatenate(tmp1) load distinct City Resident [City-Car_source];
Dim_City: load Distinct City, RowNo() as CityID Resident tmp1;
map_City: mapping load Distinct City, RowNo() as CityID Resident tmp1;
drop Table tmp1;
///$tab link table [LinkingTable]: LOAD ApplyMap('map_Person', Person, 0) as PersonID, ApplyMap('map_Car', Car, 0) as CarID, 'Person-Car' as Relation Resident [Person-Car_source];
drop Table [Person-Car_source];
Concatenate LOAD ApplyMap('map_Person', Person, 0) as PersonID, ApplyMap('map_City', City, 0) as CityID, 'Person-City' as Relation Resident [Person-City_source];
drop Table [Person-City_source];
Concatenate LOAD ApplyMap('map_Car', Car, 0) as CarID, ApplyMap('map_City', City, 0) as CityID, 'City-Car' as Relation Resident [City-Car_source];