Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
hic
Former Employee
Former Employee

In a well visited post on the community forum, John Witherspoon some time ago asked “Should We Stop Worrying and Love the Synthetic Key?” John’s post begins: “Synthetic keys have a bad reputation. The consensus seems to be that they cause performance and memory problems, and should usually or even always be removed. I believe that the consensus is wrong.” Here’s my view on this topic.

The creation of synthetic keys is simply QlikView’s way of managing composite keys. There is nothing strange or magic around it.

A single key is easy to manage: Just list all unique values in a symbol table (see Symbol Tables and Bit-Stuffed Pointers), and then link the data tables using a natural join.

Source table view.png

But a composite key is slightly different – there is no single symbol table that contains the relevant combinations of the multiple key fields. So QlikView needs to create such a table for all combinations: the $Syn table. In a way, you can say that the $Syn table is a symbol table for composite keys. In the data tables, the multiple keys are replaced by an identifier that uniquely identifies the combination of the values of the original keys: the $Syn key.

Hence, if you have the same set of multiple keys in two or more tables, the QlikView synthetic keys create a general, correct, compact and efficient solution. Synthetic keys do not per se cause performance and memory problems. They do not use a lot more memory than if you autonumber your own concatenated key. And they treat NULLs correctly, as opposed to an explicit concatenated key.

Hence: The synthetic key is in itself good and we should all love it.

Internal table view.png

However…  still, also I avoid synthetic keys. Why?

  1. A synthetic key is in my experience often a sign of a poorly designed data model. I say that, given the number of times I have found a synthetic key in the table viewer only to realize that I made a mistake in the script. If you get a synthetic key and didn’t expect it, I can only say: Back to the drawing board! You should most likely change your data model.
  2. QlikView creates an additional table (the $Syn table) that in many cases is superfluous: An additional table is the best solution if none of the data tables by itself completely spans the set of composite keys. But in real life, there is usually one table that contains all relevant combinations of the keys, and then this table can be used to store the clear text of the individual keys.
  3. For clarity, I like to create my own concatenated keys. It forces me to think and create a data model that I believe in. Removing the synthetic keys becomes a method to ensure a good data model, rather than a goal in itself.

But in principle, I totally agree with John’s initial conclusion: Any problem around synthetic keys is really a data modeling problem and not a problem with the synthetic key itself.

The short answer to John’s question is Yes and No. Yes, we should love the synthetic key. But, No, we should not stop worrying. We should always be alert and ask ourselves: “Do I want this synthetic key? Is the data model OK?

And so, because of the automated and irrevocable data-modeling process which rules out human meddling, the Synthetic Keys are scaring. But they are simple to understand. And completely credible and convincing.

Dr HIC

Further reading on Qlik data modelling:

Circular References

Fan traps and Chasm traps

36 Comments
JoaquinLazaro
Partner - Specialist II
Partner - Specialist II

Hi Henric:

I'm in a little mess, how can I resolve a synthetic key generated by an interval match ?

The interval match has been created following your IntervalMatch instructions.

Thank you,

Joaquín

0 Likes
6,680 Views
hic
Former Employee
Former Employee

If it is a simple Intervalmatch with no additional keys, you do not need to remove it. It causes no problem, and it is in fact the most efficient data model.

If it is an extended Intervalmatch - with additional keys - then you should remove it. See Slowly Changing Dimensions.

In either case, you can remove it by first creating a temporary table, and then a composite key from the temporary table. E.g.

Intervals:
Load *,
Key & '|' & FromDate & '|' & ToDate as [Key+Interval]
From Intervals ;

Events:
Load *,
     Key & '|' & Date as [Key+Date]
From Events ;

TmpIntervaMatch:
IntervalMatch (Date,Key)
Load distinct FromDate, ToDate, Key
     Resident Intervals ;

IntervaMatch:
Load
     TmpKey & '|' & Date as [Key+Date],
TmpKey & '|' & FromDate & '|' & ToDate as [Key+Interval]
Resident Tmp_IntervaMatch;


Drop Field Key From Events;
Drop Table TmpIntervaMatch;

HIC

6,680 Views
JoaquinLazaro
Partner - Specialist II
Partner - Specialist II

Thank you Henric.

As usual it has been a very important lesson

0 Likes
6,680 Views
qlikviewwizard
Master II
Master II

Hi hic ,

This is very nice post. Thank you.

0 Likes
6,680 Views
Not applicable

Excellent post!

0 Likes
6,697 Views
Not applicable

So I will re-evaluate this article and appreciate it in its entirety.

But for now, the main helpful part is this:


But a composite key is slightly different – there is no single symbol table that contains the relevant combinations of the


So the difference between a Synthetic Key table, and just "n" lines between two tables, is whether one of the two tables contains all the compound keys?


In other words, these tables will do "n" lines, because first table contains all compound keys (key fields are bolded)

[Company, Year, Employee Count

Widget, 2014, 10

Foobar, 2015, 20]


[Company, Year, Location

Widget, 2014, Swidgherzland]


Whereas these two tables need synthetic key, because the second table has a compound key  which the first table does not? (italicized)

[Company, Year, Employee Count

Widget, 2014, 10

Foobar, 2015, 20]


[Company, Year, Location

Widget, 2014, Swidgherzland,

Acme, 2013, Acmerica]

0 Likes
6,697 Views
Not applicable

Oh I made a comment that was confused -- the "difference" between "n" lines joining tables and a synthetic key is in fact not a real difference in data storage, just a difference in how it is represented to the user -- the user can change it using "Internal Table View" vs "Source View" in the Table Viewer dropdown:

same data different representation.png

0 Likes
6,697 Views
Not applicable

Tim

You raised a very important topic I had the feeling but without a clear picture before I saw your post.

I agreed with you and seems Qlikview's advantage of in memory association make it perform some unexpected result in the situation you mentioned.

I also came from the traditional DW experience and tend to think things in the old way and see if we can resolve the issue in the old way.

My understanding is that:

In case there are multiple data marts and the conformed dimensions of these data marts are different, for example, the mart you mentioned in the diagram is using two dimension, another fact linked to LHA may only using one dimension.

In this case the FILTER_IN_MART_A_IS_AFFECTING_FILTERS_IN_MART_B happens.

Really I don't see there is any perfect solution in Qlikview, several workarounds I can think of are:

1. Try to make the different data marts using the same number of conformed dimensions, in this case, make the other fact also link to the date dimension;

2. We all know this date dimension here is a fake dimension, but we have no choice;

3. In the date dimension, add one inclusive item such as '<ALL>';

4. create a bridge table for date dimension so all the dates happened in the other fact are having <ALL>;

5. now whenever you select a specific date in mart A, the LHA in mart B won't be affected.

A simple prototype is as below:

FACT1:
LOAD * INLINE
[
DATE_KEY, COUNTRY, VALUE1
2014_ALL, CHINA ,100
2015_ALL, CANADA ,200
2016_ALL ,USA ,300
];

FACT2:
LOAD * INLINE
[
DATE_KEY, COUNTRY, VALUE2
ALL_ALL, CHINA, 300
ALL_ALL, CANADA, 4000
ALL_ALL, USA ,555
];

DATE_LINK:

LOAD DATE_KEY,IF(YEAR='ALL',NULL(),YEAR) AS YEAR, YEAR_GROUP INLINE
[
DATE_KEY, YEAR, YEAR_GROUP
2014_ALL, 2014, ALL
2015_ALL, 2015, ALL
2016_ALL, 2016, ALL
ALL_ALL, ALL, ALL
ALL_ALL,    2014,   ALL
ALL_ALL,    2015,  ALL
ALL_ALL,    2016, ALL
];

6,697 Views
zippo_btl
Contributor II
Contributor II

If i understand right, syn keys make all combination of involved fields, so the app with syn key should consume more memory?

As i've seen in someone's example  - 1000 men, 1000 products, 1000 regions synthetic key. The length of the syn table would be 1000*1000*1000 = 1 billion unique values. However in real life the length of the link table would be much less, cause not every man buys every product every day? So creating manual composite key is a must in this virtual situation.

Can someone confirm this?

Thx

0 Likes
6,697 Views
hic
Former Employee
Former Employee

No, the Synthetic key does not create the Cartesian product of the involved fields.

If you have two tables - A and B - and they are linked by two fields - X and Y, then the synthetic key will contain all combinations of X and Y that exist in A, plus all combinations that exist in B. If you create a solution manually, then this will most likely contain exactly the same number of combinations as the synthetic key.

HIC

6,697 Views