Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
hic
Former Employee
Former Employee

In a well visited post on the community forum, John Witherspoon some time ago asked “Should We Stop Worrying and Love the Synthetic Key?” John’s post begins: “Synthetic keys have a bad reputation. The consensus seems to be that they cause performance and memory problems, and should usually or even always be removed. I believe that the consensus is wrong.” Here’s my view on this topic.

The creation of synthetic keys is simply QlikView’s way of managing composite keys. There is nothing strange or magic around it.

A single key is easy to manage: Just list all unique values in a symbol table (see Symbol Tables and Bit-Stuffed Pointers), and then link the data tables using a natural join.

Source table view.png

But a composite key is slightly different – there is no single symbol table that contains the relevant combinations of the multiple key fields. So QlikView needs to create such a table for all combinations: the $Syn table. In a way, you can say that the $Syn table is a symbol table for composite keys. In the data tables, the multiple keys are replaced by an identifier that uniquely identifies the combination of the values of the original keys: the $Syn key.

Hence, if you have the same set of multiple keys in two or more tables, the QlikView synthetic keys create a general, correct, compact and efficient solution. Synthetic keys do not per se cause performance and memory problems. They do not use a lot more memory than if you autonumber your own concatenated key. And they treat NULLs correctly, as opposed to an explicit concatenated key.

Hence: The synthetic key is in itself good and we should all love it.

Internal table view.png

However…  still, also I avoid synthetic keys. Why?

  1. A synthetic key is in my experience often a sign of a poorly designed data model. I say that, given the number of times I have found a synthetic key in the table viewer only to realize that I made a mistake in the script. If you get a synthetic key and didn’t expect it, I can only say: Back to the drawing board! You should most likely change your data model.
  2. QlikView creates an additional table (the $Syn table) that in many cases is superfluous: An additional table is the best solution if none of the data tables by itself completely spans the set of composite keys. But in real life, there is usually one table that contains all relevant combinations of the keys, and then this table can be used to store the clear text of the individual keys.
  3. For clarity, I like to create my own concatenated keys. It forces me to think and create a data model that I believe in. Removing the synthetic keys becomes a method to ensure a good data model, rather than a goal in itself.

But in principle, I totally agree with John’s initial conclusion: Any problem around synthetic keys is really a data modeling problem and not a problem with the synthetic key itself.

The short answer to John’s question is Yes and No. Yes, we should love the synthetic key. But, No, we should not stop worrying. We should always be alert and ask ourselves: “Do I want this synthetic key? Is the data model OK?

And so, because of the automated and irrevocable data-modeling process which rules out human meddling, the Synthetic Keys are scaring. But they are simple to understand. And completely credible and convincing.

Dr HIC

Further reading on Qlik data modelling:

Circular References

Fan traps and Chasm traps

36 Comments
Not applicable

Hi,

I have always trained my colleagues not to care about it unless it produces strange results.

Normally Qliktech was teaching to solve the synthetic keys at the beginning of the data-modelling process. However since I'm using the library of Rob Wunderlich Qlikview Components to help me with synthetic keys, I can solve the synthetic keys at the end of the data-modelling process.

Instrumental in this is the function Qvc.LinkTable which has the ability to solve the synthetic keys at the end, instead of the beginning of the data-modelling process.

I hope with this post it will bring it to a bigger audience.

Loops however are a bigger concern of me. I hope you can elaborate more on that in a future blog post.

Kind Regards,


Dion

14,548 Views
rbecher
MVP
MVP

Hi Henric,

I think a concatenated key will consume more memory because of it's higher cardinality. One recommended best practice is to split fields to save memory (eg. timestamps). This would lead into a syn key if it's a key field. How to decide?

Another question: Is the calculation effort of a link table comparable to a synkey table?

- Ralf

0 Likes
14,548 Views
Gabriel
Partner - Specialist III
Partner - Specialist III

HI HIC,

Personally this is very rich knowledge. Thanks for sharing your knowledge as always.

One thing that sprouts to mind is, could there be hidden agenda for Synthetic Key (table) that is not documented or a useful way of using synthetic key (table) and QlikTech has yet to make known?

Best Regards,

Gabriel


14,548 Views
hic
Former Employee
Former Employee

Ralf

I don't think that an autonumbered concatenated key will consume more memory: They should have identical cardinality. So I would not split fields that are keys.

A $Syn table is like a link table with only one key. The inference engine would treat these two cases identically. But a link table usually has several keys and with that usually many more records, and this will affect the calculation effort.

Gabriel

No, there is nothing hidden with the synthetic key. It is merely a necessity for the QlikView inference engine in order to treat composite keys correctly.

However, one aspect, which isn't obvious (but it is not hidden), is NULL value handling: Should records of different tables be linked if one of several linking keys have a NULL value and the other keys have matching values? My answer is: "No. NULL should never link." And this is also how the synthetic keys work. A manually created, concatenated key will usually link the records, though.

HIC

14,548 Views
Not applicable

Sir,

     You enlightened me about $Syn tables. Thank You.

0 Likes
14,548 Views
Not applicable

Very well explained. I agreed, someone who needs a Syn Key indeed has a bad written data model, always we can find a better solution.

Thanks Henric.

0 Likes
14,548 Views
Anonymous
Not applicable

Good write up. Thanks Henric.

0 Likes
12,207 Views
robert99
Specialist III
Specialist III

Thanks for this post

I used one (and only one) synthetic key to join the three main files at work (using the personal edition). When we purchased the SB server edition the consultant work employed to check my work (I was self taught) and set up the server replaced the synthetic key with a concatenated key (to improve performance). I preferred the synthetic key (personal preference) but both worked. But the file size increased noticeable. Over 10% from 26000KB to 30000KB 

Unsure of the memory impact though

0 Likes
12,207 Views
Not applicable

Are there performance statistics that support this?  I'd be curious to see if a large data model with a synthetic key table will perform as efficiently as one with a concatenated (and autonumbered) key.  Has this been tested?  If so, what were the findings?

0 Likes
12,207 Views
hic
Former Employee
Former Employee

My experience is the same as RJ's above. If the data model is OK (e.g. a single synthetic key between two tables) then the synthetic key performs as well as (or better than) a manually created key. And often at slightly less memory usage.

But as soon as you have a questionable data mode (e.g. synthetic keys created from synthetic keys) then the performance deteriorates drastically.

However, I have no formal tests to support it...

HIC

0 Likes
12,207 Views