Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

In a well visited post on the community forum, John Witherspoon some time ago asked “Should We Stop Worrying and Love the Synthetic Key?” John’s post begins: “Synthetic keys have a bad reputation. The consensus seems to be that they cause performance and memory problems, and should usually or even always be removed. I believe that the consensus is wrong.” Here’s my view on this topic.

The creation of synthetic keys is simply QlikView’s way of managing composite keys. There is nothing strange or magic around it.

A single key is easy to manage: Just list all unique values in a symbol table (see Symbol Tables and Bit-Stuffed Pointers), and then link the data tables using a natural join.

Source table view.png

But a composite key is slightly different – there is no single symbol table that contains the relevant combinations of the multiple key fields. So QlikView needs to create such a table for all combinations: the $Syn table. In a way, you can say that the $Syn table is a symbol table for composite keys. In the data tables, the multiple keys are replaced by an identifier that uniquely identifies the combination of the values of the original keys: the $Syn key.

Hence, if you have the same set of multiple keys in two or more tables, the QlikView synthetic keys create a general, correct, compact and efficient solution. Synthetic keys do not per se cause performance and memory problems. They do not use a lot more memory than if you autonumber your own concatenated key. And they treat NULLs correctly, as opposed to an explicit concatenated key.

Hence: The synthetic key is in itself good and we should all love it.

Internal table view.png

However…  still, also I avoid synthetic keys. Why?

  1. A synthetic key is in my experience often a sign of a poorly designed data model. I say that, given the number of times I have found a synthetic key in the table viewer only to realize that I made a mistake in the script. If you get a synthetic key and didn’t expect it, I can only say: Back to the drawing board! You should most likely change your data model.
  2. QlikView creates an additional table (the $Syn table) that in many cases is superfluous: An additional table is the best solution if none of the data tables by itself completely spans the set of composite keys. But in real life, there is usually one table that contains all relevant combinations of the keys, and then this table can be used to store the clear text of the individual keys.
  3. For clarity, I like to create my own concatenated keys. It forces me to think and create a data model that I believe in. Removing the synthetic keys becomes a method to ensure a good data model, rather than a goal in itself.

But in principle, I totally agree with John’s initial conclusion: Any problem around synthetic keys is really a data modeling problem and not a problem with the synthetic key itself.

The short answer to John’s question is Yes and No. Yes, we should love the synthetic key. But, No, we should not stop worrying. We should always be alert and ask ourselves: “Do I want this synthetic key? Is the data model OK?

And so, because of the automated and irrevocable data-modeling process which rules out human meddling, the Synthetic Keys are scaring. But they are simple to understand. And completely credible and convincing.

Dr HIC

Further reading on Qlik data modelling:

Circular References

Fan traps and Chasm traps

34 Comments
Partner
Partner

I'd just like to add that from a Best Practice perspective / lessons from the field - I stick to and recommend the following do's and dont's:

1. DO use synthetic tables and keys whilst prototyping - as long as you understand what they are and how they work (i.e. read this article thoroughly). It speeds up your prototyping process to not have to do the pretty time consuming task of creating bridging tables in your model, which you might well end up throwing away depending on the success of said prototype.

2. DON'T leave synthetic tables and keys in your final release. For all the good reasons stated by hic in the original article above - delegating the responsibility to Qlik to model your data is just poor practice - if you want a bridging table with all composite attributes represented, build it yourself so YOU are in control of the model and have made the design decision yourself that this is the best representation of the relationship between the participant tables. There are alternative modelling approaches, so the synthetic table produced by Qlik is not the only possible way that you can model data with several common attributes - hence you cannot rely on Qlik's automated approach to modelling multi-attribute data to be the best one.

One day I might write a blog about "Dispelling Qlik myths"... synthetic keys being "bad for performance" would definitely feature in such a blog. It's neither good nor bad per-say - it's an automated modelling mechanism for something you could do by hand, nothing more, nothing less.

Regards

Jonas

142 Views
thomas_wang
Contributor

,aaHi

QlikView is quite different from traditional DB. When you decide to use it, you need a change of ideation. It allows you to unearth interesting things without using too many data mining theories. The premise of this function is to combine all small facts into one big fact.

Here's another thing i noticed.  Say you filter your date table.  QV will then go through the fact and eliminate any "LHA's" (the other related dim) that dont have a "facts" for those date.  QV then also goes one step further.  It will filter out the LHA's associated with facts that dont have values in the fact table based on your date query.  Now things seem "great"...at first because you have filters actively filtering other filters.  But guess what?  If you have another fact table that does have facts in the LHA's that were just filtered -> they will go missing!  This is bad and deceiving.


You said it was deceiving. I don't agree with you. Those data were excluded because they do be not related to your previous screening if your two tables are related by correct keys - they do not belong to the time when you selected the dimension. This is the interesting part of QlikView.


Admittedly, it is difficult to use it to complete traditional data query, but it is not impossible. Same with jaimeaguilar‌, you can rename the key column name that you do not want to associate. In fact, this practice is losing QlikView's advantage. You do not need to use it to complete this traditional task, but to use other tools.


My humble opinion.

0 Likes
142 Views
Partner
Partner

I agree, what you get out of Qlik much depends on a properly built data model.
Mistakes in the building of data model (often because the modeller is used to traditional databases) you will get misleading results.
But learn to do it right with the associative engine and Qlik will reveal things often missed in traditional data queries.

Traditional queries you get exactly what you ask for but miss lots of things because you did not expect that kind of data.
With Qlik and a proper data model you much easier can see data outside of the expected.

0 Likes
142 Views
thomas_wang
Contributor

Hi HIC,

However, one aspect, which isn't obvious (but it is not hidden), is NULL value handling: Should records of different tables be linked if one of several linking keys have a NULL value and the other keys have matching values? My answer is: "No. NULL should never link." And this is also how the synthetic keys work. A manually created, concatenated key will usually link the records, though.

I think this question deserves careful study. First, there are two possible forms of null values of synthetic keys: one is that all columns are null values and the other is partial columns are null values. We can think for the time being that the former should not be related, and then focus on the latter.

In fact, QlikView's behavior towards the latter is considered to be related. Like the example I replied you in other discussions before. I think this is a conceptual disagreement. As we all known, traditional relational DB defines that if there is any null value in a parameter of a function, a null value is returned. That is to say, if there is a null value in a column of a composite key, the entire composite key is empty. Then this behavior is contrary to the "unrelated null value".

Additionally, if the combination key is nested, this inconsistent behavior leads to more uncertain result. Imagine if an advanced composite key contains two lower-level composite keys, one of which is empty, and the other is not. Personally, for the time being, I'm trying to prevent any nesting from happening in composite keys, because I've had met some logic problems, you know. In my mind, I guess the problem I have encountered is related to the problem I am talking about.

I want to hear your opinion.

Thanks!

0 Likes
142 Views