Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
hic
Former Employee
Former Employee

In a well visited post on the community forum, John Witherspoon some time ago asked “Should We Stop Worrying and Love the Synthetic Key?” John’s post begins: “Synthetic keys have a bad reputation. The consensus seems to be that they cause performance and memory problems, and should usually or even always be removed. I believe that the consensus is wrong.” Here’s my view on this topic.

The creation of synthetic keys is simply QlikView’s way of managing composite keys. There is nothing strange or magic around it.

A single key is easy to manage: Just list all unique values in a symbol table (see Symbol Tables and Bit-Stuffed Pointers), and then link the data tables using a natural join.

Source table view.png

But a composite key is slightly different – there is no single symbol table that contains the relevant combinations of the multiple key fields. So QlikView needs to create such a table for all combinations: the $Syn table. In a way, you can say that the $Syn table is a symbol table for composite keys. In the data tables, the multiple keys are replaced by an identifier that uniquely identifies the combination of the values of the original keys: the $Syn key.

Hence, if you have the same set of multiple keys in two or more tables, the QlikView synthetic keys create a general, correct, compact and efficient solution. Synthetic keys do not per se cause performance and memory problems. They do not use a lot more memory than if you autonumber your own concatenated key. And they treat NULLs correctly, as opposed to an explicit concatenated key.

Hence: The synthetic key is in itself good and we should all love it.

Internal table view.png

However…  still, also I avoid synthetic keys. Why?

  1. A synthetic key is in my experience often a sign of a poorly designed data model. I say that, given the number of times I have found a synthetic key in the table viewer only to realize that I made a mistake in the script. If you get a synthetic key and didn’t expect it, I can only say: Back to the drawing board! You should most likely change your data model.
  2. QlikView creates an additional table (the $Syn table) that in many cases is superfluous: An additional table is the best solution if none of the data tables by itself completely spans the set of composite keys. But in real life, there is usually one table that contains all relevant combinations of the keys, and then this table can be used to store the clear text of the individual keys.
  3. For clarity, I like to create my own concatenated keys. It forces me to think and create a data model that I believe in. Removing the synthetic keys becomes a method to ensure a good data model, rather than a goal in itself.

But in principle, I totally agree with John’s initial conclusion: Any problem around synthetic keys is really a data modeling problem and not a problem with the synthetic key itself.

The short answer to John’s question is Yes and No. Yes, we should love the synthetic key. But, No, we should not stop worrying. We should always be alert and ask ourselves: “Do I want this synthetic key? Is the data model OK?

And so, because of the automated and irrevocable data-modeling process which rules out human meddling, the Synthetic Keys are scaring. But they are simple to understand. And completely credible and convincing.

Dr HIC

Further reading on Qlik data modelling:

Circular References

Fan traps and Chasm traps

36 Comments
Anonymous
Not applicable

I will argue that QV is very limited in its use with an dimensionally designed enterprise DW without synthetic keys.  QV is a BI tool and the most BI solutions go against a DW (forget the QV claims that you dont need one - this is rarely the case for real "enterprises"). Most DW's are dimenionally designed with conformed dimensions being the integration points for the various fact tables (things you are measuring).

Consider the (very very simple) example below.  There are two dimensions (date and LHA) that are conformed (linked/related) to two fact tables. Each dimension has a primary key.  That key is a foreign key in each fact table.  The primary key of a fact table is the composite of all the dimensional foreign keys within the table. Now a regular relational and multidimensional model implement this as a series of relationships and use stitch queries when fact grain varies ect...  There are no requirements for a composite key.  For QV tho, it needs one key so it creates a $Syn table that has a composite key of all foreign keys in the fact.  What a pain to create this in your ETL just for QV.  I am glad they have a structure that handles it.  However, it looks goofy and feels like a hack but you need it unless you want to build something "just for QV".

Here's another thing i noticed.  Say you filter your date table.  QV will then go through the fact and eliminate any "LHA's" (the other related dim) that dont have a "facts" for those date.  QV then also goes one step further.  It will filter out the LHA's associated with facts that dont have values in the fact table based on your date query.  Now things seem "great"...at first because you have filters actively filtering other filters.  But guess what?  If you have another fact table that does have facts in the LHA's that were just filtered -> they will go missing!  This is bad and deceiving.

In a sentence, If you filter on a conformed dimension, you may eliminate facts from one or more related fact table even though they exist.  I love QV for simple data marts.   It seems to fall apart on enterprise DW's.

QV_DM.jpg

0 Likes
6,943 Views
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Tim Webber wrote:

  I love QV for simple data marts.   It seems to fall apart on enterprise DW's.

Tim,

You appear to be forgetting the extremely powerful scripting capabilities within QlikView. Whereas everything you say is technically true "out of the box", it is extremely rare that you cannot reshape data as desired within the script to overcome the "limitations", as you put it (although I'd argue for design features that make QV unique rather than limitations that differentiate it from other tools).  Yes, this script will generally ultimately be creating a simple(ish) data mart, but it is almost always faster to do this in QV than it is in the DWH/Mart environment.  I'd almost always rather see a mart and a more simple QV script if possible, but:

  • you may not have a DWH/Mart infrastructure
  • it is often a good tactical solution to use the QV script to define what the mart should eventually do - gets you to market quicker and with less resource, plus drives the requirements through actual use-case rather than guesswork

So, QV doesn't "fall apart" on enterprise DWHs - you just have to use more of the provided functionality to leverage the power of the tool.

Jason

0 Likes
6,943 Views
Anonymous
Not applicable

Thanks for your comments Jason.  I'll think about them.  The main point of my post was in relationship to the requirement to either create new keys to satisfy a modelling constraint in QV or accept $Syn tables.  There is no other way around it.  You accept QV $Syn keys or you create them yourself.  Either way, its more work and I suspect quite a bit of work in key management for large DW's. 

With regards to filters filtering other filters (like cascading filters), the work around is to limit each QV application to one fact table.  Otherwise, the issue I describe will occur and in my mind, its incorrect.  Just because I dont have steering wheel produced on any given "production date" doesn't mean that my application should filter out that production date related to Cars produced on that filtered date.  It doesn't make sense.  I suspect the heart of the issue is that QV should only remove a dimensional member when "all" facts do not have a relationship to that member.  

If you only have one fact table, you dont have a problem.  However, if you only have one fact table, you have a data mart application that deals with one business process in the organization.  One business process is not enterprise by definition.  Dont get me wrong, I love QV for data mart specific applications.

0 Likes
6,937 Views
rbecher
MVP
MVP

Tim,

a good practice to bridge the EDW gap is to chain several QWVs, each containing a single data mart and the conformed dimensions. So, a user can hope with the selection from one mart to the other.

- Ralf

0 Likes
6,937 Views
Anonymous
Not applicable

Thanks Ralf - I did exactly as you mentioned in a few applications.  It definately helps but you cannot use the  measures together on one report object or use them together in a calculation.   So then this "EDW gap" is a known issue?  I have not found much about it in my travels through the internet.  At a minimum you would think that Gartner would have something to say about this...

0 Likes
6,937 Views
jaimeaguilar
Partner - Specialist II
Partner - Specialist II

Hi,

I think you're missing something on how QlikView works. When dealing with multiple fact tables/business areas, the most recommended approach I think is the use of concatenated tables. This way you can concatenate as many fact tables as you need. Also you can identify every fact with text flags with something like 'Sales' as _FactType so you can distinguish every record to its corresponding Fact in the front-end. You can even use the system variable SET HidePrefix in order to avoid end user seeing these "developer fields".

This way you'll have a unique fact table, that will include all the facts that you need for your QlikView app as well as all the surrounding dimensions (This is the way QlikView implements the star schema). Of course not every dimension has to be related to every fact.

Also you can unify date field by naming the main date field in every fact with the same name. Besides you can create a master calendar (or more than one if you need different calendars).  More over if you combine this approach with QVW chaining and using different levels of QVDs/QVWs then you have an outstanding set of possibilities.

Another thing is a misconception within QlikView is that claim that you DON'T need a DW in order to use QlikView. I think that the answer to this is YES and NO. Why? because if you don't even have a DB and work just with .txt, xls or maybe 1 or 2 access db files then you can still work with QV and create DW-like structure combining different data tiers with different QVWs and QVDs. However as your data analysis needs increase, you'll eventually need to move to a more robust structure.

My point is that if you don't have a DW then you can create a similar data structure with QV, but if you already have a DW,  take advantage of it and leave all the heavy transformation, creation of custom views, etc to the DW and focus QV in visualization and analysis (while QV can also be in charge of some transformations, like creation of calculated fields that do not need to exist in original DB). The better the quality of the input data the better analysis you'll be able to do with QlikView or any other BI solution.

I don't think the argue that QlikView is not enterprise ready is a valid argument, I've seen QlikView running in really big companies and it does really well, but what makes the difference is how you model your data.

6,937 Views
Anonymous
Not applicable

@ hic Just have one query. Is there any way to see what synthetic actually hold.

As you are saying it  Theydo not use a lot more memory than if you autonumber your own concatenated key.

So can it also store sequence key like autonumber or some other integer value?

Thanks & Regards,

Kiran Kokade

0 Likes
6,937 Views
rbecher
MVP
MVP

I think having a whole EDW stored into one concatenated fact table is very questionable..

6,937 Views
Brice-SACCUCCI
Employee
Employee

Hi Tim Webber

what I'm reading here seems to be a modelling issue.

If you don't want to link the production dates of the Cars and the Wheels, then simply don't include that date in the linking key.

You probably should have different calendars for parts and for cars, and a master calendar to use when you want to link everything by date.

Henric Cronström explained the principles of this approach in the following blog post:

http://community.qlik.com/blogs/qlikviewdesignblog/2012/10/16/the-master-calendar

We have a lot of customers that use several fact tables to cover several business processes of their company and I already made several applications based on datawarehouses. But it is true that it may take a lot of thinking to get the "right" model, i.e. the model that answers intuitively to the end user's questions (your example is a very good illustration of this).

Regards,

Brice

0 Likes
6,903 Views
Anonymous
Not applicable

Bad sample of Edit Script.

Avoid use it.

0 Likes
6,903 Views