Solved: Synthetic keys >> sometimes a good idea? - Page 2 - Qlik Community

Report Inappropriate Content · ‎2012-07-20

Hi Qlikview experts!

I'm hired by a company that uses qvw's of approx 3 gigabytes...

We have been thinking about optimizing reload-time (it is now about 35 minutes).

My colleague told me that it could be a wise idea to not always resolve synthetic keys, especially when it is about large QVW's.

So my questions are the following:

1: Is it true that (performance-wise) synthetic keys sometimes kan better be kept in the model? If so, why?

2: when are synthetic keys "wrong" (meaning that the data integrity is not guaranteed anymore)

I'm curious and couln't find much details for it (and I didn't do benchmarks yet). Hope you have inputs and perhaps benchmarks!

Roberto

johnw · ‎2012-07-23

Roberto Postma wrote:
From what I have read so far, the load-time-improvement does not seem to be very significant...

In the real world, that seems likely to be correct.

Roberto Postma wrote:
My colleague heard of a qvw that was reloaded much faster with synthetic keys. I'll ask her for details (I thought an improvement from 60 minutes to 30 minutes or something like that).

I can set up examples using random data with even more dramatic differences, but I don't consider those examples to be representative of what happens in the real world. I'd bet that a typical application with composite keys spends very little of its time building those composite keys. Similarly, I'd bet that a typical application with synthetic keys spends very little of its time building those synthetic keys (like my 10 seconds out of 8 minutes example). I'm not sure how replacing composite keys with synthetic keys would knock load times down from 60 minutes to 30 minutes. The program would have needed to be spending more than half its time building composite keys. That seems very unusual. I'd love to see the details, though.

johnw · ‎2012-07-23

While not specifically about synthetic keys, you may want to look at the following threads about speeding up large applications (including load times).

http://community.qlik.com/message/97743
http://community.qlik.com/message/111451

Oleg_Troyansky · ‎2012-07-25

John,

you always compare a synthetic key to a composite key as the only alternative... When I say something like "a well thought through and a well constructed data model", I don't necessarily refer to having a composite key or a link table.

Specifically for large data models, I'd actually prefer to concatenate tables with multiple common fields, rather than to link them with a composite (or synthetic) key.

My preference is built on the understanding that most of the memory is being spent on fields with the highest number of distinct values - those fields consume a lot of space, both for the storage of the list, and for bigger indexes. Therefore, our job as developers is to avoid generating key fields with a lot of distinct values.

In your example, Product and Date are two common fields between two tables. In a large database, it's fair to assume that you might have thousands of Products and over a thousand of distinct dates. However, creating a composite key (or a synthetic key, for that matter), can result in generating a field with millions of distinct values, in the extreme case (thousands of products multiplied by over a thousand of dates). A single field with millions of distinct values will consume MUCH more memory than two fields with a few thousands of distinct values.

In other examples, people create composite (or synthetic) keys that combine Products, Customers, Dates and more, effectively generating monster keys that end up multiplying the memory consumption with no added value.

So, in my mind, cleaning up your data model is not just replacing synthetic keys with composite keys. It's more about optimizing your data model for performance - again, I apply those requirements primarily to large data models, because small datasets can allow all sorts of sloppiness.

cheers,

Oleg

Ask me about Qlik Sense Expert Class!

johnw · ‎2012-07-25

Oleg, I agree with everything you just said, I apologize for how I tend to focus on only a single alternative, and I'll try to do better when discussing synthetic keys in the future. I do think synthetic keys are typically better than composite keys. But I also agree that there are often better options than either of those.

Jason_Michaelides · ‎2012-07-31

Cheers Oleg - concatenation would definitely be my preference in the example above. I would always strive to have a single fact table surrounded by dimensions so the composite keys I refer to above are almost always to link these two different types of table. It is rare that the dimension won't contain all the required values (e.g. Product) whereas the fact table will obviously only have the (e.g.) products actually sold.

John, as to your comment "We are perhaps merely disagreeing on what is simpler and clearer." - I guess you're right. I believe that composite keys allow a better internal table view (necessary to understand if your data model is optimised for performance) and by the very fact you have to put in a line of code to build the key you can ensure your reasons for doing so are well commented in the script for others to understand. Synthetic keys can lead to a bit of a guessing game as to why they are there - intentional, or bad/lazy coding?

shantanu73 · ‎2013-01-31

Hello Friends/Forums,

I dont know why there should be synthetic key. As somehow sometime it is good but max it is not.

If we see/speak about data modeling all the entity is link with other entity/table by one or two or three column/fields. We can combin all those column and with the help Autonumber() function we can have one column for link to other tables And in that table also we can apply Autonumber() function. If we do all this thing proper then Synthetic table/column will not come in picture at all.

If I am wrong pls correct me.

Regards

Shantanu

Report Inappropriate Content · ‎2013-01-31

It is not always about what is wrong or right. There are multiple visions on what is the best practice here, and the outcome depends on what you find most important

- is this the reload-time

- is this that the datamodel should be readable easily (which is a bit more difficult with a synth key)

So not really one answer to choose.

To react on that autonumber suggestion: I have seen cases where autonumber() caused an increased loadtime (...) so this also is not the magic solution to anything.

johnw · ‎2013-01-31

My current personal opinion: Restating a lot of what's in this thread and the one I linked to, if your tables are properly linked by two fields, a synthetic key is one valid way to model this link. It is not in and of itself a data model error. It may or may not load or perform better than an equivalent composite key, depending on the situation. However, since a synthetic key is usually a symptom of a bad data model, you would want to document it as completely as possible, or the next person that sees your code will probably assume you made a mistake. A composite key, on the other hand, is self documenting. And often neither is the best approach, and the best approach is to use a different data model completely, one that doesn't require either sort of key.

My current personal advice: If you're a new developer, figure out WHY the synthetic key appeared, and then remove it. At worst, it's good data-modeling practice. If you're an experienced developer, and you're not SURE that a synthetic key is appropriate, then it isn't - remove it. If you're an experienced developer, and you're in a rush, and you KNOW that a synthetic key will work, go ahead, but remove it later. If you're an experienced developer, and you KNOW you have a good reason for having a synthetic key (such as if in this particular case it significantly improves load performance, and load performance is critical for the application), and you've actually taken the trouble to compare the alternatives, go ahead, but document it thoroughly, because otherwise that new developer is going to take my advice and remove it and undo all your hard work.

Jason_Michaelides · ‎2013-01-31

I think that says it all John - a perfect summary

shantanu73 · ‎2013-02-01

Dear Friends,

It was just my view. My aim is always to get better performance related to data updation. Partically if we see only 1 field/column is enough to link between the different entity which is ideally good modeling. I also agree with all of you that sometimes Synthetic key is good for better performance. Anyhow, we have to see the scenario, check the performance which is good. And accordingly we have to take decision.

However, Thanks for advice.

Regards

Shantanu