Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

creating link tables

I've heard about Link Tables. They are useful when joining two or more fact tables with dimesion tables. But I only saw a powerpoint talking about this technique. Now I wanna know how to create it. Any tip ?

1 Solution

Accepted Solutions
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

In a nutshell, you use LinkTables when you have 2 or more Fact Tables, and they share more than one key - for example, you have Orders and Shipments in separate tables, and they share fields Item, Customer and Date. Your Goal is to link the two fact tables to each other, and to link them both to the Item Master, Customer Master and a Calendar.

You need to create a combined key that will include all existing combinations of Items, Customer and Dates, and load them from both tables:

RawLinkTable:

load distinct

autonumber(Item & '|' & Customer & '|' & Date) as myKey,

Item,

Customer,

Date

resident

Table1

;

After loading those combinations from all Fact Tables, you need to reload your LinkTable once again, using "NOCONCATENATE" and "DISTINCT", to ensure that each combination exists only once. Finally, drop the individual keys (Item, Customer and Date) from all Fact Tables, so that the only place they exist is the LinkTable.

Notice, that for large data sets, LinkTable method is causing generation of a huge unique field, which might significantly impact your performance. For really large data sets I would recommend using other, more advanced, data models, based on concatenating all fact tables into a single one and taking care of the dimension linking, but that would be a completely different story...

good luck!

Oleg

View solution in original post

14 Replies
rothtd
Creator III
Creator III

I also recently learned how to use link tables, and am very impressed with them. Here's my first implementation using them.

My dimension tables are on the left, the link in the middle, and the fact tables (heavily denormalized) are on the right.

My dimension tables are on the left, the link in the middle, and the fact tables (heavily denormalized) are on the right.

I am currently testing this design, so don't consider this my final version. So, far this seems to be working well for me. I think someone else can give you a sample app, but I figured this might help get you going.



Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

In a nutshell, you use LinkTables when you have 2 or more Fact Tables, and they share more than one key - for example, you have Orders and Shipments in separate tables, and they share fields Item, Customer and Date. Your Goal is to link the two fact tables to each other, and to link them both to the Item Master, Customer Master and a Calendar.

You need to create a combined key that will include all existing combinations of Items, Customer and Dates, and load them from both tables:

RawLinkTable:

load distinct

autonumber(Item & '|' & Customer & '|' & Date) as myKey,

Item,

Customer,

Date

resident

Table1

;

After loading those combinations from all Fact Tables, you need to reload your LinkTable once again, using "NOCONCATENATE" and "DISTINCT", to ensure that each combination exists only once. Finally, drop the individual keys (Item, Customer and Date) from all Fact Tables, so that the only place they exist is the LinkTable.

Notice, that for large data sets, LinkTable method is causing generation of a huge unique field, which might significantly impact your performance. For really large data sets I would recommend using other, more advanced, data models, based on concatenating all fact tables into a single one and taking care of the dimension linking, but that would be a completely different story...

good luck!

Oleg

Anonymous
Not applicable
Author

Completely agree with Olegs comment on large datasets. Further, be very careful when using Link-tables in combination with the commonly used "report-flags" (if you don't know what I'm talking about you probably don't use them) and if you use if-statements that span through the link table. Some very tricky effects may occur since the tecnique in effect creates a lot of "many-to-many" relationships.

Personally I try to stay away from link-tables as much as possible due to the complexity for other developers and eventual professional users but I agree it can be extremely powerful used in the right circumstances.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP


Oleg Troyansky wrote:
Notice, that for large data sets, LinkTable method is causing generation of a huge unique field, ... <div></div>


Oleg,

Nice explanation of link tables. In your example you are using autonumber() to generate the key, which is a good idea (or autonumberhash128 which has a slightly easier syntax but more processing). I question your statement about the huge field. My experience of autonumber is that it is extremely storage efficient in that you get a sequence of consecutive integers starting with zero.

Am I misunderstanding?

-Rob

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Rob,

your understanding of autonumber is absolutely accurate, - it is very efficient and that's what it's being used. My comment relates to the fact that the LinkTable approach requires generating a unique Key with far more instances than the original data fields. I'll continue the same example - suppose your combined key includes the Customer, the Item and the Date. Let's suppose that you have 10,000 Items sold to 1,000 customers and that you load 5 years of data. Theoretically speaking (if we imagine that every Item is sold to every Customer every day), the new key will have :

10,000 * 1,000 * 365 * 5 = 182,500,000 instances. From 10,000 Items as the "longest" key, we've moved to 182 M "combined keys" as the "longest" key. Even with autonumber, it affects memory consumption.

Does it make sense?

Oleg

philip_doyne
Partner - Creator II
Partner - Creator II

Dear Oleg,

I too use this method for complicated multiple fact table situations and have experienced the link table getting very big. Having looked at it in some detail I now am of the view that the link table need be no longer than the number of records in all the fact tables as you only need one link table record for each fact table record - you do not need link table records for the dimension tables - unless of course you feel the need to link the dimensions directly and not through the fact tables.

I therefore only generate link table records from my fact tables.

Philip

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Philip,

yes, of course, we always generate link table records from the fact tables. In my previous example, it was just a theoretic premise that every customer is buying every item every day. We never generate the keys by linking everything by everything.

However, knowing that QlikView store fields and not table records, the number of rows in tables is less important than the number of distinct values in individual fields. Generating a single field with a long list of distinct values is therefore more "expensive" than having a table with the same number of rows of non-distinct values.

Hope it all makes sense...

Oleg

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Oleg,

Thanks for the clarification. Yes, it makes sense. The additional memory required would be for the State storage for the Key.

Thanks,

Rob

Not applicable
Author

Hi guys,

Just looking at this post, which I know is a long time ago, but am trying to do something similar. The only part which I am unsure of is the Resident part? Which table would be used ie resident...( )?

Thanks