Qlik_William
Contributor III

Concatenate vs link table app size

Hello!

I wanted to compare the app size of the concatenate method and the link table approach for 3 fact tables with few fields in common. In my example I load 20M rows for each fact table, with only 3 of the 11 dimensions shared between the tables.

So I made some test data. I'm not sure what conclusions I can draw from this, if any, which is why I made this post; maybe my example does not make sense. But in my example the link table approach has an app size of 159MB and the concatenate approach an app size of 184MB.

So the concatenate approach created a larger app, which goes against the general idea that the link table approach creates a larger app due to the extra link table. I guess the reason is that I only use 10 distinct values for Dim1-Dim10, so my link table is small (only 110 rows); maybe this is not realistic.

But I'm still curious why the concatenate approach created a larger app. Is it because of the null values? I thought null values had an insignificant impact on app size. Is it because the table is wider?

The code:

---CONCATENATE---

Table1:
LOAD
'2023' as year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 10) + 1 AS Dim2,
Floor(Rand() * 10) + 1 AS Dim3,
Floor(Rand() * 10) + 1 AS Dim4,
Floor(Rand() * 10) + 1 AS Dim5,
Floor(Rand() * 10) + 1 AS Dim6,
Floor(Rand() * 10) + 1 AS Dim7,
Floor(Rand() * 10) + 1 AS Dim8,
Floor(Rand() * 10) + 1 AS Dim9,
Floor(Rand() * 10) + 1 AS Dim10,
Floor(Rand() * 100) + 1 AS Value1,
'Table1' as Source
AutoGenerate(20000000);

Concatenate(Table1)
LOAD
'2023' as year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 100) + 1 AS Value2,
'Table2' as Source
AutoGenerate(20000000);

Concatenate(Table1)
LOAD
'2023' as year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 10) + 1 AS Dim2,
Floor(Rand() * 100) + 1 AS Value3,
'Table3' as Source
AutoGenerate(20000000);

---LINK TABLE---

_Table1:
LOAD
'2023' as Year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 10) + 1 AS Dim2,
Floor(Rand() * 10) + 1 AS Dim3,
Floor(Rand() * 10) + 1 AS Dim4,
Floor(Rand() * 10) + 1 AS Dim5,
Floor(Rand() * 10) + 1 AS Dim6,
Floor(Rand() * 10) + 1 AS Dim7,
Floor(Rand() * 10) + 1 AS Dim8,
Floor(Rand() * 10) + 1 AS Dim9,
Floor(Rand() * 10) + 1 AS Dim10,
Floor(Rand() * 100) + 1 AS Value1
AutoGenerate(20000000);

_Table2:
Noconcatenate LOAD
'2023' as Year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 100) + 1 AS Value2
AutoGenerate(20000000);

_Table3:
Noconcatenate LOAD
'2023' as Year,
Floor(Rand() * 10) + 1 AS Dim1,
Floor(Rand() * 10) + 1 AS Dim2,
Floor(Rand() * 100) + 1 AS Value3
AutoGenerate(20000000);


LinkTable:
LOAD Distinct
Year & '|' & Dim1 & '|' & Dim2 as Table1CompositeKey,
Year & '|' & Dim1 as Table2CompositeKey,
Year & '|' & Dim1 & '|' & Dim2 as Table3CompositeKey,
Year,
Dim1,
Dim2
Resident _Table1;

Concatenate(LinkTable)
LOAD Distinct
Year & '|' & Dim1 & '|' & null() as Table1CompositeKey,
Year & '|' & Dim1 as Table2CompositeKey,
Year & '|' & Dim1 & '|' & null() as Table3CompositeKey,
Year,
Dim1,
null() as Dim2
Resident _Table2;

Concatenate(LinkTable)
LOAD Distinct
Year & '|' & Dim1 & '|' & Dim2 as Table1CompositeKey,
Year & '|' & Dim1 as Table2CompositeKey,
Year & '|' & Dim1 & '|' & Dim2 as Table3CompositeKey,
Year,
Dim1,
Dim2
Resident _Table3;

Table1:
LOAD
Year & '|' & Dim1 & '|' & Dim2 as Table1CompositeKey,
Dim3,
Dim4,
Dim5,
Dim6,
Dim7,
Dim8,
Dim9,
Dim10,
Value1
Resident _Table1;

Drop table _Table1;

Table2:
LOAD
Year & '|' & Dim1 as Table2CompositeKey,
Value2
Resident _Table2;

Drop table _Table2;

Table3:
LOAD
Year & '|' & Dim1 & '|' & Dim2 as Table3CompositeKey,
Value3
Resident _Table3;

Drop table _Table3;

Link table data model:

Qlik_William_0-1749849387923.png

 


Accepted Solutions
rwunderlich
Partner Ambassador/MVP

The size numbers you posted for your data models are the size on disk, which is not meaningful to this discussion. What you are interested in is the size in memory, or the "base memory size" in the QMC.

rwunderlich_0-1750048272829.png

Using the memory size number we can see that the Concatenate model is still larger - in fact, much larger.

In your example, the additional index space is the reason for the greater size, coupled with the fact that your link table has fewer values than a real-world example is likely to have.

Oleg gave a good overview and I'd also refer you to these posts:
https://qlikviewcookbook.com/2024/11/reducing-qlik-field-memory/
https://community.qlik.com/t5/Design/Symbol-Tables-and-Bit-Stuffed-Pointers/ba-p/1475369

I analyzed your models with QSDA Pro to measure the differences. In the Link Table model the Value1 field requires 17MB of index (row pointer) space. 

rwunderlich_1-1750048824910.png

The Concatenated model, having three times the rows for Value1, requires three times the index space for this same field.
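As a rough sanity check, these numbers can be approximated from the symbol-table model: with ~100 distinct values for Value1, each bit-stuffed row pointer needs Ceil(log2(100)) = 7 bits. A back-of-envelope sketch (the variable names are made up, and the engine adds its own overhead on top):

```
// Approximate index (row pointer) space for the Value1 field.
LET vBitsPerPointer = Ceil(Log(100) / Log(2));                  // 7 bits for ~100 distinct values
LET vLinkMB   = 20000000 * $(vBitsPerPointer) / 8 / 1048576;    // ~17 MB (20M rows in the Link Table model)
LET vConcatMB = 60000000 * $(vBitsPerPointer) / 8 / 1048576;    // ~50 MB (60M rows in the Concatenated model)
TRACE Link index ~ $(vLinkMB) MB, Concatenated index ~ $(vConcatMB) MB;
```

The 20M-row estimate lands at roughly the 17MB shown above, and tripling the rows triples the pointer space.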

rwunderlich_2-1750048920165.png

I don't think there is a general rule that says a concatenated or a link table model will be larger. It really depends on the data. 

I can think of a couple of rules that will generally be true:

1. Concatenated vs Link Table approach may present different behavior (results) in the front end when selections are made. 

2. Concatenated models will typically perform faster, although you can get quite acceptable performance from a Link Table. 

3. Sparse (lots of nulls) tables are memory "expensive". Expensive in the sense that the null values don't represent any data, but they do take up index space.

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com


Oleg_Troyansky
Partner Ambassador/MVP

Hi William,

At first, I tested the models without using Autonumber. Then I applied Autonumber, and the size of the composite keys was reduced substantially.

As promised, I posted a blog article on this example with a detailed description of the issues. Happy reading!

https://www.naturalsynergies.com/q-tip-27-link-table-or-concatenated-fact/

Cheers,

Oleg


18 Replies
rubenmarin

Hi, if it helps as a hint: when I use the concatenate method I try to reuse field names to make a narrower table.

So I would have only the Value1 field in all tables, and use Source in every measure to filter which values I want.

You can try this to confirm the impact of null values.
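As a sketch of that idea against the example script (the shared Value field and the set-analysis expression below are illustrative, not from the original post):

```
// Both loads write into the same Value field; Source tells them apart.
Facts:
LOAD
    '2023' as year,
    Floor(Rand() * 10) + 1 AS Dim1,
    Floor(Rand() * 100) + 1 AS Value,   // shared field instead of Value1/Value2/Value3
    'Table1' as Source
AutoGenerate(20000000);

Concatenate(Facts)
LOAD
    '2023' as year,
    Floor(Rand() * 10) + 1 AS Dim1,
    Floor(Rand() * 100) + 1 AS Value,
    'Table2' as Source
AutoGenerate(20000000);
```

In the front end, each measure then filters on Source, e.g. Sum({&lt;Source={'Table2'}&gt;} Value).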

In my case it reduced the app from 238 MB to 213 MB... still, the link table app is only 165 MB, which could be because of the nulls in the dimension fields.

And yes, a link table with 110 rows built from 60 million rows of data is not very common.

Vegar
MVP

It is a very good question @Qlik_William.

In a quiz I would have guessed the opposite outcome. But application file size might not be the most important thing; I think RAM consumption is more important. Have you checked the RAM footprint? Do you see the same pattern there?

Qlik_William
Contributor III
Author

Good point. That will reduce the number of fields and nulls.

However, how do you separate the data when you have several measures in each fact table and you want to combine them from the same fact table? For instance sales * quantity, when all the measures are in the same field?

Then you need some extra dummy dimension to separate the measures on? And you need to filter on both Source and the dummy dimension? Or am I missing something? It seems this could be confusing for users. Maybe it is still better, though.

Qlik_William
Contributor III
Author

Yes, true. I will see if I can test the RAM footprint. I am using no frontend logic/visualization though, so maybe the test will not be so relevant.

If I wanted to test the RAM footprint, how should I do it to make it relevant? Do you have any ideas?

alexquimu
Partner - Contributor III

Hey @Qlik_William 

 

Regarding the link table model, do not forget to use the autonumber() function for the three composite keys. While the sample data you provided has only around 100 different combinations, in a real scenario there might be millions of different combinations (composite keys may be formed from more than 3 fields), so the change could be really noticeable.
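A sketch of what that would look like in the link-table script above (the 'Key1'/'Key2'/'Key3' autonumber namespace IDs are illustrative):

```
LinkTable:
LOAD Distinct
    AutoNumber(Year & '|' & Dim1 & '|' & Dim2, 'Key1') as Table1CompositeKey,
    AutoNumber(Year & '|' & Dim1, 'Key2')              as Table2CompositeKey,
    AutoNumber(Year & '|' & Dim1 & '|' & Dim2, 'Key3') as Table3CompositeKey,
    Year, Dim1, Dim2
Resident _Table1;

// The fact tables must apply the same AutoNumber() calls with the same
// IDs so the integer keys still match the link table:
Table2:
LOAD
    AutoNumber(Year & '|' & Dim1, 'Key2') as Table2CompositeKey,
    Value2
Resident _Table2;
```

Replacing long composite strings with sequential integers shrinks the symbol tables for the key fields.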

 

Greetings,

Alex

Qlik_William
Contributor III
Author

Yes indeed. I normally use autonumber on key fields, but for some reason didn't in the test above.

When I get time to test again I will do that

 

Vegar
MVP

I put your data models into QlikView and ran the two versions through the good old DocumentAnalyzer. I also took a look at my actual RAM usage when opening them in QlikView Desktop; see screenshots below.

Vegar_0-1749984335112.png

It seems like it is the single big table (pink markings) in the concatenated data model that takes up the most initial RAM (rows x fields), ending at 660MB; this is 2.5x the 260MB needed by the linked model approach. I am able to verify these numbers when looking at the actual RAM usage (the pictures at the bottom of the image).

However, the two data models seem to scale similarly per new user, so it is the initial RAM cost that differs between the two.

 

This observation somewhat disrupts my understanding of how the engine works; I was not expecting this. Maybe the author of the DocumentAnalyzer @rwunderlich, or @Oleg_Troyansky, knows something about the memory usage in Qlik and is able/willing to explain why we see this big difference.

Oleg_Troyansky
Partner Ambassador/MVP

Well, in order to fully understand what's happening here, we should have a fundamental conversation "about the birds and the bees", aka "how Qlik stores data". Let me start with some shameless self-advertising, and then I'll get to a brief explanation of the unexpected outcomes.

This September, you guys have two outstanding opportunities to listen to my deep dive into Qlik Data Modeling, Scripting, and Performance. The next session of the Masters Summit for Qlik, in its completely new Cloud Edition, is going to take place in Hamburg, Germany, on September 29. You will learn everything there is to know about the cloud, moving to the cloud, managing the cloud, and also you will hear me talk about "the birds and the bees" - what's good and what's bad in Qlik Data Modeling and Qlik Application Performance.

In addition to that, I will teach my Qlik Expert Class in Vienna, Austria, on September 22nd, where I will be focusing solely on "the birds and the bees" - the fundamentals of advanced Qlik application development - Data Modeling, Scripting, Set Analysis and AGGR, and Performance. No cloud content there though.

Now, let me take a stab at the question at hand.

When you learn and fully understand the way Qlik stores data (the "Symbols" tables with distinct values for each field and the "Index" table with tightly packed pointers to the Symbols), you realize that most of the memory in large data sets is consumed by the Index tables, especially for fields with lots of distinct values.

Once you understand that, you begin to understand the reasons why Link Tables have such a bad reputation - it's not the extra table that requires more memory, but rather the concatenated Key field that typically has high cardinality (i.e. it holds a lot of distinct values). For example, if my data contains 1,000 Products, 10,000 Customers, and 1,000 unique Dates, then a concatenated key field that combines all three fields can have up to 10 billion distinct values (if all customers purchased every product every day - wouldn't that be nice?). In reality, you could expect maybe only a few million distinct combinations - but even that requires a lot of memory.

Now, in this manufactured example we saw a combination of 3 Dimensions with 10 distinct values each. The full Cartesian join of the three can generate up to 1,000 distinct values. That's not enough distinct values to cause any performance concerns.

Concatenating these 3 fact tables, though, causes other overhead issues that are minimal in real-world applications, but become prominent in this manufactured example. Where the Link Table scenario holds three index tables with up to 10 fields each and 60 million rows in total, the Concatenated model has to hold a single 60-million-row table with about 15 fields. That's a lot of unnecessary waste - the table is both wider and longer. When it's not countered by the (usually) huge concatenated key in the Link Table, these overheads make the Concatenated Fact look bigger than the Link Table alternative.

There are other aspects that we could discuss at length here, but these are the most prominent issues. I recommend that anyone who is curious enough raise the number of distinct values in all 3 fields to 1,000 and see what happens. I bet you will see a different picture there. I might do it myself and post the results in my blog.
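For anyone who wants to try that, only the generator multiplier in the original script needs to change. A sketch of the first table (the other loads change the same way):

```
// 1,000 distinct values per dimension instead of 10; the Year|Dim1|Dim2
// link key can now take up to 1,000,000 distinct values.
Table1:
LOAD
    '2023' as year,
    Floor(Rand() * 1000) + 1 AS Dim1,
    Floor(Rand() * 1000) + 1 AS Dim2,
    Floor(Rand() * 100) + 1 AS Value1
AutoGenerate(20000000);
```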

Cheers,

Oleg Troyansky

Vegar
MVP

Thanks for responding to my callout @Oleg_Troyansky .

I thought I had a pretty good understanding of the birds and the bees, and based on that I thought the symbol tables kept Qlik somewhat immune to the size creep that a wide, partly empty table causes in other data tools.

Large(r) composite keys will of course have a negative impact on RAM usage, but if all three tables had shared the same single primary key, then the benefit of having the three tables linked would have been even greater compared to a concatenated alternative.

I will reflect a bit on what you wrote and probably try some parameter adjustments to the script, as you suggested. I'll definitely read your blog post if you decide to go forward with this idea.