To Join or not to Join - Page 3 - Qlik Community

hic · ‎2012-09-12

The QlikView internal logic enables a data model with several associated tables. It not only allows – it encourages you to use several tables when building a data model.

This is very different from many other BI or query tools where, when several tables are used, they are all are joined together into one table. The most obvious example of this difference is a simple SELECT statement. With it, you can use several tables as input and join them, but the output is always one single, denormalized table.

With QlikView, in contrast, you can have a multi-table relational data model that is evaluated in real-time. The associations are evaluated as joins at the moment when the user makes a selection in the application. At the same time, all objects, some with complex calculations based on these joins, are recalculated.

When creating the QlikView data model, you have a choice of loading the tables as several entities or joining some of them together. Joining in the script means that the result of the join is stored in the QlikView data model as one single table.

So what should you do? Is it better to keep the data model normalized (many tables) or is it better to de-normalize (fewer tables)?

My view is that it usually is better to keep the data model as normalized as possible. A normalized model has many advantages:

It is memory efficient.
It is, by definition, the data model that uses least memory.
It is CPU efficient.
In most cases, QlikView calculations in a normalized model are as efficient - or only marginally slower - as in a denormalized model. In some cases the normalized model is faster.
It is easier to understand and manage.
It should be possible for other developers to read your script: A simple script with as few transformations as possible, is a script that is easy for other developers to understand and maintain.
It minimizes the risk for incorrect calculations.
Joins potentially change the number of records in the tables, which means that a normal Sum() or Count() function cannot always be used – they would sometimes return an incorrect result. You may counter that there is always a way to write a correct formula, but my point is that it should also be easy. Expressions in server objects will be written by users that do not have special knowledge about the data model in the app.

But it is not a clear-cut case.

Often there is a trade-off between memory efficiency and CPU efficiency. In other words, there are cases where you can decrease response time by letting the data model use more memory; where performance will be better if you make the join in the script.

One such case is if you have two very large fact tables, like Order Headers and Order Details. An other is if you have chart expressions containing fields from different tables. Then QlikView has to perform the join in memory generating a virtual table over which the summation will be made. This can be both memory and CPU demanding, so you might get a better performance if you have made the join already in the script. But the difference is sometimes only marginal. You need to test, to be sure.

Bottom line is that you’ll have to weigh pros and cons. Don’t join unless you have to. If performance is important and you experience a noticeable improvement when you join, then you probably should join. But ask yourself what the implications are. Is the script still manageable? Can a user understand how the formula should be written?

The best join is often the one that never is made. Often – but not always.

HIC

See more about this topic in the Technical Brief about Joins and Lookups

Report Inappropriate Content · ‎2015-08-09

great & useful article ,thanks.@srinuinqlikview

Report Inappropriate Content · ‎2016-05-09

Hi,

Bit confused as to which way to go.

I have a fact table that has rows

which I then concatenate with another fact table around.

Then I left join 9 tables.

110,000 rows bringing through 6 field

48,000 rows x 2 7 fields

4500 rows x2 6 fields

19000 x2 6 fields

10 rows 3 fields

25 rows 3 fields

Sales:

So Load * from fact1 where date>value

Concatenate

Load * from fact2 where date>value

Left join (Sales)

Load * from table1

What would be the best way to optimise this script/load?

Apply maps, joins within the load or anything else?

hic · ‎2016-05-11

I would not join the tables. In most cases you don't need the join. Further the script is more manageable if you don't join the tables. So start with the concatenation of your fact tables, then proceed to loading the dimensional tables - but without the joins.

HIC

Report Inappropriate Content · ‎2016-05-11

Hi,

Thanks for getting back to me.

At the end of the script I do some post load calculations on SalesTransactions. SalesTransactions would be the table that has the fact and dimension joined via left joins then at the end I use this to create fields from the dimensions:

How would I do this without joining please? Is this possible?

SalesTransactions:

LOAD *,

TopDepartureName &'-'&TopArrivalName AS Flow,

If (TopDepartureName < TopArrivalName,

TopDepartureName & '-' & TopArrivalName,

TopArrivalName & '-' & TopDepartureName) AS FlowGrouped,

If (DepartureName < ArrivalName,

DepartureName & '-' & ArrivalName,

ArrivalName & '-' & DepartureName) AS JourneyGrouped,

DepartureName & '-' & ArrivalName AS Journey,

If (LegFare = 0, 1, 0) AS ZeroFlag,

If(SaleType=0,Company, ProductCompany) as CompanyOverall

Resident Sales;

DROP TABLE Sales

hic · ‎2016-05-12

I would make these calculations directly in the fact table. I assume that you make the join to "move" fields from the dimensions into the fact table and then use these in the calculations. However, this you can do also with ApplyMap(). See Don't join - use Applymap instead.

HIC

Report Inappropriate Content · ‎2016-05-16

I used apply map for these post load calculations worked great, however my load is still over 2 hours long. I need to cut it down to at least an hour.

a2dr37ao · ‎2021-06-25

Hi Henric, in this post I understand that "associations are evaluated as joins at the moment when the user makes a selection in the application. "
So, I can say that Joins only occurs when we scripting our code? All the rest is commanded by Qlik Associations?

a2dr37ao · ‎2021-06-25

Hi @hic , in this post I understand that "associations are evaluated as joins at the moment when the user makes a selection in the application. "
So, I can say that Joins only occurs when we scripting our code? All the rest is commanded by Qlik Associations?

I ask you that because Qlik´s help say: "The tables defined in the Qlik Sense script are called logical tables. Qlik Sense makes associations between the tables based on the field names, and performs the joins when a selection is made, for example selecting a field value in a filter pane. This means that an association is almost the same thing as a join. The only difference is that the join is performed when the script is executed - the logical table is usually the result of the join. The association is made after the logical table is created - associations are always made between the logical tables."

https://help.qlik.com/en-US/sense/May2021/Subsystems/Hub/Content/Sense_Hub/Scripting/associations-be...

Could you explain more about when joins really occurs?

Join means scripting and creation tables and Associations means run time?

JordyWegman · ‎2021-06-25

Hi @a2dr37ao,

An actual join will only be done in the script if you tell it to join, check the script below.

Table:
Load
    A,
    B,
    C
From [Source.qvd] (qvd)
;

Left Join ( Table )
Load
    A,
    X,
    Y,
    Z
From [Alphabet.qvd] (qvd)
;

This will then result in one table. The table itself is again build out of multiple symbol table (check this nice QIX Engine presentation from @hic for more information). That is the reason why more data in a data model will not linearly decrease the performance.

If you keep the two tables apart from each other, there is no join done in the same way that our script does it. Though, the associations are checked between the tables, just as the Help tells us. If you create an object (say a line chart), we put B as a dimension and Sum( Z ) as a measure. When executed, the QIX engine will calculate, based on the associations, the outcome for each dimensional value.

Hope this makes it a bit more clear.

Jordy

Climber

a2dr37ao · ‎2021-06-25

Hi @JordyWegman Jordy, thanks for answer my friend. So, I can tell that: Join occurs on scripting Association on runtime, on dashboards like consequence of filters and selections, right?

And thanks for presentation!