Hi,
If a load script has two tables, one with a few million rows and one with only a few dozen rows, which is more efficient:
a) Load the big one first, then join-load the small one second
b) Load the small one first, then join-load the big one second
[I know the ApplyMap() function could be an option, but in this scenario I do not wish to use it.]
Best Regards, Bill
T1:
LOAD
    // SubField is 1-based, so shift Mod() to yield 1..12 instead of 0..11
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F2
AutoGenerate 10000000;

Join
LOAD
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F3
AutoGenerate 100;

This took 28 seconds on my notebook, and the code below took the same 28 seconds:
T2:
LOAD
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F3
AutoGenerate 100;

Join
LOAD
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F2
AutoGenerate 10000000;
I'd be interested to know how the table widths affect this.
So, for example, if the larger table had say 10 columns and the smaller table had 20 columns (and vice versa).
Also, whether this is affected by the join being one-to-many.
Unfortunately I'm too busy to test this myself today 😞
The calculations to generate the records may outweigh the time needed for the join itself, so your example may not be the right way to test this.
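One way to separate the two costs (a sketch only; the lib://Temp connection and file names are hypothetical) is to generate and store both tables once, then time just the loads and the join:

```
// Pay the Rand()/SubField generation cost up front, outside the timed section
Big:
LOAD
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F2
AutoGenerate 10000000;
STORE Big INTO [lib://Temp/Big.qvd] (qvd);
DROP TABLE Big;

Small:
LOAD
    SubField('$(MonthNames)', ';', Mod(RecNo()-1, 12) + 1) as F1,
    Rand() as F3
AutoGenerate 100;
STORE Small INTO [lib://Temp/Small.qvd] (qvd);
DROP TABLE Small;

// Timed section: only the QVD reads and the join remain
T1:
LOAD F1, F2 FROM [lib://Temp/Big.qvd] (qvd);
Join (T1)
LOAD F1, F3 FROM [lib://Temp/Small.qvd] (qvd);
```

Swapping the last two loads then gives the small-table-first timing against identical data.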
One scenario in which it would be faster to load the big table first is if the big table can be loaded with an optimised load from a QVD. A join forces a non-optimised load, and the big table would suffer from this far more than the small one.
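For illustration, a sketch of that scenario (file names and the lib://Data connection are hypothetical): the plain field list with no transformations keeps the big QVD load optimised, and only the 100-row join load pays the non-optimised penalty:

```
// Optimised QVD load: straight field list, no expressions, no WHERE clause
Facts:
LOAD F1, F2
FROM [lib://Data/BigTable.qvd] (qvd);

// The Join prefix forces a non-optimised read, but only on the small table
Join (Facts)
LOAD F1, F3
FROM [lib://Data/SmallTable.qvd] (qvd);
```

Reversing the order would instead force the ten-million-row QVD through the slower non-optimised path.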
Gysbert
Good point re optimised / non-optimised loads, I had not thought of that scenario.
Best Regards, Bill