Hi experts,
I am loading one table (circa 600,000,000 rows, 20 columns)... It is going pretty quickly. But I need to add a new column to it from another table, which has about 5,000,000 rows.
It is very slow if I use a left join. What can make it faster?
Thank you
Jasmina
Personally I would use mapping, like the others already suggested. It avoids all the risks that can occur with joins (adding or removing records depending on the join type) and it will surely be faster than a join. Further, it provides the possibility to define a default value if there is no match, and also to do further calculations at once, unlike a join, which would require another load for any further checks/calculations. I would also look at whether the mapping table might be reduced to a smaller number of records with some kind of advanced filtering. This may look like:
TableForExists:
Load article From ArticleRawdata Where category = 'very important';
/* it's just an example - it could also be some generic approach like your recent one (the one with the concatenating) which creates the dates from the current year */
// keep only mapping records whose key occurs in TableForExists
ExtractMappingData:
Load lookup, return From SecondTable Where Exists(article, lookup);
Map:
Mapping Load * Resident ExtractMappingData;
Drop Tables ExtractMappingData, TableForExists;
It may look like an expensive effort, but nothing works better to optimize load performance with large datasets, short of incremental approaches. In your case, with such large tables, I would try to implement incremental loading first, and if the load times then still need reducing, the approach mentioned above could also be implemented on top.
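As a rough illustration of that incremental pattern (a minimal sketch only - the source name BigSource.csv, the field LastModified, the key column1 and the variable vLastExecTime are assumptions for this example, not from your script):

// vLastExecTime is assumed to hold the timestamp stored at the end of the previous run
// 1) fetch only the records added or changed since the last run
BigTable:
Load * From BigSource.csv (txt) Where LastModified >= '$(vLastExecTime)';

// 2) append the unchanged history from the previous run's QVD,
//    skipping any key that was just reloaded
Concatenate (BigTable)
Load * From BigTable.qvd (qvd) Where Not Exists(column1);

// 3) persist the merged result for the next run
Store BigTable Into BigTable.qvd (qvd);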
- Marcus
I think you could do the second step within the first one, like:
If(column2 = 0, ApplyMap('Map', column1, column2), column2)
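For context, a minimal sketch of how that could sit in the load itself (assuming the mapping table 'Map' from the script above and the table/field names used elsewhere in this thread):

// one pass over the big table: keep column2 where it already has a value,
// otherwise fall back to the mapped value; the third ApplyMap argument
// is returned when column1 has no match in the map
Table1:
Load
    column1,
    If(column2 = 0, ApplyMap('Map', column1, column2), column2) as column2
From table1;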
- Marcus
Can't you add the new column directly into the fact table as well?
At this moment, no.
In fact, I have something like this in the big table (more than half a billion rows):
column1 column2
1 0
2 0
3 0
4 40
5 60
In the other table I have
column1 column2
1 30
2 20
3 20
This column2 from the second table is of great importance to me; I need that data... but how?
I was thinking of something like:
table1:
Load * From table1;

Left Join (table1)
Load column1,
     column2 as different_column
From table2;
From this I will get a new column, and I would have the data. I need it in this table1, because table1 is connected to other data that I need.
Jasmina
Have you looked at using ApplyMap? Here is an article by Henric Cronström explaining why it is better than joining if you just want to get a single value from a table.
Hope this helps.
Regards,
Mauritz
Since both tables have the same columns, simply concatenate the two:
Load column1, column2 From T1;
Concatenate
Load column1, column2 From T2;
Or you can use the ApplyMap approach.
If I use concatenate and get this:
column1 column2 column3 column4
1 0 A B
2 0 A B
3 0 A B
1 20 null null
2 30 null null
3 20 null null
then how can I make this table:
column1 column2 column3 column4
1 20 A B
2 30 A B
3 20 A B
Jasmina
P.S. Now I will look into ApplyMap.
If column1 is a unique key, you will get the requested table by having:
1) column1 as a dimension
2) column2 as a measure with the expression Sum(column2)
3) column3 as a measure with the expression Concat(column3) or MaxString(column3) or MinString(column3)
4) Same as for column3 but with column4 instead.
Below should explain the ApplyMap principle.
Table2:
Mapping
LOAD * INLINE [
column1,column2
1,30
2,20
3,20
];
Table1:
LOAD * INLINE [
column1,column2
1,0
2,0
3,0
];
Rename Table Table1 to Temp;

Table1:
NoConcatenate
LOAD
    column1,
    // third parameter = default when column1 has no match in the map
    ApplyMap('Table2', column1, Null()) as column2
Resident Temp;

Drop Table Temp;
In your case you won't have to do the rename and drop as you'll already have the table. You can obviously use logic to only use ApplyMap if the original table didn't have a value, or if it was 0, etc.
Regards,
Mauritz
The columns that I presented are not measures, just dimensions... so I cannot use Sum().
Then you can have a preceding LOAD with a GROUP BY clause to contract the table into what you want. It doesn't matter whether the fields are dimensions or measures in the load script. What you want to do is get rid of the null values and merge the rows that belong together...
But you might have to do a RESIDENT on the big concatenated table so I don't know if the performance or memory consumption will be good... ApplyMap might work better for you here. You'll have to give it a try.
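A minimal sketch of that contraction as a resident load (the table name Combined for the concatenated table is an assumption; column1 is assumed to be the key, Sum() picks up the single non-zero number and MaxString() the single non-null text per key):

// collapse the concatenated rows onto one row per key
Merged:
NoConcatenate
Load
    column1,
    Sum(column2)       as column2,
    MaxString(column3) as column3,
    MaxString(column4) as column4
Resident Combined
Group By column1;

Drop Table Combined;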