Qlik Community

New to Qlik Sense

Discussion board where members can get started with Qlik Sense.

eduardo_dimperi
Valued Contributor II

Problem with Left Join

Hi !

I need a little help to understand why i can not still use this code.

My code works, but last month my data had increase and my code pass to process from 2 minutes to 5 hours and consumo a lot of memory, but my data not increase that much. All problem occur in the last left join.

We talk about 5 milions of lines (Table Exclusao) to left join with 140 thousand lines, that i think is a small table to QlikSense

Any ideia why or how i can workaround this?

Exclusao:

LOAD

    OID_METER,

    EXCLUIR

    RESIDENT Exclude_Meter

    WHERE NOT EXISTS(EXCLUIR,OID_METER)=-1

    ORDER BY EXCLUIR;

   

DROP TABLE Exclude_Meter;

   

inner join (Exclusao)

ANALISE_TMP:

LOAD

  ID_LEITURA,

    NAME_SYSTEM,

    NAME_GROUP,

    NEIGHB_SYSTEM,

    CITY_SYSTEM,

    OID_METER,

    NAME_TYPE_METER,

    TYPE_CONSTANT,

    NAME_UC,

    ADDRESS_BLOCK_UC,

    DATA AS DIA0,

    TEMPO,

    VALUE_READ AS Val0,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val0)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA0

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

LEFT JOIN (Exclusao)

LOAD

  OID_METER,

    TYPE_CONSTANT,

    VALUE_READ AS Val1,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

//LEFT JOIN (Exclusao)

LOAD

  OID_METER,

    TYPE_CONSTANT,

     VALUE_READ AS Val2,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

1 Solution

Accepted Solutions
MVP
MVP

Re: Problem with Left Join

Hi Eduardo,

the answers to your questions completely depend on your business analytics needs. If you have many transactions for the same Meter and Type, and you don't care about Date/Time, you can:

- concatenate all transactions into one table instead of joining, and that would eliminate duplication

- aggregate all transactions into one number per Meter and Type, using GROUP BY.

Regarding your second question - having 2 tables with 3 common fields would cause a Synthetic Key. Personally, I'm of the opinion that synthetic keys should be avoided. Other experts may tell you otherwise. Ultimately, the structure should be driven by your business needs.

cheers.

Oleg Troyansky

12 Replies
daveamz01
Contributor III

Re: Problem with Left Join

Hi Eduardo,

If you the time to process goes from minutes to hours then for sure there is an issue with the keys by which the tables are joined. You can check by running the code in the debug mode with a limited number of rows (e.g. 50).

Also, it is a good practice to create composite keys when joining tables by multiple fields (e.g OID_METER&'_'&TYPE_CONSTANT AS _OID_METER_TYPE_KEY)

Regards,

David

ahaahaaha
Honored Contributor

Re: Problem with Left Join

Hi Eduardo,

May be the code construction of

Table1:

LOAD

...

Resident....

Left Join

LOAD

....

Resident Table1;

to replace by

Table1:

LOAD

...

Resident....

NoConcatenate

Table2:

LOAD

....

Resident Table1;

DROP Table Table1;

?

Regards,

Andrey

MVP
MVP

Re: Problem with Left Join

Hi Eduardo,

it would be a lot easier to troubleshoot your problem if we could see the actual app. Without it, I can only imagine what might happen and what might cause such a huge difference in processing times.

At first, I suspected that you may have misspelled one of the Key field names, which could cause a Cartesian Join. Based on your script, it doesn't seem to be the case.

Next, I think you might experience severe duplication issues. The last 2 LEFT JOIN sentences contain

OID_METER and TYPE_CONSTANT, so all possible values Val1 will be matched with all possible values of Val2 for the same OID_METER and TYPE_CONSTANT. If you should have many transactions with the same key values, the row counts will be multiplied - 1000 rows with Val1 and 1000 rows with Val2 will produce 1,000,000 rows.

Finally, I spotted a syntax error in your script:

inner join (Exclusao)

ANALISE_TMP:

LOAD

  ID_LEITURA,

    NAME_SYSTEM,

The label ANALISE_TMP: between Inner join and load doesn't seem to be valid - I think you should be getting an error message there. If your script is setup to ignore error messages, than this part of the logic is not being performed, causing issues afterwards.

So, if you look into these issues, you should be able to find out what goes wrong in your script. In addition, analyze table sizes and row counts at the end of your script - are you seeing the expected number of rows in all tables?

One final thought - make sure that you drop all temporary tables and don't cause any unwanted Synthetic keys. That could also be a source of a long wait at the end of the script execution.

cheers,

Oleg Troyansky

Upgrade your Qlik skills with my book QlikView Your Business: An Expert Guide to Business Discovery with QlikView and Qlik Sense

eduardo_dimperi
Valued Contributor II

Re: Problem with Left Join

Hi David, thank you for your time,

I did the deug with limited rows, my response was:

100 - 1000 - 1s

10000 - 8s

100000 -20s

1000000 - 2 minutes

2000000 - more that i can wait and 100% of my server ram memory

About create a key i dont understant, if i already have a commum field (OID_METER) and others, why create another one to make a join? And i need this fields so i can't replace.

Any idea about how process 5.000.000 lines?

and thanks again for help me

eduardo_dimperi
Valued Contributor II

Re: Problem with Left Join

Hi Andrey !

But i need to Join this tables in one table

eduardo_dimperi
Valued Contributor II

Re: Problem with Left Join

Hi Oleg, course i can share my code, see below please. I think you be right about cartesian

for a=0 to 2

 

  let vCarga=Timestamp(Today()-$(a),'YYYYMMDD');

 

  [CONCAT_TABLE_$(vPath)]:

  LOAD

  Distinct

  *

  FROM [lib://Dados/$(vPath)/Leituras/Individual_Read/RS_INDIVIDUAL_READ_$(vCarga).qvd](qvd) where DATE_READ>TODAY()-2;

 

  next

TMP:

LOAD

    OID_SYSTEM,

    OID_GROUP,

    NAME_SYSTEM,

  SSN_REMOTE_SYSTEM,

    STATUS,

    ID_SYSTEM_TYPE

FROM [lib://Dados/$(vPath)\RS/RS_SYSTEM.qvd]

(qvd) WHERE STATUS<>'99' and OID_GROUP<>'28';

LEFT JOIN (TMP)

LOAD

OID_SYSTEM,

bairro as NEIGHB_SYSTEM,

municipio as CITY_SYSTEM

FROM [lib://Dados/$(vPath)\Enderecos/enderecocompleto.qvd]

(qvd);

INNER JOIN (TMP)

LOAD

    OID_GROUP,

    NAME_GROUP

FROM [lib://Dados/$(vPath)\RS/RS_GROUP.qvd]

(qvd);

INNER JOIN (TMP)

LOAD

    OID_UC,

    NAME_UC,

    ADDRESS_BLOCK_UC,

    OID_SYSTEM

FROM [lib://Dados/$(vPath)\RS/RS_UC.qvd]

(qvd) WHERE (UPPER(NAME_UC)<>'PISCINA') AND (UPPER(NAME_UC)<>'MEDIDOR PARALELO');

INNER JOIN (TMP)

LOAD

    OID_METER,

    OID_UC,

    OID_TYPE_METER,

    TYPE_CONSTANT

FROM [lib://Dados/$(vPath)\RS/RS_METER.qvd]

(qvd);

INNER JOIN (TMP)

LOAD

    OID_TYPE_METER,

    NAME_TYPE_METER

FROM [lib://Dados/$(vPath)\RS/RS_TYPE_METER.qvd]

(qvd);

INNER JOIN (TMP)

LOAD

    OID_COUNTERS,

    OID_METER,

    CONCENTRATOR_COUNTERS,

    PORT_COUNTERS

FROM [lib://Dados/$(vPath)\RS/RS_COUNTERS.qvd]

(qvd);

TMP2:

LOAD

NAME_SYSTEM,

NAME_GROUP,

NEIGHB_SYSTEM,

CITY_SYSTEM,

OID_METER,

NAME_TYPE_METER,

TYPE_CONSTANT,

OID_UC,

NAME_UC,

ADDRESS_BLOCK_UC,

OID_SYSTEM&'|'&CONCENTRATOR_COUNTERS&'|'&PORT_COUNTERS AS ID_LEITURA

RESIDENT TMP

WHERE NAME_SYSTEM <> 'MANSAO RAVELLO' AND NAME_TYPE_METER <> 'Agua quente-retorno';

left join (TMP2)

LOAD

    VALUE_READ,

    ID_SYSTEM&'|'&CONCENTRATOR&'|'&PORT AS ID_LEITURA,

    Date(floor(DATE_READ)) AS DATA,

    Time(frac(DATE_READ)) AS TEMPO

RESIDENT [CONCAT_TABLE_$(vPath)] where DATE_READ>=TODAY()-2;

Output:

load

Date(floor(DATE_READ)) AS DATA,

Hour(DATE_READ) AS HORA,

Time(frac(DATE_READ)) AS TEMPO,

ID_SYSTEM&'|'&CONCENTRATOR&'|'&PORT AS ID_LEITURA,

VALUE_READ

resident [CONCAT_TABLE_$(vPath)]

where DATE_READ>=TODAY()-2;

inner join (Output)

load

DATA,

HORA,

ID_LEITURA,

time(min(TEMPO)) as TEMPO

resident Output group by DATA, HORA,ID_LEITURA;

NoConcatenate

VALOR:

LOAD

DATA,

HORA,

TEMPO,

ID_LEITURA,

Min(VALUE_READ) AS VALUE_READ

RESIDENT Output

Group By

DATA,

HORA,

TEMPO,

ID_LEITURA

;

Inner Join(TMP2)

Load

ID_LEITURA,

DATA,

VALUE_READ,

TEMPO

RESIDENT VALOR;

DROP TABLE Output;

DROP TABLE VALOR;

Corrige_Medidor:

LOAD

DISTINCT

OID_METER,

VALUE_READ AS VALOR,

DATA,

TEMPO

RESIDENT TMP2

ORDER BY OID_METER,DATA,TEMPO DESC ;

NoConcatenate

Exclude_Meter:

LOAD

DISTINCT

OID_METER,

VALOR,

If((PEEK(VALOR)- VALOR)=0,OID_METER) AS EXCLUIR,

DATA,

TEMPO

RESIDENT Corrige_Medidor

ORDER BY OID_METER,DATA,TEMPO DESC ;

  

drop table Corrige_Medidor;

Exclusao:

LOAD

    OID_METER,

    EXCLUIR

    RESIDENT Exclude_Meter

    WHERE NOT EXISTS(EXCLUIR,OID_METER)=-1

    ORDER BY EXCLUIR;

   

DROP TABLE Exclude_Meter;

   

inner join (Exclusao)

ANALISE_TMP:

LOAD

  ID_LEITURA,

    NAME_SYSTEM,

    NAME_GROUP,

    NEIGHB_SYSTEM,

    CITY_SYSTEM,

    OID_METER,

    NAME_TYPE_METER,

    TYPE_CONSTANT,

    NAME_UC,

    ADDRESS_BLOCK_UC,

    DATA AS DIA0,

    TEMPO,

    VALUE_READ AS Val0,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val0)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA0

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

Inner JOIN (Exclusao)

LOAD

  OID_METER,

    TYPE_CONSTANT,

    VALUE_READ AS Val1,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

Inner JOIN (Exclusao)

LOAD

  OID_METER,

    TYPE_CONSTANT,

     VALUE_READ AS Val2,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;  //  <-------- Crash and burn

   

if ([vPath]='MI') then

ANALISE_AGUA:

LOAD

DISTINCT

  ID_LEITURA AS ID_LEITURA_AGUA,

  SubField(ID_LEITURA, '|' ,1) AS OID_SYSTEM_AGUA,

    NAME_SYSTEM AS NAME_SYSTEM_AGUA,

    NAME_GROUP AS NAME_GROUP_AGUA,

    NEIGHB_SYSTEM AS NEIGHB_SYSTEM_AGUA,

    CITY_SYSTEM AS CITY_SYSTEM_AGUA,

    OID_METER AS OID_METER_AGUA,

    NAME_TYPE_METER AS NAME_TYPE_METER_AGUA,

    NAME_UC AS NAME_UC_AGUA,

    ADDRESS_BLOCK_UC AS ADDRESS_BLOCK_UC_AGUA,

    DIA0,

    CONSUMODIA0,

    CONSUMODIA1,

    CONSUMODIA2,

    CONSUMODIA0+CONSUMODIA1+CONSUMODIA2 AS TOTAL_CONSUMO

    RESIDENT Exclusao WHERE CONSUMODIA0>=0.005 AND CONSUMODIA1>=0.005 AND CONSUMODIA2>=0.005 AND NAME_TYPE_METER<>'Gas';

endif

ANALISE_GAS:

LOAD

DISTINCT

    ID_LEITURA AS ID_LEITURA_GAS,   

  SubField(ID_LEITURA, '|' ,1) AS OID_SYSTEM_GAS,

    NAME_SYSTEM AS NAME_SYSTEM_GAS,

    NAME_GROUP AS NAME_GROUP_GAS,

    NEIGHB_SYSTEM AS NEIGHB_SYSTEM_GAS,

    CITY_SYSTEM AS CITY_SYSTEM_GAS,

    OID_METER AS OID_METER_GAS,

    NAME_TYPE_METER AS NAME_TYPE_METER_GAS,

    NAME_UC AS NAME_UC_GAS,

    ADDRESS_BLOCK_UC AS ADDRESS_BLOCK_UC_GAS,

    DIA0,

    CONSUMODIA0,

    CONSUMODIA1,

    CONSUMODIA2,

    CONSUMODIA0+CONSUMODIA1+CONSUMODIA2 AS TOTAL_CONSUMO

    RESIDENT Exclusao WHERE CONSUMODIA0>=0.005 AND CONSUMODIA1>=0.005 AND CONSUMODIA2>=0.005 AND NAME_TYPE_METER ='Gas';

  

DROP TABLE Exclusao;

CONSUMO:

LOAD

  ID_LEITURA,

    NAME_SYSTEM,

    NAME_GROUP,

    NEIGHB_SYSTEM,

    CITY_SYSTEM,

    OID_METER,

    NAME_TYPE_METER,

    TYPE_CONSTANT,

    NAME_UC,

    ADDRESS_BLOCK_UC,

    DATA,

  TEMPO,

    VALUE_READ AS Val,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMO_GERAL

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC;

Inner Join(CONSUMO)

CONTA_MEDIDORES:

LOAD

COUNT(OID_METER) AS NUM_MEDIDORES,

NAME_SYSTEM

RESIDENT TMP

GROUP BY

NAME_SYSTEM;

DROP TABLE TMP, TMP2;

CONSUMO_REGULAR:

LOAD

    OID_METER

    ,CONSUMO_GERAL AS CONSUMO_REGULAR

    ,NAME_SYSTEM AS NOME_SISTEMA

    ,NAME_GROUP AS NOME_GRUPO

    ,NEIGHB_SYSTEM AS NOME_BAIRRO

    ,CITY_SYSTEM AS NOME_CIDADE

    ,NAME_TYPE_METER AS TIPO_MEDIDOR

    ,TYPE_CONSTANT AS TIPO_CONSTANTE

    ,NAME_UC AS NOME_UC

    ,ADDRESS_BLOCK_UC AS NOME_BLOCO

    ,TODAY() AS DATA_REFERENCIA

    ,NUM_MEDIDORES

Resident CONSUMO

WHERE CONSUMO_GERAL <0.05

ORDER BY OID_METER DESC;

if ([vPath]='MI') then

INNER JOIN(ANALISE_AGUA)

TOTAL_CONSUMO_REGULAR_AGUA:

LOAD

NOME_SISTEMA AS NAME_SYSTEM_AGUA,

NUM_MEDIDORES,

SUM(CONSUMO_REGULAR) AS CONSUMO_TOTAL,

SUM(CONSUMO_REGULAR)/NUM_MEDIDORES AS MEDIA_CONSUMO_TOTAL

RESIDENT CONSUMO_REGULAR

GROUP BY NOME_SISTEMA,NUM_MEDIDORES;

endif

INNER JOIN(ANALISE_GAS)

TOTAL_CONSUMO_REGULAR_GAS:

LOAD

NOME_SISTEMA AS NAME_SYSTEM_GAS,

NUM_MEDIDORES,

SUM(CONSUMO_REGULAR) AS CONSUMO_TOTAL,

SUM(CONSUMO_REGULAR)/NUM_MEDIDORES AS MEDIA_CONSUMO_TOTAL

RESIDENT CONSUMO_REGULAR

GROUP BY NOME_SISTEMA,NUM_MEDIDORES;

DROP TABLE CONSUMO,[CONCAT_TABLE_$(vPath)],CONSUMO_REGULAR;

GESTAO_AGUA:

  LOAD

  RESPONSAVEL AS RESPONSAVEL_AGUA,

  SUBSTITUTO AS SUBSTITUTO_AGUA,

  OID_SYSTEM AS OID_SYSTEM_AGUA

FROM [lib://Dados/$(vPath)/Gestao/GESTAO_CAS.qvd] (qvd);

GESTAO_GAS:

  LOAD

  RESPONSAVEL AS RESPONSAVEL_GAS,

  SUBSTITUTO AS SUBSTITUTO_GAS,

  OID_SYSTEM AS OID_SYSTEM_GAS

FROM [lib://Dados/$(vPath)/Gestao/GESTAO_CAS.qvd] (qvd);

ahaahaaha
Honored Contributor

Re: Problem with Left Join

Both Left Join from table TMP2. May be create a temporary table of TMP2, then apply Left Join once?

Like this

TempTable:

LOAD

  OID_METER,

    TYPE_CONSTANT,

    VALUE_READ AS Val1,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

LEFT JOIN

LOAD

  OID_METER,

    TYPE_CONSTANT,

    VALUE_READ AS Val2,

    IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2

  RESIDENT TMP2

    where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)

    ORDER BY OID_METER,DATA,TEMPO DESC ;

LEFT JOIN (Exclusao)

LOAD

...

Resident TempTable;

DROP Table TempTable;

eduardo_dimperi
Valued Contributor II

Re: Problem with Left Join

Oleg, i was considereing this part of your explanation

"

Next, I think you might experience severe duplication issues. The last 2 LEFT JOIN sentences contain

OID_METER and TYPE_CONSTANT, so all possible values Val1 will be matched with all possible values of Val2 for the same OID_METER and TYPE_CONSTANT. If you should have many transactions with the same key values, the row counts will be multiplied - 1000 rows with Val1 and 1000 rows with Val2 will produce 1,000,000 rows."

And you are right i have many transactions and like you said my code is cheking all values Val1 with all values Val2.

But how can i join 2 tables where OID_METER and TYPE_CONSTANT are the same whitout this cartesian effect?

Another question about it, in 2 tables with 4 fields for example, is better have 3 keys (3 common fields) and 1 not Or 1 key and 3 fields that doesnt have the same name?

Thank you

eduardo_dimperi
Valued Contributor II

Re: Problem with Left Join

Hummm..could be Andrey, I'll try right now