Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi !
I need a little help to understand why i can not still use this code.
My code works, but last month my data had increase and my code pass to process from 2 minutes to 5 hours and consumo a lot of memory, but my data not increase that much. All problem occur in the last left join.
We talk about 5 milions of lines (Table Exclusao) to left join with 140 thousand lines, that i think is a small table to QlikSense
Any ideia why or how i can workaround this?
Exclusao:
LOAD
OID_METER,
EXCLUIR
RESIDENT Exclude_Meter
WHERE NOT EXISTS(EXCLUIR,OID_METER)=-1
ORDER BY EXCLUIR;
DROP TABLE Exclude_Meter;
inner join (Exclusao)
ANALISE_TMP:
LOAD
ID_LEITURA,
NAME_SYSTEM,
NAME_GROUP,
NEIGHB_SYSTEM,
CITY_SYSTEM,
OID_METER,
NAME_TYPE_METER,
TYPE_CONSTANT,
NAME_UC,
ADDRESS_BLOCK_UC,
DATA AS DIA0,
TEMPO,
VALUE_READ AS Val0,
IF(OID_METER=PEEK(OID_METER),PEEK(Val0)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA0
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
LEFT JOIN (Exclusao)
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val1,
IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
//LEFT JOIN (Exclusao)
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val2,
IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
Hi Eduardo,
the answers to your questions completely depend on your business analytics needs. If you have many transactions for the same Meter and Type, and you don't care about Date/Time, you can:
- concatenate all transactions into one table instead of joining, and that would eliminate duplication
- aggregate all transactions into one number per Meter and Type, using GROUP BY.
Regarding your second question - having 2 tables with 3 common fields would cause a Synthetic Key. Personally, I'm of the opinion that synthetic keys should be avoided. Other experts may tell you otherwise. Ultimately, the structure should be driven by your business needs.
cheers.
Oleg Troyansky
Hi Eduardo,
If you the time to process goes from minutes to hours then for sure there is an issue with the keys by which the tables are joined. You can check by running the code in the debug mode with a limited number of rows (e.g. 50).
Also, it is a good practice to create composite keys when joining tables by multiple fields (e.g OID_METER&'_'&TYPE_CONSTANT AS _OID_METER_TYPE_KEY)
Regards,
David
Hi Eduardo,
May be the code construction of
Table1:
LOAD
...
Resident....
Left Join
LOAD
....
Resident Table1;
to replace by
Table1:
LOAD
...
Resident....
NoConcatenate
Table2:
LOAD
....
Resident Table1;
DROP Table Table1;
?
Regards,
Andrey
Hi Eduardo,
it would be a lot easier to troubleshoot your problem if we could see the actual app. Without it, I can only imagine what might happen and what might cause such a huge difference in processing times.
At first, I suspected that you may have misspelled one of the Key field names, which could cause a Cartesian Join. Based on your script, it doesn't seem to be the case.
Next, I think you might experience severe duplication issues. The last 2 LEFT JOIN sentences contain
OID_METER and TYPE_CONSTANT, so all possible values Val1 will be matched with all possible values of Val2 for the same OID_METER and TYPE_CONSTANT. If you should have many transactions with the same key values, the row counts will be multiplied - 1000 rows with Val1 and 1000 rows with Val2 will produce 1,000,000 rows.
Finally, I spotted a syntax error in your script:
inner join (Exclusao)
ANALISE_TMP:
LOAD
ID_LEITURA,
NAME_SYSTEM,
The label ANALISE_TMP: between Inner join and load doesn't seem to be valid - I think you should be getting an error message there. If your script is setup to ignore error messages, than this part of the logic is not being performed, causing issues afterwards.
So, if you look into these issues, you should be able to find out what goes wrong in your script. In addition, analyze table sizes and row counts at the end of your script - are you seeing the expected number of rows in all tables?
One final thought - make sure that you drop all temporary tables and don't cause any unwanted Synthetic keys. That could also be a source of a long wait at the end of the script execution.
cheers,
Oleg Troyansky
Upgrade your Qlik skills with my book QlikView Your Business: An Expert Guide to Business Discovery with QlikView and Qlik Sense
Hi David, thank you for your time,
I did the deug with limited rows, my response was:
100 - 1000 - 1s
10000 - 8s
100000 -20s
1000000 - 2 minutes
2000000 - more that i can wait and 100% of my server ram memory
About create a key i dont understant, if i already have a commum field (OID_METER) and others, why create another one to make a join? And i need this fields so i can't replace.
Any idea about how process 5.000.000 lines?
and thanks again for help me
Hi Andrey !
But i need to Join this tables in one table
Hi Oleg, course i can share my code, see below please. I think you be right about cartesian
for a=0 to 2
let vCarga=Timestamp(Today()-$(a),'YYYYMMDD');
[CONCAT_TABLE_$(vPath)]:
LOAD
Distinct
*
FROM [lib://Dados/$(vPath)/Leituras/Individual_Read/RS_INDIVIDUAL_READ_$(vCarga).qvd](qvd) where DATE_READ>TODAY()-2;
next
TMP:
LOAD
OID_SYSTEM,
OID_GROUP,
NAME_SYSTEM,
SSN_REMOTE_SYSTEM,
STATUS,
ID_SYSTEM_TYPE
FROM [lib://Dados/$(vPath)\RS/RS_SYSTEM.qvd]
(qvd) WHERE STATUS<>'99' and OID_GROUP<>'28';
LEFT JOIN (TMP)
LOAD
OID_SYSTEM,
bairro as NEIGHB_SYSTEM,
municipio as CITY_SYSTEM
FROM [lib://Dados/$(vPath)\Enderecos/enderecocompleto.qvd]
(qvd);
INNER JOIN (TMP)
LOAD
OID_GROUP,
NAME_GROUP
FROM [lib://Dados/$(vPath)\RS/RS_GROUP.qvd]
(qvd);
INNER JOIN (TMP)
LOAD
OID_UC,
NAME_UC,
ADDRESS_BLOCK_UC,
OID_SYSTEM
FROM [lib://Dados/$(vPath)\RS/RS_UC.qvd]
(qvd) WHERE (UPPER(NAME_UC)<>'PISCINA') AND (UPPER(NAME_UC)<>'MEDIDOR PARALELO');
INNER JOIN (TMP)
LOAD
OID_METER,
OID_UC,
OID_TYPE_METER,
TYPE_CONSTANT
FROM [lib://Dados/$(vPath)\RS/RS_METER.qvd]
(qvd);
INNER JOIN (TMP)
LOAD
OID_TYPE_METER,
NAME_TYPE_METER
FROM [lib://Dados/$(vPath)\RS/RS_TYPE_METER.qvd]
(qvd);
INNER JOIN (TMP)
LOAD
OID_COUNTERS,
OID_METER,
CONCENTRATOR_COUNTERS,
PORT_COUNTERS
FROM [lib://Dados/$(vPath)\RS/RS_COUNTERS.qvd]
(qvd);
TMP2:
LOAD
NAME_SYSTEM,
NAME_GROUP,
NEIGHB_SYSTEM,
CITY_SYSTEM,
OID_METER,
NAME_TYPE_METER,
TYPE_CONSTANT,
OID_UC,
NAME_UC,
ADDRESS_BLOCK_UC,
OID_SYSTEM&'|'&CONCENTRATOR_COUNTERS&'|'&PORT_COUNTERS AS ID_LEITURA
RESIDENT TMP
WHERE NAME_SYSTEM <> 'MANSAO RAVELLO' AND NAME_TYPE_METER <> 'Agua quente-retorno';
left join (TMP2)
LOAD
VALUE_READ,
ID_SYSTEM&'|'&CONCENTRATOR&'|'&PORT AS ID_LEITURA,
Date(floor(DATE_READ)) AS DATA,
Time(frac(DATE_READ)) AS TEMPO
RESIDENT [CONCAT_TABLE_$(vPath)] where DATE_READ>=TODAY()-2;
Output:
load
Date(floor(DATE_READ)) AS DATA,
Hour(DATE_READ) AS HORA,
Time(frac(DATE_READ)) AS TEMPO,
ID_SYSTEM&'|'&CONCENTRATOR&'|'&PORT AS ID_LEITURA,
VALUE_READ
resident [CONCAT_TABLE_$(vPath)]
where DATE_READ>=TODAY()-2;
inner join (Output)
load
DATA,
HORA,
ID_LEITURA,
time(min(TEMPO)) as TEMPO
resident Output group by DATA, HORA,ID_LEITURA;
NoConcatenate
VALOR:
LOAD
DATA,
HORA,
TEMPO,
ID_LEITURA,
Min(VALUE_READ) AS VALUE_READ
RESIDENT Output
Group By
DATA,
HORA,
TEMPO,
ID_LEITURA
;
Inner Join(TMP2)
Load
ID_LEITURA,
DATA,
VALUE_READ,
TEMPO
RESIDENT VALOR;
DROP TABLE Output;
DROP TABLE VALOR;
Corrige_Medidor:
LOAD
DISTINCT
OID_METER,
VALUE_READ AS VALOR,
DATA,
TEMPO
RESIDENT TMP2
ORDER BY OID_METER,DATA,TEMPO DESC ;
NoConcatenate
Exclude_Meter:
LOAD
DISTINCT
OID_METER,
VALOR,
If((PEEK(VALOR)- VALOR)=0,OID_METER) AS EXCLUIR,
DATA,
TEMPO
RESIDENT Corrige_Medidor
ORDER BY OID_METER,DATA,TEMPO DESC ;
drop table Corrige_Medidor;
Exclusao:
LOAD
OID_METER,
EXCLUIR
RESIDENT Exclude_Meter
WHERE NOT EXISTS(EXCLUIR,OID_METER)=-1
ORDER BY EXCLUIR;
DROP TABLE Exclude_Meter;
inner join (Exclusao)
ANALISE_TMP:
LOAD
ID_LEITURA,
NAME_SYSTEM,
NAME_GROUP,
NEIGHB_SYSTEM,
CITY_SYSTEM,
OID_METER,
NAME_TYPE_METER,
TYPE_CONSTANT,
NAME_UC,
ADDRESS_BLOCK_UC,
DATA AS DIA0,
TEMPO,
VALUE_READ AS Val0,
IF(OID_METER=PEEK(OID_METER),PEEK(Val0)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA0
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
Inner JOIN (Exclusao)
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val1,
IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
Inner JOIN (Exclusao)
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val2,
IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ; // <-------- Crash and burn
if ([vPath]='MI') then
ANALISE_AGUA:
LOAD
DISTINCT
ID_LEITURA AS ID_LEITURA_AGUA,
SubField(ID_LEITURA, '|' ,1) AS OID_SYSTEM_AGUA,
NAME_SYSTEM AS NAME_SYSTEM_AGUA,
NAME_GROUP AS NAME_GROUP_AGUA,
NEIGHB_SYSTEM AS NEIGHB_SYSTEM_AGUA,
CITY_SYSTEM AS CITY_SYSTEM_AGUA,
OID_METER AS OID_METER_AGUA,
NAME_TYPE_METER AS NAME_TYPE_METER_AGUA,
NAME_UC AS NAME_UC_AGUA,
ADDRESS_BLOCK_UC AS ADDRESS_BLOCK_UC_AGUA,
DIA0,
CONSUMODIA0,
CONSUMODIA1,
CONSUMODIA2,
CONSUMODIA0+CONSUMODIA1+CONSUMODIA2 AS TOTAL_CONSUMO
RESIDENT Exclusao WHERE CONSUMODIA0>=0.005 AND CONSUMODIA1>=0.005 AND CONSUMODIA2>=0.005 AND NAME_TYPE_METER<>'Gas';
endif
ANALISE_GAS:
LOAD
DISTINCT
ID_LEITURA AS ID_LEITURA_GAS,
SubField(ID_LEITURA, '|' ,1) AS OID_SYSTEM_GAS,
NAME_SYSTEM AS NAME_SYSTEM_GAS,
NAME_GROUP AS NAME_GROUP_GAS,
NEIGHB_SYSTEM AS NEIGHB_SYSTEM_GAS,
CITY_SYSTEM AS CITY_SYSTEM_GAS,
OID_METER AS OID_METER_GAS,
NAME_TYPE_METER AS NAME_TYPE_METER_GAS,
NAME_UC AS NAME_UC_GAS,
ADDRESS_BLOCK_UC AS ADDRESS_BLOCK_UC_GAS,
DIA0,
CONSUMODIA0,
CONSUMODIA1,
CONSUMODIA2,
CONSUMODIA0+CONSUMODIA1+CONSUMODIA2 AS TOTAL_CONSUMO
RESIDENT Exclusao WHERE CONSUMODIA0>=0.005 AND CONSUMODIA1>=0.005 AND CONSUMODIA2>=0.005 AND NAME_TYPE_METER ='Gas';
DROP TABLE Exclusao;
CONSUMO:
LOAD
ID_LEITURA,
NAME_SYSTEM,
NAME_GROUP,
NEIGHB_SYSTEM,
CITY_SYSTEM,
OID_METER,
NAME_TYPE_METER,
TYPE_CONSTANT,
NAME_UC,
ADDRESS_BLOCK_UC,
DATA,
TEMPO,
VALUE_READ AS Val,
IF(OID_METER=PEEK(OID_METER),PEEK(Val)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMO_GERAL
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC;
Inner Join(CONSUMO)
CONTA_MEDIDORES:
LOAD
COUNT(OID_METER) AS NUM_MEDIDORES,
NAME_SYSTEM
RESIDENT TMP
GROUP BY
NAME_SYSTEM;
DROP TABLE TMP, TMP2;
CONSUMO_REGULAR:
LOAD
OID_METER
,CONSUMO_GERAL AS CONSUMO_REGULAR
,NAME_SYSTEM AS NOME_SISTEMA
,NAME_GROUP AS NOME_GRUPO
,NEIGHB_SYSTEM AS NOME_BAIRRO
,CITY_SYSTEM AS NOME_CIDADE
,NAME_TYPE_METER AS TIPO_MEDIDOR
,TYPE_CONSTANT AS TIPO_CONSTANTE
,NAME_UC AS NOME_UC
,ADDRESS_BLOCK_UC AS NOME_BLOCO
,TODAY() AS DATA_REFERENCIA
,NUM_MEDIDORES
Resident CONSUMO
WHERE CONSUMO_GERAL <0.05
ORDER BY OID_METER DESC;
if ([vPath]='MI') then
INNER JOIN(ANALISE_AGUA)
TOTAL_CONSUMO_REGULAR_AGUA:
LOAD
NOME_SISTEMA AS NAME_SYSTEM_AGUA,
NUM_MEDIDORES,
SUM(CONSUMO_REGULAR) AS CONSUMO_TOTAL,
SUM(CONSUMO_REGULAR)/NUM_MEDIDORES AS MEDIA_CONSUMO_TOTAL
RESIDENT CONSUMO_REGULAR
GROUP BY NOME_SISTEMA,NUM_MEDIDORES;
endif
INNER JOIN(ANALISE_GAS)
TOTAL_CONSUMO_REGULAR_GAS:
LOAD
NOME_SISTEMA AS NAME_SYSTEM_GAS,
NUM_MEDIDORES,
SUM(CONSUMO_REGULAR) AS CONSUMO_TOTAL,
SUM(CONSUMO_REGULAR)/NUM_MEDIDORES AS MEDIA_CONSUMO_TOTAL
RESIDENT CONSUMO_REGULAR
GROUP BY NOME_SISTEMA,NUM_MEDIDORES;
DROP TABLE CONSUMO,[CONCAT_TABLE_$(vPath)],CONSUMO_REGULAR;
GESTAO_AGUA:
LOAD
RESPONSAVEL AS RESPONSAVEL_AGUA,
SUBSTITUTO AS SUBSTITUTO_AGUA,
OID_SYSTEM AS OID_SYSTEM_AGUA
FROM [lib://Dados/$(vPath)/Gestao/GESTAO_CAS.qvd] (qvd);
GESTAO_GAS:
LOAD
RESPONSAVEL AS RESPONSAVEL_GAS,
SUBSTITUTO AS SUBSTITUTO_GAS,
OID_SYSTEM AS OID_SYSTEM_GAS
FROM [lib://Dados/$(vPath)/Gestao/GESTAO_CAS.qvd] (qvd);
Both Left Join from table TMP2. May be create a temporary table of TMP2, then apply Left Join once?
Like this
TempTable:
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val1,
IF(OID_METER=PEEK(OID_METER),PEEK(Val1)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA1
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-1) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
LEFT JOIN
LOAD
OID_METER,
TYPE_CONSTANT,
VALUE_READ AS Val2,
IF(OID_METER=PEEK(OID_METER),PEEK(Val2)*PEEK(TYPE_CONSTANT)-VALUE_READ*TYPE_CONSTANT) AS CONSUMODIA2
RESIDENT TMP2
where FLOOR(DATA)=FLOOR(TODAY()-2) AND (HOUR(TEMPO)=2 OR HOUR(TEMPO)=5)
ORDER BY OID_METER,DATA,TEMPO DESC ;
LEFT JOIN (Exclusao)
LOAD
...
Resident TempTable;
DROP Table TempTable;
Oleg, i was considereing this part of your explanation
"
Next, I think you might experience severe duplication issues. The last 2 LEFT JOIN sentences contain
OID_METER and TYPE_CONSTANT, so all possible values Val1 will be matched with all possible values of Val2 for the same OID_METER and TYPE_CONSTANT. If you should have many transactions with the same key values, the row counts will be multiplied - 1000 rows with Val1 and 1000 rows with Val2 will produce 1,000,000 rows."
And you are right i have many transactions and like you said my code is cheking all values Val1 with all values Val2.
But how can i join 2 tables where OID_METER and TYPE_CONSTANT are the same whitout this cartesian effect?
Another question about it, in 2 tables with 4 fields for example, is better have 3 keys (3 common fields) and 1 not Or 1 key and 3 fields that doesnt have the same name?
Thank you
Hummm..could be Andrey, I'll try right now