Hi,
How can I decrease the reload time of this transform script, which runs on a metric with over 300,000,000 records, towards best practice?
It currently takes over 30 minutes and uses 90% of the available memory (group by).
We also have to add a calculation on the date, because orders picked until 3 o'clock the next day have to be recorded on the current day.
This will take even more time, which will push the runtime outside the limits of our timeframe.
Does anyone have an idea where to change the script and where to add the calculation for best practice?
- restructure the where statements?
- decrease memory use?
- speed up group by?
I expect there will be multiple "little" solutions, so every idea is helpful.
No need for a full new script, just tips and tricks; I will figure out the rest.
Thanks in advance
Lidia
Variables
The variables below are used for different KPIs in the same script, with different mappings:
SET vVolumeIndicator = [%QWHM001];
LET vTableKey = '$(vVolumeIndicator)';
LET vKpiFields = '[0PLANT], [ZELMTY], [ZQDATU], [ZVLPLA], [ZNLTYP], [ZVLTYP], [ZQNAME], [ZENAME], [0SHIP_POINT], [0CREATEDON],[ZEDATU]';
New calculation to add
This calculation needs an extra field, ZEZEIT:
IF(PurgeChar([ZEZEIT], ':') < 33000, DATE(ZEDATU - 1, 'YYYY-MM-DD'), ZEDATU) as ZEDATU
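To make the 33000 threshold concrete: PurgeChar() strips the colons from the time text, and the comparison then treats the remaining digits as a number, so 33000 reads as 03:30:00 (a sketch, assuming ZEZEIT is stored as 'HH:MM:SS' text):

```
// '02:45:10' → PurgeChar gives '024510' → compares as 24510 → < 33000,
//   so the pick belongs to the previous day: ZEDATU - 1
// '06:12:00' → '061200' → 61200 → not < 33000 → ZEDATU stays unchanged
IF(PurgeChar([ZEZEIT], ':') < 33000,
   Date([ZEDATU] - 1, 'YYYY-MM-DD'),
   [ZEDATU]) as ZEDATU
```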
mapping
mapDelvryTypeOP:
Mapping LOAD * Inline [
DelvryType, X
ZNHB, X
ZNHS, X
ZENR, X
ZIFU, X
ZSEU, X
];
script
_tmp:
LOAD distinct
$(vTableKey)
, $(vKpiFields)
, [QWHM001.ZNISTA]
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where
( ApplyMap('mapMovtTypeDelvrynote', [QWHM001.ZBWLVS], '')='X' )
AND NOT
( ApplyMap('mapDelvryTypeOP', [QWHM001.0DEL_TYPE], '')='X' )
AND NOT
( ApplyMap('mapShipPtPO', [QWHM001.0SHIP_POINT], '')='X' )
AND
num(date(QWHM001.ZQDATU,'m/d/yyyy')) >= $(vStartKpiNumDay)
AND NOT
( ApplyMap('mapWarehtypeOP', [QWHM001.ZVLTYP], '')='X' ) // v2.1 new
;
NoConcatenate
[$(vTable)]:
LOAD
$(vKpiFields)
, Sum([QWHM001.ZNISTA]) as [LogPerfKPI_6]
Resident _tmp
Group By $(vKpiFields);
DROP table _tmp;
First, I would check whether the initial QVD load could use a Where Exists() clause to keep the load optimized - the mapping you show indicates this might be possible. You have several mapping clauses and a date filter here, but if your QVD is split into those parts, you could concatenate the various single loads in optimized mode.
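As a hedged sketch of the Where Exists() idea (field and file names are taken from the script above; the key values in the allow-list are placeholders I made up): a QVD load only stays optimized when the sole condition is a single Exists() on one field, so the remaining filters would have to run in a later pass.

```
// Allow-list for the movement types; the values themselves are hypothetical
AllowedMovt:
LOAD * Inline [
QWHM001.ZBWLVS
101
255
];

// A single Exists() on one previously loaded field keeps the QVD load optimized
_tmp:
LOAD Distinct
    $(vTableKey),
    $(vKpiFields),
    [QWHM001.ZNISTA]
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where Exists([QWHM001.ZBWLVS]);

DROP Table AllowedMovt;
// the date filter and the NOT-mappings would then run in a second, resident pass
```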
Next, I would store this table as a QVD (and drop the table), then load the next step from there instead of from resident. The difference in load time between a resident load and a QVD load isn't very big (I have never tested it directly), but you will save RAM because the first table is no longer held in memory - and if your system starts swapping RAM to the hard disk (which your >90% RAM usage suggests), it becomes very slow. In such cases it also makes sense to split the load into smaller slices and concatenate them again later.
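A minimal sketch of that store-and-reload step (the staging file name is my assumption):

```
// Persist the intermediate result and free its RAM
STORE _tmp Into [$(vSourcePath1)_tmp_stage.qvd] (qvd);
DROP Table _tmp;

// Reload from disk for the next step; only this copy is now held in memory
_tmp:
LOAD * FROM [$(vSourcePath1)_tmp_stage.qvd] (qvd);
```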
Then I would consider whether an incremental load approach could be applied. You will find much more information on the points mentioned here: Advanced topics for creating a qlik datamodel
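The usual incremental pattern looks roughly like this (the history file name, the vLastLoadDate variable, and the use of 0CREATEDON as the delta field are assumptions on my side):

```
// 1) Load only records created since the last successful run
Increment:
LOAD * FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where [0CREATEDON] >= '$(vLastLoadDate)';

// 2) Append the history, skipping keys already covered by the delta
//    (a single Not Exists() on one field keeps this load optimized)
Concatenate (Increment)
LOAD * FROM [$(vSourcePath1)History.qvd] (qvd)
Where Not Exists([%QWHM001]);

// 3) Write the combined result back as the new history
STORE Increment Into [$(vSourcePath1)History.qvd] (qvd);
```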
Further, I would look at the reduction factor of the group-by load - if the reduction factor is rather small, you might not need the group by at all. In earlier days I did a lot of group-by loads, and quite often the reduction factor was only 1.5 - 3 while the loads ran for hours; without the group by I had more records, of course, but the response times of my applications didn't differ much. You will need to test it, though.
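Checking the reduction factor is a few lines around the existing loads (a sketch; NoOfRows() must be evaluated while both tables still exist):

```
Let vRowsBefore = NoOfRows('_tmp');
// ... group-by load into [$(vTable)] as in the script above ...
Let vRowsAfter = NoOfRows('$(vTable)');
Let vFactor = $(vRowsBefore) / $(vRowsAfter);
Trace reduction factor of the group by: $(vFactor);
```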
- Marcus
Hello Marcus,
Thanks for your input.
Where Exists() was not an option - the mappings are all inlines with multiple statements.
Storing and reloading a temporary QVD gave almost no reduction, and it is not in line with the company's template/environment.
The following options gave me almost a 15-minute reduction and halved the memory use:
- changed the mapped value in the ApplyMap to the number 1 instead of the character X
- no IF statement around the ApplyMap
- an ordered resident load before the group by (_tmp2: LOAD *, 1 as tmp Resident _tmp Order By $(vKpiFields);)
After testing, I added the date calculation in the Order By load and in the end still saved 10 minutes.
Other suggestions are still welcome
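For reference, the mapping change behind the first bullet is just replacing the mapped character X with the number 1, so the where-clause compares numbers instead of strings (a sketch using the delivery-type values from the first post):

```
mapDelvryTypeOP:
Mapping LOAD * Inline [
DelvryType, Flag
ZNHB, 1
ZNHS, 1
ZENR, 1
ZIFU, 1
ZSEU, 1
];

// used as: Where ApplyMap('mapDelvryTypeOP', [0DEL_TYPE]) = 1
// note: without a third default argument, unmapped values return the input itself
```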
-------------
_tmp: LOAD distinct
$(vTableKey),
$(vKpiFields),
[ZEZEIT],
[ZNISTA]
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where
( ApplyMap('mapMovtTypeOP', [ZBWLVS])=1 )
AND
num(date(ZQDATU,'m/d/yyyy')) >= $(vStartKpiNumDay)
AND NOT
( ApplyMap('mapDelvryTypeOP', [0DEL_TYPE])= 1 )
AND NOT
( ApplyMap('mapShipPtPO', [0SHIP_POINT])= 1 )
AND NOT
( ApplyMap('mapWarehtypeOP', [ZVLTYP])= 1 )
;
NoConcatenate
_tmp2: LOAD
$(vKpiFields),
IF(PurgeChar( [ZEZEIT],':')< 33000, DATE(ZEDATU -1,'YYYY-MM-DD'), ZEDATU) as ZEDATU_tmp,
// orders picked on the next day before half past three have to be registered as workload on today; first pick time for the new workload is 5 o'clock
[ZNISTA]
Resident _tmp Order By $(vKpiFields) ;
DROP table _tmp; DROP FIELD ZEDATU ; RENAME field ZEDATU_tmp to ZEDATU;
NoConcatenate
[$(vTable)]: LOAD
$(vKpiFields),
Sum([ZNISTA]) as [LogPerfKPI_6]
Resident _tmp2 Group By $(vKpiFields);
DROP table _tmp2;