lidiavonkrond
Partner - Contributor III

how to decrease time for Group by on multiple fields

Hi,

How can I decrease the reload time for this transform script to best practice? It runs on a metric with over 300,000,000 records.

It currently takes over 30 minutes and uses 90% of the available memory (in the Group By).

We have to add a calculation on the date, because orders picked until 3 o'clock the next day have to be recorded on the current day.

This will take even more time, which will push the reload beyond the limits of our timeframe.

Does anyone have an idea what to change, and where to add the calculation, for best practice?

- restructure the Where statements?

- decrease memory use?

- speed up the Group By?

I expect there will be multiple "little" solutions, so every idea is helpful.


There's no need for a full new script, just tips and tricks; I will figure it out from there.


Thanks in advance

Lidia

variables

The variables below are used for different KPIs in the same script, with different mappings.

SET vVolumeIndicator = [%QWHM001];
LET vTableKey = '$(vVolumeIndicator)';

LET vKpiFields  = '[0PLANT], [ZELMTY], [ZQDATU], [ZVLPLA], [ZNLTYP], [ZVLTYP], [ZQNAME], [ZENAME], [0SHIP_POINT], [0CREATEDON],[ZEDATU]';


NEW CALCULATION TO ADD

This calculation needs one extra field: ZEZEIT

IF( PurgeChar([ZEZEIT], ':') < 33000, DATE(ZEDATU - 1, 'YYYY-MM-DD'), ZEDATU ) as ZEDATU


mapping

mapDelvryTypeOP:
Mapping LOAD * Inline [
DelvryType, X
      ZNHB, X
      ZNHS, X
      ZENR, X
      ZIFU, X
      ZSEU, X
]
;

script

_tmp:
LOAD Distinct
    $(vTableKey),
    $(vKpiFields),
    [QWHM001.ZNISTA]
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where
        ( ApplyMap('mapMovtTypeDelvrynote', [QWHM001.ZBWLVS], '') = 'X' )
    AND NOT ( ApplyMap('mapDelvryTypeOP', [QWHM001.0DEL_TYPE], '') = 'X' )
    AND NOT ( ApplyMap('mapShipPtPO', [QWHM001.0SHIP_POINT], '') = 'X' )
    AND num(date(QWHM001.ZQDATU, 'm/d/yyyy')) >= $(vStartKpiNumDay)
    AND NOT ( ApplyMap('mapWarehtypeOP', [QWHM001.ZVLTYP], '') = 'X' )    // v2.1 new
;

NoConcatenate
[$(vTable)]:
LOAD
    $(vKpiFields),
    Sum([QWHM001.ZNISTA]) as [LogPerfKPI_6]
Resident _tmp
Group By $(vKpiFields);

DROP Table _tmp;

2 Replies
marcus_sommer

First, I would check whether the initial QVD load could use a Where Exists() clause to keep the load optimized; the mapping you show suggests this might be possible. You have several mapping clauses and a date filter here, but if your QVD is split into those parts, you could load the individual slices in optimized mode and concatenate them afterwards.
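A minimal sketch of the Where Exists() idea, with illustrative table names and placeholder values (these are not the real movement types): the allowed key values are loaded first, and the QVD load then stays optimized because a single-field Exists() is the only Where condition that preserves an optimized load.

```
// Illustrative only: load the allowed movement types first
MovtTypesOP:
LOAD * Inline [
ZBWLVS
101
102
];

// A single-field Exists() keeps the QVD load optimized
_tmp:
LOAD *
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where Exists(ZBWLVS);

DROP Table MovtTypesOP;  // helper values no longer needed
```

The remaining mapping filters and the date filter would then have to be applied in a second pass, or via separate optimized slices that are concatenated afterwards.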

Next, I would store this table as a QVD (and drop the table), then load the following step from that QVD rather than from resident. The difference in load time between a resident load and a QVD load isn't very big (I have never tested it directly), but you save RAM because the first table is no longer held in memory. If your system starts to swap RAM to disk (which your >90% memory use suggests), it becomes very slow. In such cases it also makes sense to split the loads into smaller slices and concatenate them again later.
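The store-and-reload pattern could look roughly like this (the staging QVD path and name are illustrative):

```
// First pass produces _tmp as in the original script, then:
STORE _tmp INTO [$(vSourcePath1)_tmp_stage.qvd] (qvd);  // path is illustrative
DROP Table _tmp;  // frees the RAM held by the first pass

// Continue the transformation from disk instead of from resident
_tmp:
LOAD *
FROM [$(vSourcePath1)_tmp_stage.qvd] (qvd);
```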

Then I would consider whether an incremental load approach could be applied. You will find much more information on all the points mentioned here: Advanced topics for creating a qlik datamodel
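A generic incremental-load sketch, assuming the source QVD has a usable date field and a history QVD of already-processed data exists (vLastReloadNum and History.qvd are hypothetical names, not from the original script):

```
// Load only records newer than the last reload (vLastReloadNum is hypothetical)
[Facts]:
LOAD *
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where num(date(ZQDATU, 'm/d/yyyy')) >= $(vLastReloadNum);

// Append the previously processed history (hypothetical QVD)
Concatenate ([Facts])
LOAD *
FROM [$(vSourcePath1)History.qvd] (qvd);

// Persist the combined result for the next run
STORE [Facts] INTO [$(vSourcePath1)History.qvd] (qvd);
```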

Further, I would look at the reduction factor of the Group By load; if the reduction factor is rather small, you might not need the Group By at all. In former days I did a lot of Group By loads, and quite often the reduction factor was only around 1.5 to 3 while the loads ran for hours. Without them I had more records, of course, but the response times of my applications hardly differed. You will need to test it, though.
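One way to measure the reduction factor is to compare the row counts before and after the aggregation (this must run before _tmp is dropped; variable names are illustrative):

```
// Count rows before and after the Group By (NoOfRows takes the table name)
LET vRowsBefore = NoOfRows('_tmp');
LET vRowsAfter  = NoOfRows('$(vTable)');
LET vReduction  = $(vRowsBefore) / $(vRowsAfter);
TRACE Group By reduction factor: $(vReduction);
```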

- Marcus

lidiavonkrond
Partner - Contributor III
Author

Hello Marcus,

Thanks for your input.

Where Exists() was not an option; the mappings are all inline with multiple statements.

Storing and reloading a tmp QVD gave almost no reduction, and it is not in line with the company's template/environment.

The following options gave me almost a 15-minute reduction and halved the memory use:

- changed the mapping value in the ApplyMap to the number 1 instead of the character X

- no If statement around the ApplyMap

- an optimized Order By load before the Group By (_tmp2: LOAD *, 1 as tmp Resident _tmp Order By $(vKpiFields);)

After testing, I added the calculation to the Order By load and in the end still saved 10 minutes.

Other suggestions still welcome

-------------

_tmp:
LOAD Distinct
    $(vTableKey),
    $(vKpiFields),
    [ZEZEIT],
    [ZNISTA]
FROM [$(vSourcePath1)$(vSourceQvd1).qvd] (qvd)
Where
        ( ApplyMap('mapMovtTypeOP', [ZBWLVS]) = 1 )
    AND num(date(ZQDATU, 'm/d/yyyy')) >= $(vStartKpiNumDay)
    AND NOT ( ApplyMap('mapDelvryTypeOP', [0DEL_TYPE]) = 1 )
    AND NOT ( ApplyMap('mapShipPtPO', [0SHIP_POINT]) = 1 )
    AND NOT ( ApplyMap('mapWarehtypeOP', [ZVLTYP]) = 1 )
;

NoConcatenate
_tmp2:
LOAD
    $(vKpiFields),
    IF( PurgeChar([ZEZEIT], ':') < 33000, DATE(ZEDATU - 1, 'YYYY-MM-DD'), ZEDATU ) as ZEDATU_tmp,
    // orders picked the next day before half past three have to be registered as workload for today; first pick time for the new workload is 5 o'clock
    [ZNISTA]
Resident _tmp
Order By $(vKpiFields);

DROP Table _tmp;
DROP Field ZEDATU;
RENAME Field ZEDATU_tmp to ZEDATU;

NoConcatenate
[$(vTable)]:
LOAD
    $(vKpiFields),
    Sum([ZNISTA]) as [LogPerfKPI_6]
Resident _tmp2
Group By $(vKpiFields);

DROP Table _tmp2;