Not applicable

How to reduce memory allocation for grouping data load in a script

Initially I had this:


AdPlayed:
LOAD
StationId
,AdId
,EST_DateTime
,0 as Predicted
,sum(AdsPlayedAmount) as AdsPlayedAmount
group by
StationId
,AdId
,EST_DateTime
;
Load
iStationID as StationId
,AdID as AdId
,floor(Date(DateDue)*24) / 24 as EST_DateTime
,ImpsPlayed as AdsPlayedAmount
;
SQL SELECT [iStationID]
,[DateDue]
,[AdID]
,[ImpsPlayed]
FROM [AdPlayList] (nolock) apl
where apl.DateDue >= '$(vStartDate)'
and apl.DateDue < '$(vEndDate)';


And I got the following script statistics:

187.7 million records in the result table.
Maximum memory allocation during the script run was about 50 GB.
Execution time was 1901 sec (~32 min).

I thought this might be caused by the stacked LOAD statements, so I decided to split them. This is what I did:


Temp:
Load
iStationID as StationId
,AdID as AdId
,floor(Date(DateDue)*24) / 24 as EST_DateTime
,ImpsPlayed as AdsPlayedAmount
;
SQL SELECT [iStationID]
,[DateDue]
,[AdID]
,[ImpsPlayed]
FROM [AdPlayList] (nolock) apl
where apl.DateDue >= '$(vStartDate)'
and apl.DateDue < '$(vEndDate)';

AdPlayed:
LOAD
StationId
,AdId
,EST_DateTime
,0 as Predicted
,sum(AdsPlayedAmount) as AdsPlayedAmount
Resident Temp
group by
StationId
,AdId
,EST_DateTime
;

DROP Table Temp;


The initial LOAD gave me 329.3 million records, 3.5 GB in memory, and a script run of 1139 sec (~19 min).

Then the second LOAD statement started. During the grouping part of the statement, memory grew to 55 GB; after the aggregation part, memory allocation was about 59.6 GB.

At the end of the second script, after the DROP statement, I got:

187.7 million records in the result table;
1.7 GB in memory;
3734 sec (~62 min).

As you can see, the first version (stacked LOADs) is twice as fast and requires less memory. But I'm curious about the huge memory allocation for the grouping: 3.5 GB of initial data turned into 59.6 - 3.5 = 56.1 GB of memory during aggregation, and after all that the result table is just 1.7 GB!

For context: I have about 18,000 unique StationIds, about 300,000 unique AdIds, and the period is 400 days back, i.e. ~9,600 unique EST_DateTime values.

The question is: what does QV do with memory for the grouping, and is it possible to reduce memory allocation for such statements?
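To put rough numbers on it: a generic hash-based GROUP BY must keep one accumulator entry per distinct key combination in memory until the load completes, so with 187.7 million output groups the observed 56.1 GB works out to roughly 300 bytes of bookkeeping per group. How QlikView actually implements GROUP BY is undocumented; the Python sketch below is purely illustrative of the hash-aggregation pattern:

```python
# Back-of-the-envelope from the figures above: peak overhead per group.
peak_overhead_bytes = 56.1e9   # 59.6 GB at peak minus the 3.5 GB of input data
groups = 187.7e6               # distinct (StationId, AdId, EST_DateTime) combos
print(round(peak_overhead_bytes / groups))   # ~299 bytes of bookkeeping per group

# Potential key space: 18,000 * 300,000 * 9,600 = ~5.2e13 combinations,
# of which only 187.7 million actually occur in the data.

# A naive hash GROUP BY: every distinct key tuple is live at once.
def hash_group_by(rows):
    """rows: iterable of (station_id, ad_id, est_datetime, amount) tuples."""
    totals = {}
    for station_id, ad_id, est_dt, amount in rows:
        key = (station_id, ad_id, est_dt)
        totals[key] = totals.get(key, 0) + amount
    return totals

print(hash_group_by([(1, 10, 0, 5), (1, 10, 0, 3), (2, 10, 0, 7)]))
# {(1, 10, 0): 8, (2, 10, 0): 7}
```

If the engine holds anything like this per group, peak memory is driven by the number of distinct key combinations, not by the size of the final table.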

12 Replies
johnw
Champion III

I tried sorting the table by the group by fields as an intermediate step, and then loading with group by from the second table. I was hoping (though not expecting) that this would convince QlikView that it didn't need to build its internal data structure, since the group by could be easily satisfied by just summing while reading the rows in the existing order. But as expected, it still used the same group by method.
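The approach I was hoping for works because, once the input is sorted by the group-by fields, a group is finished the moment the key changes, so only one accumulator needs to be live at any time. A Python sketch of the idea (illustrative only; field names borrowed from the script above):

```python
from itertools import groupby
from operator import itemgetter

def streaming_group_by(sorted_rows):
    """sorted_rows: (station_id, ad_id, est_datetime, amount) tuples,
    already sorted by the first three fields.  Yields one summed row per
    group; only the current group's accumulator is held in memory."""
    for key, group in groupby(sorted_rows, key=itemgetter(0, 1, 2)):
        yield key + (sum(row[3] for row in group),)

rows = [(1, 10, 0, 5), (1, 10, 0, 3), (1, 11, 0, 2), (2, 10, 0, 7)]
print(list(streaming_group_by(rows)))
# [(1, 10, 0, 8), (1, 11, 0, 2), (2, 10, 0, 7)]
```

Peak memory here is one accumulator plus the output, instead of one hash entry per distinct key combination.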

I'm on 9SR6. Are you on version 10 yet? Maybe it's better in version 10? I heard that there were a lot of performance improvements for both script and charts. I should probably just download version 10 onto my own PC in a separate directory and try it out...

Not applicable
Author

I'm on 10 IR

johnw
Champion III

OK, so it's the same problem in 10. It seems like they're building some internal data structure that processes the group by very efficiently IF AND ONLY IF there are only a small number of key combinations. Whoever coded it didn't think about what would happen when there were a large number of combinations, and how their clever little hack would turn into a memory-hogging nightmare. That's my guess as to what happened.

This could probably be submitted as a feature request, possibly even as a bug. Group by can be done with much less memory than is actually used. Even if they declare it working as designed, that's a very poor design, and it should be fixed.
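In the meantime, one possible workaround (a general technique, not QlikView-specific advice) is to aggregate in partitions, e.g. one slice of StationId values per pass, so that only that slice's groups are in memory at once. In QlikView script this would be a loop of Resident loads with a Where clause; the idea in Python, purely for illustration:

```python
def chunked_group_by(rows, num_chunks=16):
    """Aggregate in passes partitioned by station_id, so only one
    partition's hash table is in memory at a time.  Trades extra passes
    over the input for a roughly num_chunks-fold smaller peak table."""
    result = []
    for chunk in range(num_chunks):
        totals = {}
        for station_id, ad_id, est_dt, amount in rows:
            if station_id % num_chunks != chunk:   # not this pass's slice
                continue
            key = (station_id, ad_id, est_dt)
            totals[key] = totals.get(key, 0) + amount
        result.extend(k + (v,) for k, v in totals.items())
    return result

rows = [(1, 10, 0, 5), (2, 10, 0, 7), (1, 10, 0, 3), (17, 11, 0, 2)]
print(sorted(chunked_group_by(rows, num_chunks=4)))
# [(1, 10, 0, 8), (2, 10, 0, 7), (17, 11, 0, 2)]
```

The result is identical to a single-pass group by; only the peak size of the in-flight hash table changes.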