This is a short post about performance issues when looking for the min and max values of a field in very large tables.
Let's imagine we have a fact table called myData with more than 100 million rows, and we need to obtain the min and max values of a date field called DateKey.
1- In my first try I did it the traditional way:
NOCONCATENATE LOAD DISTINCT
    MIN([DateKey]) AS MinDate,
    MAX([DateKey]) AS MaxDate
RESIDENT myData WHERE NOT ISNULL([DateKey]);
This alone took 15 minutes!
In order to achieve my goal I had to change my strategy.
2- After a few tries I came up with this:
LOAD MIN(DateAux) AS MinDate, MAX(DateAux) AS MaxDate;
LOAD FIELDVALUE('DateKey', ITERNO()) AS DateAux
AUTOGENERATE(1)
WHILE NOT ISNULL(FIELDVALUE('DateKey', ITERNO()));
With this change the reload time decreased dramatically: it took only 10 seconds!
1- In my first try I was looking for the min and max values of a specific field in the context of the table it belongs to. This makes QlikView first determine the distinct values of that field in that table and then find the min and max among them, scanning all 100 million rows. Adding or removing the DISTINCT clause didn't make any difference.
2- First, let's remember that QlikView stores only the distinct values of each field, and the tables themselves hold bit-stuffed pointers to those values. Using a combination of the FIELDVALUE function, AUTOGENERATE and a WHILE loop, we loop only through the distinct values of the field (only 10K instead of 100M) and obtain the min and max a lot faster than with the traditional way (1).
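To make the idea concrete, here is a minimal Python sketch of QlikView's internal storage model as described above. The lists `symbols` and `rows` are hypothetical toy data: `symbols` plays the role of the field's symbol table (its distinct values, ~10K in the real case), and `rows` plays the role of the table's bit-stuffed pointers into it (~100M in the real case). The traditional way touches every row; the FIELDVALUE way touches only the symbol table.

```python
# Symbol table: the distinct DateKey values (tiny here; ~10K in the post).
# None models a null DateKey value.
symbols = [20200101, 20200102, 20200103, None]

# Row store: pointers (indices) into the symbol table (~100M in the post).
rows = [0, 2, 1, 3, 2, 0, 1]

# Traditional way (1): visit every row and dereference its pointer.
values_full = [symbols[p] for p in rows if symbols[p] is not None]
min_full, max_full = min(values_full), max(values_full)

# FIELDVALUE way (2): visit only the symbol table, the way
# FIELDVALUE('DateKey', ITERNO()) iterates until it returns null.
values_distinct = [v for v in symbols if v is not None]
min_fast, max_fast = min(values_distinct), max(values_distinct)

# Both approaches agree, but (2) scans len(symbols) items, not len(rows).
assert (min_full, max_full) == (min_fast, max_fast)
```

The saving comes purely from how many items are scanned: the min and max of a field depend only on its distinct values, so looping over the symbol table is enough.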
I've always solved this kind of requirement the traditional way and never had a performance issue, so for small to medium tables it is a good solution. It's good to know that QlikView also has great alternatives for very large data sets.
These are the things that make me love QV!
Always discovering new things and better ways to do them!
Hope this helps you guys.