I have a very large table (77 million rows) stored as a QVD file, and I want to load some aggregated counts from this file for a particular application.
I have found that when I first load the table and then perform the count on the resulting resident table, it takes only 44 seconds in total, which is quite decent:
LOAD date(floor(date)) as date,
FROM ..\Data\QVDs\data1.qvd (qvd); (19 seconds)
group by date, version; (+25 seconds = 44 seconds total)
However, when I try the same in a single load statement, it runs much slower, taking 109 seconds:
FROM ..\Data\QVDs\data1.qvd (qvd)
group by date(floor(date)), version; (109 seconds)
Or the intermediate option, which takes even longer at 126 seconds:
group by date, version;
LOAD date(floor(date)) as date
FROM ..\Data\QVDs\data1.qvd (qvd); (126 seconds)
I can live with the workaround, but to me the second solution is the cleanest and most logical, and it is the method I naturally tried first. I would like to understand why the second and third methods are so much slower, so that I can apply this to other cases to speed up loads. I am not sure why the third option performs any differently from the first option.
Could the date transformation I'm doing have anything to do with it? Is it not performing an optimised load in one of the cases?