I have a very large table (77 million rows) stored as a QVD file, and I want to load some aggregated counts from this file for a particular application.
I have found that when I first load the table and then perform the count on the resulting resident table, it takes only 44 seconds in total, which is quite decent:
Data:
donation_alerts:
LOAD date(floor(date)) as date,
version
FROM ..\Data\QVDs\data1.qvd (qvd); // 19 seconds
qualify *;
Summary:
LOAD date,
version,
count(version)
Resident donation_alerts
group by date, version; // +25 seconds = 44 seconds total
However, when I try the same in a single load statement, it runs much slower, taking 109 seconds:
LOAD date(floor(date)),
version,
count(version)
FROM ..\Data\QVDs\data1.qvd (qvd)
group by date(floor(date)), version; // 109 seconds
Or the intermediate option, which takes even longer at 126 seconds:
LOAD date,
version,
count(version)
group by date, version;
LOAD date(floor(date)) as date,
version
FROM ..\Data\QVDs\data1.qvd (qvd); // 126 seconds
I can live with the workaround, but to me the second solution is the cleanest and most logical, and it is the method I naturally tried first. I would like to understand why the second and third methods are so much slower so I can apply this to other cases to speed up loads. I am not sure why the third option performs any differently from the first option.
Could the date transformation I'm doing have anything to do with it? Is it not performing an optimised load in one of the cases?
Hi,
In the first option you have an optimized QVD load, which is faster, and then you are calculating the count on data that is already in memory, so that is why it takes less time.
But in the second case the QVD load is unoptimized, which is why it takes longer to execute.
HTH
sushil
Thanks, I wondered if that was it. So it's still optimised even though I'm performing the date(floor(date)) transformation, which is evaluated for every single row?
Is there any good documentation on optimised loads so I can get a better understanding of them?
Well, I think it's not optimized. What does the document log say?
Have a look at "Turning Unoptimized Loads into Optimized Loads" and the referenced blog.
Check out this thread: http://community.qlik.com/thread/46072
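To illustrate the distinction discussed in this thread, here is a minimal sketch, assuming the commonly described rule that a QVD load stays optimized only when fields are read without expressions (plain renaming is fine). It reads the two fields untouched, then applies date(floor(date)) and the count on the resident table. The table, field, and file names are the ones from the original post; the count_version alias and the final DROP are assumptions added for illustration, and no timing was measured for this variant.

```
// Step 1: plain field list with no expressions, so the QVD load can stay optimized.
donation_alerts:
LOAD date,
     version
FROM ..\Data\QVDs\data1.qvd (qvd);

// Step 2: apply the date transformation and the aggregation on the in-memory table.
Summary:
LOAD date(floor(date)) as date,
     version,
     count(version) as count_version   // alias is an assumption, not in the original post
Resident donation_alerts
group by date(floor(date)), version;

// Assumption: the detail rows are not needed once the summary exists.
DROP Table donation_alerts;
```

Whether the first step actually ran optimized can be checked in the script execution progress or the document log, which normally marks such loads as "qvd optimized". If that marker is missing for a load, an expression or other transformation in that LOAD statement is the usual suspect.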