Not applicable

Why is Count() from a QVD file so Slow?

I have a very large table (77 mil rows) stored as a QVD file, and I want to load some aggregated counts from this file for a particular application.

I have found when I first load the table and then perform the count from the resulting resident table, it takes only 44 seconds total, which is quite decent:

Data:

donation_alerts:
LOAD date(floor(date)) as date,
     version
FROM ..\Data\QVDs\data1.qvd (qvd);    // 19 seconds

qualify *;

Summary:
LOAD date,
     version,
     count(version)
Resident donation_alerts
group by date, version;    // +25 seconds = 44 seconds total

However, when I try the same in a single load statement, it runs much slower, taking 109 seconds:

LOAD date(floor(date)),
     version,
     count(version)
FROM ..\Data\QVDs\data1.qvd (qvd)
group by date(floor(date)), version;    // 109 seconds

Or the intermediate option, which takes even longer at 126 seconds:

LOAD date,
     version,
     count(version)
group by date, version;
LOAD date(floor(date)) as date,
     version
FROM ..\Data\QVDs\data1.qvd (qvd);    // 126 seconds

I can live with the workaround, but to me the second solution is the cleanest and most logical, and it is the method I naturally tried first. I would like to understand why the second and third methods are so much slower, so I can apply this to other cases to speed up loads. I am not sure why the third option performs any differently from the first option.

Could the date transformation I'm doing have anything to do with it? Is it not performing an optimised load in one of the cases?

1 Solution

Accepted Solutions
sushil353
Master II

Hi,

In the first option you have an optimized QVD load, which is faster, and then you are calculating the count on data that is already in memory. That is why it takes less time.

In the second case the QVD load is unoptimized, which is why it takes longer to execute.

HTH

sushil


4 Replies

Not applicable
Author

Thanks, I wondered if that was it. So it's still optimised even though I'm performing the date(floor(date)) transformation, which is evaluated for every single row?

Is there any good documentation on optimised loads so I can get a better understanding of them?

swuehl
MVP

Well, I think it's non-optimized. What does the document log say?

Have a look at

Turning Unoptimized Loads into Optimized Loads

and the referenced blog.
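
A rough sketch of that idea, not timed against this data: a QVD load generally only stays optimized when the fields are read as-is (or merely renamed), so one option is to read the raw fields first and apply date(floor(date)) in the resident step. The field names and path are taken from the question; the table name RawAlerts and the alias version_count are made up for illustration.

```
// Step 1: read the fields untouched, so the QVD load can stay optimized.
// RawAlerts is a hypothetical table name.
RawAlerts:
LOAD date,
     version
FROM ..\Data\QVDs\data1.qvd (qvd);

// Step 2: transform and aggregate on the in-memory table.
// version_count is a hypothetical alias for the aggregated count.
Summary:
LOAD date(floor(date)) as date,
     version,
     count(version) as version_count
Resident RawAlerts
group by date(floor(date)), version;

// Step 3: drop the raw table so only the summary remains in the data model.
DROP Table RawAlerts;
```

Whether this actually beats the 44-second version would need to be measured. The document log should show whether the first load ran as optimized.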

sushil353
Master II