4 Replies Latest reply: May 2, 2014 6:59 AM by Sushil Kumar

    Why is Count() from a QVD file so Slow?

    Mike Swinn

      I have a very large table (77 million rows) stored as a QVD file, and I want to load some aggregated counts from this file for a particular application.

       

      I have found when I first load the table and then perform the count from the resulting resident table, it takes only 44 seconds total, which is quite decent:

       

      Data:

      donation_alerts:
      LOAD date(floor(date)) as date,
          version
      FROM ..\Data\QVDs\data1.qvd (qvd); // 19 seconds

      qualify *;

      Summary:
      LOAD date,
          version,
          count(version)
      Resident donation_alerts
      group by date, version; // +25 seconds = 44 seconds total

       

       

      However, when I try the same in a single load statement, it runs much slower, taking 109 seconds:

       

      LOAD date(floor(date)),
          version,
          count(version)
      FROM ..\Data\QVDs\data1.qvd (qvd)
      group by date(floor(date)), version; // 109 seconds

       

       

      Or the intermediate option, which takes even longer at 126 seconds:

       

      LOAD date,
          version,
          count(version)
      group by date, version;
      LOAD date(floor(date)) as date,
          version
      FROM ..\Data\QVDs\data1.qvd (qvd); // 126 seconds

       

      I can live with the workaround, but to me the second solution is the cleanest and most logical, and it is the method I naturally tried first. I would like to understand why the second and third methods are so much slower so that I can apply the lesson to other cases and speed up loads. I am also not sure why the third option performs any differently from the first.

       

      Could the date transformation I'm doing have anything to do with it? Is it not performing an optimised load in one of the cases?