2 Replies Latest reply: Aug 6, 2010 6:20 PM by Daniel Rozental RSS

    Where vs Exists

    Daniel Rozental

      Not sure if this has been covered before in a discussion, sorry if it was.

      While I was doing a few high volume tests I ran into what I believe might be an exists clause glitch and I wanted to share it hoping someone at QT will pick it up or at least It might be useful to someone.

      I'm using QV9 SR4 (not the latest, I know) and a version of the 10 Million rows application that has 50 Million.

      1. First I did a simple load using a Where clause, it took 58 secs to load 945.501 rows.

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
      FROM R00.QVD (qvd)
      where ProductID = 40;
      drop table R00;

      2. Then I thought a where exists clause would be much faster, since the load would be optimized, and it was. It took only 4 secs to load the same information as the code above

      load 40 as ProductID autogenerate(1);

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
      FROM R00.QVD (qvd)
      where Exists(ProductID);
      drop table R00;

      So far nothing out of the ordinary.

      3. Then I added a transformation, this way neither the simple where nor the where exists clauses will have the benefit of the optimized load.

      I only added "Quantity*UnitPrice - Discount as NetSales". The load took 0.59 secs

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS, Quantity*UnitPrice - Discount as NetSales
      FROM R00.QVD (qvd)
      where ProductID = 40;
      drop table R00;

      4. Same thing, I added

      "Quantity*UnitPrice - Discount as NetSales" to the where exists load at it took 3 minutes, 38 seconds !!!

      load 40 as ProductID autogenerate(1);

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS,

      Quantity*UnitPrice - Discount as NetSale



      FROM R00.QVD (qvd)
      where Exists(ProductID);
      drop table R00;

      5. I tried doing it in two steps, first the optimized where exists load, and then the transformation. It took 16 secs

      load 40 as ProductID autogenerate(1);

      R00_Temp:

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

      FROM R00.QVD (qvd)
      where Exists(ProductID);

      R00:

      LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

      , Quantity*UnitPrice - Discount as NetSale



      RESIDENT R00_Temp;

      drop table R00_Temp;

      drop table R00;

       

      Hopefully some else might have a view on this.

        • Where vs Exists
          John Witherspoon

          The only thing that seems unusual to me in your list is the 3 minutes, 38 seconds for #4. While the calculation will break the optimized load, I wouldn't expect it to be any slower than the where ProductID = 40 version.

          And testing with a QVD of my own does not duplicate what you're seeing. I got 24 seconds with the where exists, 25 seconds with the where =. I'm on SR5, but it seems unlikely that they would coincidentally have fixed this between these two very similar versions.

          In any case, I'd never code your version #4 causing the problem. Where = is much easier, so unless you ARE going to get an optimized load, I'd be using the where = version. So it doesn't seem like a practical problem to me.

          In case it makes any difference, I code this sort of thing like this to avoid the temp table:

          R00:
          load 40 as ProductID
          autogenerate(1);

          INNER JOIN (R00)
          LOAD ShipperID,
          OrderDate,
          CustomerID,
          Discount,
          ProductID,
          Quantity,
          UnitPrice,
          sales,
          COS
          FROM R00.QVD (qvd)
          where exists(ProductID);

          LEFT JOIN (R00)
          LOAD *, Quantity*UnitPrice - Discount as NetSales
          RESIDENT R00;

            • Where vs Exists
              Daniel Rozental

              Thanks for taking a look at this.

               

              This wasn't the actual problem, just a volume test. It showed, at least in this example, that Exists is way faster when not using any transformations but slower when using them which is totally contra intuitive imo.