Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Not sure if this has been covered before in a discussion, sorry if it was.
While I was doing a few high volume tests I ran into what I believe might be an exists clause glitch and I wanted to share it hoping someone at QT will pick it up or at least It might be useful to someone.
I'm using QV9 SR4 (not the latest, I know) and a version of the 10 Million rows application that has 50 Million.
1. First I did a simple load using a Where clause, it took 58 secs to load 945.501 rows.
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;
2. Then I thought a where exists clause would be much faster, since the load would be optimized, and it was. It took only 4 secs to load the same information as the code above
load 40 as ProductID autogenerate(1);
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;
So far nothing out of the ordinary.
3. Then I added a transformation, this way neither the simple where nor the where exists clauses will have the benefit of the optimized load.
I only added "Quantity*UnitPrice - Discount as NetSales". The load took 0.59 secs
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS, Quantity*UnitPrice - Discount as NetSales
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;
4. Same thing, I added
load 40 as ProductID autogenerate(1);
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS,
Quantity*UnitPrice - Discount as NetSale
FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;
5. I tried doing it in two steps, first the optimized where exists load, and then the transformation. It took 16 secs
load 40 as ProductID autogenerate(1);
R00_Temp:
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where Exists(ProductID);
R00:
LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
, Quantity*UnitPrice - Discount as NetSale
RESIDENT R00_Temp;
drop table R00_Temp;
drop table R00;
Hopefully some else might have a view on this.
The only thing that seems unusual to me in your list is the 3 minutes, 38 seconds for #4. While the calculation will break the optimized load, I wouldn't expect it to be any slower than the where ProductID = 40 version.
And testing with a QVD of my own does not duplicate what you're seeing. I got 24 seconds with the where exists, 25 seconds with the where =. I'm on SR5, but it seems unlikely that they would coincidentally have fixed this between these two very similar versions.
In any case, I'd never code your version #4 causing the problem. Where = is much easier, so unless you ARE going to get an optimized load, I'd be using the where = version. So it doesn't seem like a practical problem to me.
In case it makes any difference, I code this sort of thing like this to avoid the temp table:
R00:
load 40 as ProductID
autogenerate(1);
INNER JOIN (R00)
LOAD ShipperID,
OrderDate,
CustomerID,
Discount,
ProductID,
Quantity,
UnitPrice,
sales,
COS
FROM R00.QVD (qvd)
where exists(ProductID);
LEFT JOIN (R00)
LOAD *, Quantity*UnitPrice - Discount as NetSales
RESIDENT R00;
Thanks for taking a look at this.
This wasn't the actual problem, just a volume test. It showed, at least in this example, that Exists is way faster when not using any transformations but slower when using them which is totally contra intuitive imo.