Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
danielrozental
Master II
Master II

Where vs Exists

Not sure if this has been covered before in a discussion, sorry if it was.

While I was doing a few high volume tests I ran into what I believe might be an exists clause glitch and I wanted to share it hoping someone at QT will pick it up or at least It might be useful to someone.

I'm using QV9 SR4 (not the latest, I know) and a version of the 10 Million rows application that has 50 Million.

1. First I did a simple load using a Where clause, it took 58 secs to load 945.501 rows.

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;

2. Then I thought a where exists clause would be much faster, since the load would be optimized, and it was. It took only 4 secs to load the same information as the code above

load 40 as ProductID autogenerate(1);

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;

So far nothing out of the ordinary.

3. Then I added a transformation, this way neither the simple where nor the where exists clauses will have the benefit of the optimized load.

I only added "Quantity*UnitPrice - Discount as NetSales". The load took 0.59 secs

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS, Quantity*UnitPrice - Discount as NetSales
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;

4. Same thing, I added

"Quantity*UnitPrice - Discount as NetSales" to the where exists load at it took 3 minutes, 38 seconds !!!

load 40 as ProductID autogenerate(1);

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS,

Quantity*UnitPrice - Discount as NetSale



FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;

5. I tried doing it in two steps, first the optimized where exists load, and then the transformation. It took 16 secs

load 40 as ProductID autogenerate(1);

R00_Temp:

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

FROM R00.QVD (qvd)
where Exists(ProductID);

R00:

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

, Quantity*UnitPrice - Discount as NetSale



RESIDENT R00_Temp;

drop table R00_Temp;

drop table R00;

Hopefully some else might have a view on this.

2 Replies
johnw
Champion III
Champion III

The only thing that seems unusual to me in your list is the 3 minutes, 38 seconds for #4. While the calculation will break the optimized load, I wouldn't expect it to be any slower than the where ProductID = 40 version.

And testing with a QVD of my own does not duplicate what you're seeing. I got 24 seconds with the where exists, 25 seconds with the where =. I'm on SR5, but it seems unlikely that they would coincidentally have fixed this between these two very similar versions.

In any case, I'd never code your version #4 causing the problem. Where = is much easier, so unless you ARE going to get an optimized load, I'd be using the where = version. So it doesn't seem like a practical problem to me.

In case it makes any difference, I code this sort of thing like this to avoid the temp table:

R00:
load 40 as ProductID
autogenerate(1);

INNER JOIN (R00)
LOAD ShipperID,
OrderDate,
CustomerID,
Discount,
ProductID,
Quantity,
UnitPrice,
sales,
COS
FROM R00.QVD (qvd)
where exists(ProductID);

LEFT JOIN (R00)
LOAD *, Quantity*UnitPrice - Discount as NetSales
RESIDENT R00;

danielrozental
Master II
Master II
Author

Thanks for taking a look at this.

This wasn't the actual problem, just a volume test. It showed, at least in this example, that Exists is way faster when not using any transformations but slower when using them which is totally contra intuitive imo.