Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

Announcements
QlikView Fans! We’d love to hear from you.
Share your QlikView feedback with the product team… Click here to participate in our 5-minute survey.
Rules, plus terms and conditions, can be found here.
danielrozental
Honored Contributor II

Where vs Exists

Not sure if this has been covered before in a discussion, sorry if it was.

While I was doing a few high volume tests I ran into what I believe might be an exists clause glitch and I wanted to share it hoping someone at QT will pick it up or at least It might be useful to someone.

I'm using QV9 SR4 (not the latest, I know) and a version of the 10 Million rows application that has 50 Million.

1. First I did a simple load using a Where clause, it took 58 secs to load 945.501 rows.

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;

2. Then I thought a where exists clause would be much faster, since the load would be optimized, and it was. It took only 4 secs to load the same information as the code above

load 40 as ProductID autogenerate(1);

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS
FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;

So far nothing out of the ordinary.

3. Then I added a transformation, this way neither the simple where nor the where exists clauses will have the benefit of the optimized load.

I only added "Quantity*UnitPrice - Discount as NetSales". The load took 0.59 secs

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS, Quantity*UnitPrice - Discount as NetSales
FROM R00.QVD (qvd)
where ProductID = 40;
drop table R00;

4. Same thing, I added

"Quantity*UnitPrice - Discount as NetSales" to the where exists load at it took 3 minutes, 38 seconds !!!

load 40 as ProductID autogenerate(1);

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS,

Quantity*UnitPrice - Discount as NetSale



FROM R00.QVD (qvd)
where Exists(ProductID);
drop table R00;

5. I tried doing it in two steps, first the optimized where exists load, and then the transformation. It took 16 secs

load 40 as ProductID autogenerate(1);

R00_Temp:

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

FROM R00.QVD (qvd)
where Exists(ProductID);

R00:

LOAD ShipperID, OrderDate, CustomerID, Discount, ProductID, Quantity, UnitPrice, sales, COS

, Quantity*UnitPrice - Discount as NetSale



RESIDENT R00_Temp;

drop table R00_Temp;

drop table R00;

Hopefully some else might have a view on this.

2 Replies
MVP
MVP

Where vs Exists

The only thing that seems unusual to me in your list is the 3 minutes, 38 seconds for #4. While the calculation will break the optimized load, I wouldn't expect it to be any slower than the where ProductID = 40 version.

And testing with a QVD of my own does not duplicate what you're seeing. I got 24 seconds with the where exists, 25 seconds with the where =. I'm on SR5, but it seems unlikely that they would coincidentally have fixed this between these two very similar versions.

In any case, I'd never code your version #4 causing the problem. Where = is much easier, so unless you ARE going to get an optimized load, I'd be using the where = version. So it doesn't seem like a practical problem to me.

In case it makes any difference, I code this sort of thing like this to avoid the temp table:

R00:
load 40 as ProductID
autogenerate(1);

INNER JOIN (R00)
LOAD ShipperID,
OrderDate,
CustomerID,
Discount,
ProductID,
Quantity,
UnitPrice,
sales,
COS
FROM R00.QVD (qvd)
where exists(ProductID);

LEFT JOIN (R00)
LOAD *, Quantity*UnitPrice - Discount as NetSales
RESIDENT R00;

danielrozental
Honored Contributor II

Where vs Exists

Thanks for taking a look at this.

This wasn't the actual problem, just a volume test. It showed, at least in this example, that Exists is way faster when not using any transformations but slower when using them which is totally contra intuitive imo.

Community Browser