Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
gagewhite
Contributor III
Contributor III

Which is more efficient - INNER JOIN or WHERE

A question was brought up during our SCRUM call this morning, and no one could give an answer. Lets say you have two tables:

Approved Products:

Product_Number
123abc
456def

 

All Products:

Product_Number Product_Name
123abc Shoe
456def Shirt
789ghi Pants
101112jkl Belt

 

And you want the resulting table to be the All Products table, but only with the Approved Products in it.

Result:

Product_Number Product_Name
123abc Shoe
456def Shirt

 

The two solutions we were debating between are:

Solution #1: LOAD * RESIDENT All Products INNER JOIN LOAD Product_Number RESIDENT Approved Products

or

Solution #2: First create a comma separated list of the values from Approved Products, vList. Then use that value in the where clause - i.e. LOAD * RESIDENT All Products WHERE Product_Number MATCH(Product_Number, $(vList))

Both of these solutions will get me the same result, but I am curious as to which of these are the optimal solution. For some reason, joining out to a table seems like it would require more resources, but that is literally just a feeling I have. Huge bonus points if you even know why one is more optimal than the other.

Kind regards,

Gage

Labels (3)
1 Reply
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

In my opinion, #2 would generally be the wrong way to go about it. The "correct" equivalent for #2 would be to load Approved Products first and then load  All Products using  "Where Exists(Product_Number)".

TempProducts:
LOAD Product_Number
From Approved_Products ...;

All_Products:
LOAD * 
From All_Products.qvd (qvd)
Where Exists (Product_Number);

Drop Table TempProducts;

This is especially effective if All_Products is loaded from a QVD as Where Exists() will be an optimized load.  That is the most resource efficient solution because where exists is fast and Qlik will use the least memory by filtering as it builds the table. 

If you must filter All_Products from a resident table, using Inner Join is the best approach. 

Inner Join (All_Products)
Load Product_Number
From ...;

Memory for two tables will exist for the period of the Join, but Qlik will automatically take care of dropping the old table. 

The "Where Match()" approach would also allocate memory for two tables, and additionally it would be the most compute intensive of the options. 

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com