Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
r3iuk
Creator
Creator

Left Join of 2 tables with a partial string match

Say I have a Product table such as:

Product Quantity
ABC 123 500
ABC 234 1000
WXYZ 5000 100
WXYX 4000a 150

And a Product Type table such as:

ProductType Price
ABC 400
WXYZ 250

The product type is always at the beginning of any product name but could be of a varying length.

What I want to do is create a join of the tables that will give:

Product Price
ABC 123 400
ABC 234 400
WXYZ 5000 250
WXYX 4000a 250

I've tried the following:

Product:
LOAD *
FROM PRODUCT.qvd (qvd);

LEFT JOIN (Product)
LOAD
ProductType
Price
FROM ProductType.qvd (qvd)
WHERE Left(Product, Len(ProductType)) = ProductType;

The left join statement clearly won't work because the Product field is not available in the ProductType table.

Any ideas on how I can achieve what I need to do? I realise that I could do a full cartesian product on both tables then filter out the appropriate rows, but this is going to be too inefficient due to the volumes of data in the 2 tables.

Thanks for any help you can provide.

1 Solution

Accepted Solutions
Anonymous
Not applicable

That's why I said about the "particular example"...

I can think of a solution that should work for you, althgough it may be not the most elegant:
1. In a separate load, join all Products and all Types:
Map1:
LOAD DISCTINT Product FROM ...;
JOIN (Map1) LOAD DISTINCT ProductType FROM ...;

2. See if Product Type string exists in Product string:
Map2:
LOAD
Product,
if index(Product,ProductType)>0, ProductType, null()) as ProductType
RESIDENT Map1;

3. Create Map:
ProductTypeMap:
MAPPING LOAD DISTINCT
Product as A,
ProductType as B
RESIDENT Map2
WHERE len(trim(ProductType)>0;
DROP TABLES Map1, Map2;

4. Get Product Type into your 1st table:
LOAD
*,
applymap('ProductTypeMap', Product) as ProductType
FROM ...

If missed something, you can figure out from here...

View solution in original post

4 Replies
Anonymous
Not applicable

In this particular example, it makes sense to create ProductType field in the 1st table, and join by it:
Product:
LOAD
*,
subfield(Product, ' ', 1) as ProductType
FROM PRODUCT.qvd (qvd);

LEFT JOIN (Product)
LOAD
ProductType
Price
FROM ProductType.qvd (qvd)

r3iuk
Creator
Creator
Author

Hi Michael,

Thanks for the quick reply.

Unfortunately I'm not sure that option is available to me because the product names don't have any common delimiter (sorry my example should've included some).

As well a products with names like 'ABC 123' they can also be 'ABC-123' and there may be other variations. If I could make assumptions on delimiters that would be the way forward but I was hoping that there might a solution that was more general.

Anonymous
Not applicable

That's why I said about the "particular example"...

I can think of a solution that should work for you, althgough it may be not the most elegant:
1. In a separate load, join all Products and all Types:
Map1:
LOAD DISCTINT Product FROM ...;
JOIN (Map1) LOAD DISTINCT ProductType FROM ...;

2. See if Product Type string exists in Product string:
Map2:
LOAD
Product,
if index(Product,ProductType)>0, ProductType, null()) as ProductType
RESIDENT Map1;

3. Create Map:
ProductTypeMap:
MAPPING LOAD DISTINCT
Product as A,
ProductType as B
RESIDENT Map2
WHERE len(trim(ProductType)>0;
DROP TABLES Map1, Map2;

4. Get Product Type into your 1st table:
LOAD
*,
applymap('ProductTypeMap', Product) as ProductType
FROM ...

If missed something, you can figure out from here...

r3iuk
Creator
Creator
Author

Thanks Michael, that works well, performance is OK too.

It just needed a NOCONCATENATE on the load in step 2 and a couple of brackets here and there.