Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
sanp96
Contributor III
Contributor III

Match on strings represented in multiple ways

Hello everyone, 

I have an Employee table and a CustomerOrders table. Both of these have address as their field, i.e Employee has Employee_address and CustomerOrders has Customer_address.

I am using Mapping to load these tables. If there is a match in the address, the column stores 'Y' else 'N'

My script is as follows:

mapIsEmployee_Addr:
mapping
LOAD
UPPER(Trim("Address1")&'-'&Trim(City)) as Employee_Address,
'Y'
FROM ...//filepath

 

Applymap('mapIsEmployee_Addr',%CustomerAddr,'N') as isEmployee_Addr

 

I thought this was working just fine until I realized that there are some addresses which are the same but weren't matched because some of them are stored as 'lane', 'Street', Boulevard' in one table and as 'Ln', 'St', or 'Strt', 'blvd'or 'bld'  in the other.

I thought of using WildMatch but I'm unsure if that would do the work. Any suggestions as to what could be done?

Any help is highly appreciated.

Thanks!

Labels (6)
2 Replies
Or
MVP
MVP

Standardizing addresses is quite a task - even if you added some replacement wildcards you'd likely only part of it right. Making matters worse, you can't always assume a given set of letters will mean the same thing in every context - for example, 'st." can be "street" or "saint" (or it could be other things, possibly), and then there's the issue of spacing, hyphenation, using a period or not, etc. You can't even trust a fuzzy search here, because two addresses for different locations on the same street are more similar textually than "Boulevard" is to "bld".

There are companies / services / tools that will try and do this for you, and you could probably do this yourself to some extent using Google Maps API or similar approaches, but if your data set isn't too big, you may want to consider fuzzy matching all of the non-identical addresses, returning the nearest match for each, and then manually correcting the data if you find ones that should be a match.

I picked this one out of a Google search - perhaps it will assist you in figuring out the best approach to try here. I'm in no way affiliated with the website in question and I have not used it myself. https://www.placekey.io/blog/how-to-standardize-address-data

 

sanp96
Contributor III
Contributor III
Author

Thanks a lot! I'll check the website you shared.