
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Fuzzy/Close Name Match
Hi,
I am not able to get to a very important concept of fuzzy/close match in QV. Not sure if it has the capability of that.
As an example here is what I am looking to do. I have 2 name fields in 2 different files. Employee Name in Employee file and Vendor Name in Vendor file. I need to run a search to see if any of the employees are acting as vendors, this can't be done using exact join as there could be a small variation in the name, usually a letter or two.
Like matching Lynne to Lynn, John Smithe to Jon Smith.
Please advise
Thanks
Sunny
- « Previous Replies
- Next Replies »
Accepted Solutions


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
one solution to match similar names could be to calculate the levenshtein distance (Levenshtein Algorithm) and apply some treshold to it:
tabEmployees:
LOAD * FROM [https://community.qlik.com/servlet/JiveServlet/download/1029287-223516/Employee%20file.xlsx] (ooxml, embedded labels, table is Sheet1);
tabVendors:
LOAD * FROM [https://community.qlik.com/servlet/JiveServlet/download/1029287-223515/vendors.xlsx] (ooxml, embedded labels, table is Sheet1);
tabLink:
LOAD Distinct [Emp Name] Resident tabEmployees;
Join
LOAD Distinct [vendor name] Resident tabVendors;
Join
LOAD [Emp Name], [vendor name], levenshtein([Emp Name], [vendor name]) as LevDist Resident tabLink;
Right Join
LOAD Distinct LevDist Resident tabLink Where LevDist<=2;
hope this helps
regards
Marco


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
pls post a sample in excel.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please go through below link,


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Useful might be here Please help me get the string pattern of each field. the answers from Marco.
- Marcus

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
attached

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
attached sample files for reference


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
hi,
check the attachment.
Steps:
Try following below steps:
1) LOAD the tables.
2) Create a Straight table, with empnamae and vendorname as dimensions.
3) use below expression
=WildMatch([Emp Name],mid([vendor name],1,1)&'*'& mid([vendor name],2,1)&'*' & mid([vendor name],3,1)&'*')
Regards,
Joshua.


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This may help.
Rob Wunderlich's cookbook has an example doing a fuzzy mapping.
http://robwunderlich.com/Download.html

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Joshua, it works however that example works well for a sample data of a few records. How can I do that if I have thousands of rows to match with each other.


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Hope it will work fine for 1000 records also,
if possible you share me 1000 records sample,i will try and let you know.
Regards,
Joshua.

- « Previous Replies
- Next Replies »