Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Fuzzy/Close Name Match

Hi,

I am not able to get to a very important concept of fuzzy/close match in QV. Not sure if it has the capability of that.

As an example here is what I am looking to do. I have 2 name fields in 2 different files. Employee Name in Employee file and Vendor Name in Vendor file. I need to run a search to see if any of the employees are acting as vendors, this can't be done using exact join as there could be a small variation in the name, usually a letter or two.

Like matching Lynne to Lynn, John Smithe to Jon Smith.

Please advise

Thanks

Sunny

23 Replies
Anonymous
Not applicable
Author

Hi Josh, I am getting the following msg at that link and when I go to qlikviewcookbook.com, I see a lot of enteries but not the one we want

All downloads are now served from the Tools and Recipes sections at:

QlikviewCookbook.com

Go there to access downloads and updated content

  This directory does not yet contain any files.

arieidel
Partner - Creator II
Partner - Creator II

Hello Sunny,

You can find that mapping example in the Recipe section. Direct link to download the example: http://qlikviewcookbook.com/download/mapping-substrings-2/

Anyway, I have the same need as yours but this example doesn't help me. I think it's easier when the fuzzy search uses single words but when the string to be search is a sentence -like a name and a lastname- there isn't a straightforward solution...

But if you find it, please share!

Regards,

Ariel

MarcoWedel

Hi,

one solution to match similar names could be to calculate the levenshtein distance (Levenshtein Algorithm) and apply some treshold to it:

QlikCommunity_Thread_214856_Pic1.JPG

tabEmployees:

LOAD * FROM [https://community.qlik.com/servlet/JiveServlet/download/1029287-223516/Employee%20file.xlsx] (ooxml, embedded labels, table is Sheet1);

tabVendors:

LOAD * FROM [https://community.qlik.com/servlet/JiveServlet/download/1029287-223515/vendors.xlsx] (ooxml, embedded labels, table is Sheet1);

tabLink:

LOAD Distinct [Emp Name] Resident tabEmployees;

Join

LOAD Distinct [vendor name] Resident tabVendors;

Join

LOAD [Emp Name], [vendor name], levenshtein([Emp Name], [vendor name]) as LevDist Resident tabLink;

Right Join

LOAD Distinct LevDist Resident tabLink Where LevDist<=2;

hope this helps

regards

Marco

Anonymous
Not applicable
Author

Thanks Marco,

How are you getting the function levenshtein in the script? I tried but this function does not exist.

settu_periasamy
Master III
Master III

Hi,

levenshtein  is a User Defined Function in the Macro Editor. Click ( Ctrl + M ), you can find it..


Levenshtein Algorithm

Capture.JPG


Anonymous
Not applicable
Author

Thanks settu, can you please attach the qvw, I don't have that

settu_periasamy
Master III
Master III

I just showed it from marcowedel‌attachment. You can download it.. Or You can just the mentioned link.

Anonymous
Not applicable
Author

Thanks I missed that

Anonymous
Not applicable
Author

Thanks Marco

Anonymous
Not applicable
Author

Hi Marco,

I just added a few non matching exceptions also in the file and those came too in the table box. Any way we could avoid that to appear in the table box.

   

Emp noEmp Namevendor namevendor num
12dellde2
999genpact
111John SmitheJon Smith88888
122LynneLynn98878
233Mike PepperMike Peper99876
gen1
o3
p4
q5
r6
s7