Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I need to run a search to see if the name Registered is same as Company name using levenshtein.
Rules to be followed while searching text:
1. Excluding expressions like Pvt | Private| Ltd| Limited| &| And
Or there is any other way to find difference between the two.
Please advise
Thanks
Shefali
there is function there in Qlik.
Data:
LOAD *, LevenshteinDist(Registered_Name_String_clean,Company_Name_String_clean) as Levenshtein_Distances ;
LOAD *,
trim(purgechar(replace(replace(replace(replace(replace(replace(lower(Registered_Name),'pvt',''),'private',''),'ltd',''),'limited',''),' and ',' '),' & ',' '),'.')) as Registered_Name_String_clean,
trim(purgechar(replace(replace(replace(replace(replace(replace(lower(Company_Name),'pvt',''),'private',''),'ltd',''),'limited',''),' and ',' '),' & ',' '),'.')) as Company_Name_String_clean;
Load * inline [
Company_Name,Registered_Name
Ajanta Creations Pvt Ltd,Ajanta Creations Private Limited
Sanjay Handlooms, Sanjay Handlooms
Heena Pharmaceutical Limited,Heena Pharmaceutical Ltd
Chatterjee Company,Chatterjee
Om Prakash Medical Store, Om Prakash Store
Raj Kumar Company private, Raj Company
Anjum Creations Private,Anjum Creations Pvt.,
Raj and Sons Company, Raj & Sons Company ];
Hi @Shefali30
To use levenshtein you will need to connect to a Python server, which may be overkill for what you are after.
What you will want to do is the same set of replace / capitalize statements on both sides and then do a compare.
To avoid duplication I would use a variable which take s a parameter, something like this:
set vClean = replace(replace(replace(UPPER([$1]), 'PVT', ''), 'PRIVATE', ''), 'COMPANY', '');
LOAD
Field1,
Field2,
if($(vClean(Field1)) = $(vClean(Field2)), 'Yes', 'No') as Match,
...
Obviously the number of replaces in the variable would be a lot more than what are there, but hopefully you get the idea.
Paramterised variables are explained in more details here:
https://www.quickintelligence.co.uk/variables-parameters-load-script/
Steve
Does it not 8 for "Om Prakash Store" and 6 for "Raj Kumar Company"?
there is function there in Qlik.
Data:
LOAD *, LevenshteinDist(Registered_Name_String_clean,Company_Name_String_clean) as Levenshtein_Distances ;
LOAD *,
trim(purgechar(replace(replace(replace(replace(replace(replace(lower(Registered_Name),'pvt',''),'private',''),'ltd',''),'limited',''),' and ',' '),' & ',' '),'.')) as Registered_Name_String_clean,
trim(purgechar(replace(replace(replace(replace(replace(replace(lower(Company_Name),'pvt',''),'private',''),'ltd',''),'limited',''),' and ',' '),' & ',' '),'.')) as Company_Name_String_clean;
Load * inline [
Company_Name,Registered_Name
Ajanta Creations Pvt Ltd,Ajanta Creations Private Limited
Sanjay Handlooms, Sanjay Handlooms
Heena Pharmaceutical Limited,Heena Pharmaceutical Ltd
Chatterjee Company,Chatterjee
Om Prakash Medical Store, Om Prakash Store
Raj Kumar Company private, Raj Company
Anjum Creations Private,Anjum Creations Pvt.,
Raj and Sons Company, Raj & Sons Company ];
Well done. Kush!
Just discovered the LevenshteinDist() function, it's always good to follow Qlik Community posts.
Strange that this function is still undocumented for three years now...
@swuehl Thank you 😊 Actually Kudos to you that you had already discovered this way back which I just referred .
Even I ignored searching on community due to this uncanny function and started building my own logic using complex loops. I achieved the logic and even posted here. But later I discovered your post and found the function then I deleted that post.😅
Lesson learned- always search on community first.
Thanks. It worked😊