Hi,
I have two sets of customer name data: one is reliable, the other is not. For example:
Correct name: ACE Construction and Demolition Limited
Variations in user input
ACE Construction
ACE Demolition
A.C.E. Contrction LTD
I want to create a 'best-fit' matching application. I'm thinking of resequencing the strings, i.e. sorting the letters of the correct name alphabetically, as in
aacccdddeeeiiiiillmmnnnnoooorsttttu
ACE Construction would then have a match coefficient of 15 out of 35; removing Limited, LTD, etc. would give a match of 15/28, or 54%.
Any ideas?
Best regards,
Marty.
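For anyone who wants to experiment with the resequencing idea from the question, here is a minimal sketch in Python (used only for illustration; it is not Qlik script). The function names and the `STOP_WORDS` set are assumptions made up for this example. It counts letters, drops suffixes such as Limited/LTD, and reports the shared letters as a fraction of the letters in the correct name.

```python
from collections import Counter
import re

# Hypothetical list of suffixes to ignore; extend it to suit your data.
STOP_WORDS = frozenset({"limited", "ltd"})


def letter_counts(name, stop_words=STOP_WORDS):
    """Count the letters in a name, ignoring case, punctuation and stop words."""
    # Dropping dots first turns "A.C.E." into the single word "ace".
    words = re.findall(r"[a-z]+", name.lower().replace(".", ""))
    return Counter("".join(w for w in words if w not in stop_words))


def sorted_letters(name, stop_words=STOP_WORDS):
    """Return the 'resequenced' form: all letters in alphabetical order."""
    return "".join(sorted(letter_counts(name, stop_words).elements()))


def match_coefficient(candidate, reference, stop_words=STOP_WORDS):
    """Shared letters divided by the number of letters in the reference name."""
    cand = letter_counts(candidate, stop_words)
    ref = letter_counts(reference, stop_words)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each letter.
    shared = sum((cand & ref).values())
    return shared / total


reference = "ACE Construction and Demolition Limited"
for variant in ["ACE Construction", "ACE Demolition", "A.C.E. Contrction LTD"]:
    print(variant, round(match_coefficient(variant, reference), 2))
```

Note that this measure ignores letter order, so two different names made of the same letters score identically, and extra letters in the user input are not penalised. It is probably best treated as a cheap first filter rather than a final match.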
This document sprang to mind; it's not exactly the same, but it seems to do a similar thing: http://community.qlik.com/docs/DOC-7051
Hi Martyn,
You can try the Levenshtein distance algorithm:
http://community.qlik.com/message/517405#517405
- Ralf
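To show what the Levenshtein suggestion amounts to, here is a minimal Python sketch of the standard dynamic-programming version. It is not the code from the linked thread, and `similarity` and `best_match` are hypothetical helper names added for this example. The distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def best_match(user_input, reference_names):
    """Return the reference name most similar to the user input."""
    return max(reference_names, key=lambda name: similarity(user_input, name))


print(levenshtein("contrction", "construction"))  # two letters are missing
print(best_match("ACE Constrction", ["ACE Construction", "Acme Demolition"]))
```

Unlike the letter-counting approach, edit distance is sensitive to letter order, so it copes well with typos such as "Contrction" but scores heavily abbreviated names (for example, "ACE Demolition" against the full legal name) as quite distant. Stripping suffixes and comparing word by word before measuring the distance may help there.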