rbecher
MVP

String matching with fuzzy search, trigram (n-gram), Levenshtein, etc.

Hi,

I'm looking for a way to do string matching (fuzzy search, trigram/n-gram, Levenshtein, etc.) in QV script.

Any suggestions?

Ralf

Astrato.io Head of R&D
26 Replies
rbecher
MVP
Author

I just copied and tested the code; it wasn't much work, though.

Please join the group Data Quality

- Ralf

MarcoWedel

See also:

Levenshtein Algorithm
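For readers unfamiliar with the algorithm, here is a minimal Python sketch of the Levenshtein (edit) distance. It is not the VBScript function discussed in this thread and not QV script; the function name `levenshtein` is only an illustration, and the logic would have to be ported before it could be called from a load script.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

The running time grows with the product of the two string lengths, which matters once many pairs of values have to be compared.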

Regards

Marco

rbecher
MVP
Author

So, you've found the same VBScript function.

I think trigram comparison makes more sense for scoring and finding duplicates. Maybe I'll post a solution later.
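As a rough illustration of what trigram scoring can look like (a generic sketch in Python, not the solution mentioned in this thread), two strings are split into overlapping three-character pieces and scored by how many pieces they share. The padding convention (two leading spaces, one trailing space), the lowercasing, and the Jaccard-style score are assumptions chosen for this example; `trigrams` and `trigram_similarity` are hypothetical names.

```python
def trigrams(text: str) -> set:
    """Set of overlapping 3-character substrings of the padded, lowercased text."""
    # Assumed padding: two spaces in front, one behind, so that
    # word boundaries also contribute trigrams.
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(trigram_similarity("Meier", "Meyer"))
```

A pair would then count as a likely duplicate when the score exceeds some threshold, which has to be tuned on the actual data.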

Not applicable

Hi Ralf,

Do you already have a solution with trigram comparison or something else?

I'm trying to compare about 5,000 address records.

The Levenshtein solution works but it takes too long.
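For scale: comparing every record with every other one means 5,000 × 4,999 / 2 = 12,497,500 distance calculations. One common way to cut this down, sketched below in Python as an assumption and not as something tested in QV script, is to build an index from trigrams to records and run the expensive comparison only on pairs that share a few trigrams. The names `candidate_pairs` and `min_shared` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def trigrams(text):
    """Set of overlapping 3-character substrings (assumed padding: 2 + 1 spaces)."""
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def candidate_pairs(records, min_shared=2):
    """Index pairs (i, j) with i < j whose records share at least
    min_shared trigrams; only these need a full distance check."""
    # Inverted index: trigram -> positions of the records containing it
    index = defaultdict(list)
    for position, record in enumerate(records):
        for gram in trigrams(record):
            index[gram].append(position)

    # Count how many trigrams each pair of records has in common
    shared = defaultdict(int)
    for positions in index.values():
        for pair in combinations(positions, 2):
            shared[pair] += 1

    return sorted(pair for pair, count in shared.items() if count >= min_shared)


addresses = ["Hauptstrasse 12", "Hauptstr. 12", "Bahnhofweg 3"]
print(candidate_pairs(addresses))
```

Very frequent trigrams can still produce many candidate pairs, so the threshold is a trade-off between speed and missed matches.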

Thanks for your help!

Regards

Dominik

rbecher
MVP
Author

Hi Dominik,

Yes, I have, specifically for this use case: finding duplicates in address data.

- Ralf

Not applicable

Hey Ralf,

I'm very interested in your trigram comparison solution.

I have a huge dataset with firstName and lastName, and I want to run a similarity check to find duplicates.

May I ask you to share your trigram solution?

Steve

rbecher
MVP
Author

Hi Steve,

Unfortunately I can't, because it's a commercial solution.

- Ralf
