rbecher
MVP

string matching with fuzzy, trigram (n-gram), levenshtein, etc.

Hi,

I'm looking for a way to do string matching (fuzzy search, trigram/n-gram, Levenshtein, etc.) in QV script.

Any suggestions?

Ralf

Astrato.io Head of R&D
Accepted Solutions
rbecher
MVP
Author

Hi Karen,

I found a workable VBScript implementation as a function. It can be called during LOAD at record level, so both strings have to sit in the same record. That means you would need to join the source data first:

LOAD Script:

Levenshtein:
LOAD F1, F2, levenshtein(F1,F2) as distance;
LOAD * INLINE [
    F1, F2
    Qlik, Qlik ltd
    Qlik ltd, Qlik Limited
    Qlik Limited, QlikTech
    Qlik, Klik
];

Module:

' Source:
' http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#VBScript
Function levenshtein( a, b )
    Dim i, j, cost, d, min1, min2, min3

    ' Avoid calculations where there are empty words
    If Len( a ) = 0 Then levenshtein = Len( b ): Exit Function
    If Len( b ) = 0 Then levenshtein = Len( a ): Exit Function

    ' Array initialization
    ReDim d( Len( a ), Len( b ) )
    For i = 0 To Len( a ): d( i, 0 ) = i: Next
    For j = 0 To Len( b ): d( 0, j ) = j: Next

    ' Actual calculation
    For i = 1 To Len( a )
        For j = 1 To Len( b )
            ' Substitution costs 1 unless the characters match (case-sensitive)
            If Mid( a, i, 1 ) = Mid( b, j, 1 ) Then
                cost = 0
            Else
                cost = 1
            End If

            ' Since min() function is not a part of VBScript, we'll "emulate" it below
            min1 = ( d( i - 1, j ) + 1 )
            min2 = ( d( i, j - 1 ) + 1 )
            min3 = ( d( i - 1, j - 1 ) + cost )

            If min1 <= min2 And min1 <= min3 Then
                d( i, j ) = min1
            ElseIf min2 <= min1 And min2 <= min3 Then
                d( i, j ) = min2
            Else
                d( i, j ) = min3
            End If
        Next
    Next

    levenshtein = d( Len( a ), Len( b ) )
End Function
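
For the trigram part of the original question, the same pattern (a module function called per record during LOAD) could be sketched roughly like this. This one is not from the wikibooks source: the name trigramSim and the choice of the Dice coefficient as the score are only assumptions for illustration, and I have not tuned it for large data volumes.

```vbscript
' Sketch only: trigram (3-gram) similarity scored with the Dice coefficient.
' The function name and the scoring choice are illustrative assumptions.
' Returns a value between 0 (no shared trigrams) and 1 (same trigrams).
Function trigramSim( a, b )
    Dim i, j, na, nb, hits, used

    na = Len( a ) - 2   ' number of trigrams in a
    nb = Len( b ) - 2   ' number of trigrams in b

    ' Strings shorter than 3 characters have no trigrams:
    ' fall back to an exact comparison
    If na < 1 Or nb < 1 Then
        If a = b Then trigramSim = 1 Else trigramSim = 0
        Exit Function
    End If

    ' used( j ) is set once trigram j of b has been matched,
    ' so repeated trigrams are not counted twice
    ReDim used( nb )
    hits = 0

    For i = 1 To na
        For j = 1 To nb
            If IsEmpty( used( j ) ) Then
                If Mid( a, i, 3 ) = Mid( b, j, 3 ) Then
                    used( j ) = True
                    hits = hits + 1
                    Exit For
                End If
            End If
        Next
    Next

    ' Dice coefficient: 2 * shared trigrams / (trigrams in a + trigrams in b)
    trigramSim = 2 * hits / ( na + nb )
End Function
```

It would be called like the function above, e.g. LOAD F1, F2, trigramSim(F1,F2) as similarity; and for "Qlik" vs "Klik" it gives 0.5 (one shared trigram, "lik", out of two on each side). The comparison is case-sensitive, like the Levenshtein function.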

Hope this helps..

- Ralf


26 Replies
Oleg_Troyansky
Partner Ambassador/MVP

Ralf,

I suggest that you develop those functions and share them with the rest of us ;-)

Kidding aside - those would make excellent improvement requests. I just don't know how high it would be on the priority list, since the need is quite exotic...

take care,

Oleg

rbecher
MVP
Author

Oleg,

Thanks for your suggestion, but VBScript isn't the right place for it. We're playing around with some C++ implementations, but this still needs a VBScript call and a separate DLL...

Would love a QV script improvement!

Ralf

Not applicable

I'd look at calling a VBScript function from QlikView. And once you're in VBScript-land, you can call an external library.

rbecher
MVP
Author

Maybe there is something new in QV 9?

Not applicable

Doesn't appear to be.

Oleg_Troyansky
Partner Ambassador/MVP

Ralf,

Why don't you request it as an "idea" and then convince other people to "second" your motion?

Oleg

rbecher
MVP
Author

Oleg,

I don't really know how or where to do this here. I'm a bit new.. 8-)

Ralf

amien
Specialist

I hope this comes in v10 🙂

I want this too.

amien
Specialist

Isn't it possible to take this VBA script and then use it in an expression?

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance

I'm using something similar with regex.