Skip to main content
Announcements
See why Qlik was named a Leader in the 2024 Gartner® Magic Quadrant™ for Data Integration Tools for the ninth year in a row: Get the report
cancel
Showing results for 
Search instead for 
Did you mean: 
karan_kn
Creator II
Creator II

Join table without prefix

To join both tables based on Company name, but the company name have inconsistent prefixes.

COLOR MAP CORPCOLOR MAP CORPORATION

 

Capture.JPG

 

Table1:
LOAD * INLINE [
ID, Company
1001, COLOR MAP CORP
1002, DENNISE PUBLISHING LTD
1003, "DSN GROUP, INC."
1004, DREAM TYPE LLC
1005, TECHNICAL SOLUTIONS (UK)
1006, INFO PRIVATE LIMITED
];

Table2:
LOAD * INLINE [
Company, City
COLOR MAP CORPORATION, London
DENNISE PUBLISHING , Delhi
"DSN GROUP, INC", Germany
DREAM TYPE , Tokyo
TECHNICAL SOLUTIONS , London
INFO PVT LTD., Paris
];

 

1 Solution

Accepted Solutions
Saravanan_Desingh

One solution is Levenshtein Distance.

VBA Macro:
Function levenshtein( a, b )
	Dim i,j,cost,d,min1,min2,min3
	
 ' Avoid calculations where there there are empty words
	If Len( a ) = 0 Then levenshtein = Len( b 😞 Exit Function
	If Len( b ) = 0 Then levenshtein = Len( a 😞 Exit Function
	
 ' Array initialization	
	ReDim d( Len( a ), Len( b ) )

	For i = 0 To Len( a 😞 d( i, 0 ) = i: Next
	For j = 0 To Len( b 😞 d( 0, j ) = j: Next
	
 ' Actual calculation
	For i = 1 To Len( a )
		For j = 1 To Len( b )
                        If Mid(a, i, 1) = Mid(b, j, 1) Then cost = 0 Else cost = 1 End If
			
			' Since min() function is not a part of VBScript, we'll "emulate" it below
			min1 = ( d( i - 1, j ) + 1 )
			min2 = ( d( i, j - 1 ) + 1 )
			min3 = ( d( i - 1, j - 1 ) + cost )
			
			If min1 <= min2 And min1 <= min3 Then
				d( i, j ) = min1
			ElseIf min2 <= min1 And min2 <= min3 Then
				d( i, j ) = min2
			Else
				d( i, j ) = min3
			End If
		Next
	Next

	levenshtein = d( Len( a ), Len( b ) )
End Function
Script:
Table1:
LOAD * INLINE [
ID, Company
1001, COLOR MAP CORP
1002, DENNISE PUBLISHING LTD
1003, "DSN GROUP, INC."
1004, DREAM TYPE LLC
1005, TECHNICAL SOLUTIONS (UK)
1006, INFO PRIVATE LIMITED
];

Join

Table2:
LOAD * INLINE [
Company2, City
COLOR MAP CORPORATION, London
DENNISE PUBLISHING , Delhi
"DSN GROUP, INC", Germany
DREAM TYPE , Tokyo
TECHNICAL SOLUTIONS , London
INFO PVT LTD., Paris
];

tabOut:
LOAD *, Levenshtein(Company, Company2) As LevenDist
Resident Table1;

Drop Table Table1;

commQV36.PNG

View solution in original post

1 Reply
Saravanan_Desingh

One solution is Levenshtein Distance.

VBA Macro:
Function levenshtein( a, b )
	Dim i,j,cost,d,min1,min2,min3
	
 ' Avoid calculations where there there are empty words
	If Len( a ) = 0 Then levenshtein = Len( b 😞 Exit Function
	If Len( b ) = 0 Then levenshtein = Len( a 😞 Exit Function
	
 ' Array initialization	
	ReDim d( Len( a ), Len( b ) )

	For i = 0 To Len( a 😞 d( i, 0 ) = i: Next
	For j = 0 To Len( b 😞 d( 0, j ) = j: Next
	
 ' Actual calculation
	For i = 1 To Len( a )
		For j = 1 To Len( b )
                        If Mid(a, i, 1) = Mid(b, j, 1) Then cost = 0 Else cost = 1 End If
			
			' Since min() function is not a part of VBScript, we'll "emulate" it below
			min1 = ( d( i - 1, j ) + 1 )
			min2 = ( d( i, j - 1 ) + 1 )
			min3 = ( d( i - 1, j - 1 ) + cost )
			
			If min1 <= min2 And min1 <= min3 Then
				d( i, j ) = min1
			ElseIf min2 <= min1 And min2 <= min3 Then
				d( i, j ) = min2
			Else
				d( i, j ) = min3
			End If
		Next
	Next

	levenshtein = d( Len( a ), Len( b ) )
End Function
Script:
Table1:
LOAD * INLINE [
ID, Company
1001, COLOR MAP CORP
1002, DENNISE PUBLISHING LTD
1003, "DSN GROUP, INC."
1004, DREAM TYPE LLC
1005, TECHNICAL SOLUTIONS (UK)
1006, INFO PRIVATE LIMITED
];

Join

Table2:
LOAD * INLINE [
Company2, City
COLOR MAP CORPORATION, London
DENNISE PUBLISHING , Delhi
"DSN GROUP, INC", Germany
DREAM TYPE , Tokyo
TECHNICAL SOLUTIONS , London
INFO PVT LTD., Paris
];

tabOut:
LOAD *, Levenshtein(Company, Company2) As LevenDist
Resident Table1;

Drop Table Table1;

commQV36.PNG