Solved: Matching String values - Qlik Community

DaveH · ‎2025-01-29

Hi All,

I've been racking my brain a lot over this and have attempted many mapping tables, lookups, substrings and wildmatches but still to now success. I'm hoping that I'm missing something stupidly straightforward other than it just can't be done.....

So the details of the issue is as follows;

I have 2 fields, LOCATION and DESCRIPTION. Now as always when you allow manual entry the data entered into the "DESCRIPTION" field is unclean and all over the place. What I'm trying to do is find any occurrence of the Value in LOCATION in the DESCRIPTION field and then flag it.

For example;

LOCATION	DESCRIPTION	Potential Match
QUAY WAY	8-10 QUAY, WAY	Yes
HARVESTER ROAD	A7850058 HARVESTER 2595254	Yes
SHEFFILED VICTORIA STREET	VICTORIA St	Yes
BAKER STREET	MUFFIN LANE	No

Is there a way to accomplish this? The values could appear anywhere in the string which is what I'm struggling with.

Many Thanks!

Clement15 · ‎2025-01-29

Hello,

This seems to work, you should test on more cases and pay attention to the conditions you want

Data:
Load 
*,
SubField(LOCATION,' ') as Temp

Inline [
LOCATION, DESCRIPTION 
QUAY WAY, 8-10 QUAY WAY 
HARVESTER ROAD, A7850058 HARVESTER 2595254 
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;


Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;


Drop table Data;

View solution in original post

Clement15 · ‎2025-01-29

Hello,

This seems to work, you should test on more cases and pay attention to the conditions you want

Data:
Load 
*,
SubField(LOCATION,' ') as Temp

Inline [
LOCATION, DESCRIPTION 
QUAY WAY, 8-10 QUAY WAY 
HARVESTER ROAD, A7850058 HARVESTER 2595254 
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;


Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;


Drop table Data;

marcus_sommer · ‎2025-01-29

shit in = shit out ... but by applying a chain of n (dozens) transformation steps you may get a sufficient matching-rate. Personally I would try to go in such a direction:

delimiter-definition to split the values in single parts
unifying the data per upper/lower
identifying and removing invalid data
replacing different writings and shortcuts like st and street to a common term
splitting the values with: subfield(Field, Delimiter, iterno())
important is to include recno() and iterno() to this load to track the source- and target-records and to merge them later again
wildmatch() against the origin values and the sub-parts
applying a LevenshteinDist - script and chart function | Qlik Cloud Help
merging the wildmatch/levenshtein results in any sensible scoring-logic

just to name a few ones.

If the data aren't too poor it's often not too expensive to get matching-rates of 50% - 80% but each single percent more will usually cost exponentially efforts ...

Matching String values

Data Model

Script and Expressions

Matching String values

Data Model

Script and Expressions

Related Topics