Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
DaveH
Contributor
Contributor

Matching String values

Hi All,

I've been racking my brain a lot over this and have attempted many mapping tables, lookups, substrings and wildmatches but still to now success. I'm hoping that I'm missing something stupidly straightforward other than it just can't be done.....

So the details of the issue is as follows;

I have 2 fields, LOCATION and DESCRIPTION. Now as always when you allow manual entry the data entered into the "DESCRIPTION" field is unclean and all over the place. What I'm trying to do is find any occurrence of the Value in LOCATION in the DESCRIPTION field and then flag it.

For example;

LOCATION DESCRIPTION Potential Match
QUAY WAY 8-10 QUAY, WAY Yes
HARVESTER ROAD A7850058 HARVESTER 2595254 Yes
SHEFFILED VICTORIA STREET VICTORIA St Yes
BAKER STREET MUFFIN LANE No

 

Is there a way to accomplish this? The values could appear anywhere in the string which is what I'm struggling with.

 

Many Thanks!

Labels (2)
1 Solution

Accepted Solutions
Clement15
Partner - Specialist
Partner - Specialist

Hello,

This seems to work, you should test on more cases and pay attention to the conditions you want

 

Data:
Load
*,
SubField(LOCATION,' ') as Temp

Inline [
LOCATION, DESCRIPTION
QUAY WAY, 8-10 QUAY WAY
HARVESTER ROAD, A7850058 HARVESTER 2595254
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;


Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;


Drop table Data;

 

View solution in original post

2 Replies
Clement15
Partner - Specialist
Partner - Specialist

Hello,

This seems to work, you should test on more cases and pay attention to the conditions you want

 

Data:
Load
*,
SubField(LOCATION,' ') as Temp

Inline [
LOCATION, DESCRIPTION
QUAY WAY, 8-10 QUAY WAY
HARVESTER ROAD, A7850058 HARVESTER 2595254
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;


Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;


Drop table Data;

 

marcus_sommer

shit in = shit out ... but by applying a chain of n (dozens) transformation steps you may get a sufficient matching-rate. Personally I would  try to go in such a direction:

  • delimiter-definition to split the values in single parts
  • unifying the data per upper/lower
  • identifying and removing invalid data
  • replacing different writings and shortcuts like st and street to a common term
  • splitting the values with: subfield(Field, Delimiter, iterno())
  • important is to include recno() and iterno() to this load to track the source- and target-records and to merge them later again
  • wildmatch() against the origin values and the sub-parts
  • applying a LevenshteinDist - script and chart function | Qlik Cloud Help
  • merging the wildmatch/levenshtein results in any sensible scoring-logic

just to name a few ones.

If the data aren't too poor it's often not too expensive to get matching-rates of 50% - 80% but each single percent more will usually cost exponentially efforts ...