Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I've been racking my brain a lot over this and have attempted many mapping tables, lookups, substrings and wildmatches but still to now success. I'm hoping that I'm missing something stupidly straightforward other than it just can't be done.....
So the details of the issue is as follows;
I have 2 fields, LOCATION and DESCRIPTION. Now as always when you allow manual entry the data entered into the "DESCRIPTION" field is unclean and all over the place. What I'm trying to do is find any occurrence of the Value in LOCATION in the DESCRIPTION field and then flag it.
For example;
LOCATION | DESCRIPTION | Potential Match |
QUAY WAY | 8-10 QUAY, WAY | Yes |
HARVESTER ROAD | A7850058 HARVESTER 2595254 | Yes |
SHEFFILED VICTORIA STREET | VICTORIA St | Yes |
BAKER STREET | MUFFIN LANE | No |
Is there a way to accomplish this? The values could appear anywhere in the string which is what I'm struggling with.
Many Thanks!
Hello,
This seems to work, you should test on more cases and pay attention to the conditions you want
Data:
Load
*,
SubField(LOCATION,' ') as Temp
Inline [
LOCATION, DESCRIPTION
QUAY WAY, 8-10 QUAY WAY
HARVESTER ROAD, A7850058 HARVESTER 2595254
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;
Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;
Drop table Data;
Hello,
This seems to work, you should test on more cases and pay attention to the conditions you want
Data:
Load
*,
SubField(LOCATION,' ') as Temp
Inline [
LOCATION, DESCRIPTION
QUAY WAY, 8-10 QUAY WAY
HARVESTER ROAD, A7850058 HARVESTER 2595254
SHEFFILED VICTORIA STREET, VICTORIA St
BAKER STREET, MUFFIN LANE
]
;
Final:
Load
LOCATION,
DESCRIPTION,
if(max(SubStringCount(DESCRIPTION,Temp)),'Yes','No') as [Potential Match]
Resident Data
group by LOCATION,
DESCRIPTION,;
Drop table Data;
shit in = shit out ... but by applying a chain of n (dozens) transformation steps you may get a sufficient matching-rate. Personally I would try to go in such a direction:
just to name a few ones.
If the data aren't too poor it's often not too expensive to get matching-rates of 50% - 80% but each single percent more will usually cost exponentially efforts ...