Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
We have a couple hundred thousand customer records on our system as the moment and I was hoping to investigate what can be merged then deleted to keep our record count down.
I will be trying to check this over several different feeds as unfortunately most of these fields can be manually written so will have different formatting mistakes & spelling etc.
My data is like below:
OrgID | Company Name | Postcode |
---|---|---|
1 | Blahblah | AB14 00T |
2 | Bladeblah | AB14 00T |
3 | RunningoutofNames | AB12 111 |
4 | MadeupName | AZ4 78T |
5 | Notsharing | TR4 9E9 |
6 | PenguinDance | XER 101 |
What I'd like to be able to do is Identify Companies with the same postcode that share a certain number of characters.
So Orgid 1 & 2 would match as they share 8 of the same characters in their company name.
Ideally the table would only show OrgID 1 & 2 now as the rest don't fit the criteria.
Anyone got any ideas on how I would enable this to display?
Please note - there may be more than one instance where a company name is duplicated under the same postcode.
Thank you,
maybe like this:
(you donot Need to drop the origin table, but you need to use other fieldnames)
TEST:
load * inline [
OrgID,Company Name,Postcode
1, Blahblah, AB14 00T
2, Bladeblah, AB14 00T
3, RunningoutofNames, AB12 111
4, MadeupName, AZ4 78T
5, Notsharing, TR4 9E9
6, PenguinDance, XER 101
];
left join(TEST)
load count(OrgID) as Numb,
Postcode
Resident TEST
group by Postcode;
Final:
NoConcatenate load *
Resident TEST
where Numb > 1;
drop table TEST;
maybe like this:
(you donot Need to drop the origin table, but you need to use other fieldnames)
TEST:
load * inline [
OrgID,Company Name,Postcode
1, Blahblah, AB14 00T
2, Bladeblah, AB14 00T
3, RunningoutofNames, AB12 111
4, MadeupName, AZ4 78T
5, Notsharing, TR4 9E9
6, PenguinDance, XER 101
];
left join(TEST)
load count(OrgID) as Numb,
Postcode
Resident TEST
group by Postcode;
Final:
NoConcatenate load *
Resident TEST
where Numb > 1;
drop table TEST;
Hi,
You can use fuzzy logic to compare strings, I found this post. Maybe will be useful