Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
craig157
Creator II
Creator II

Duplicate Records Inspection - Help to find them?

Hi All,

We have a couple hundred thousand customer records on our system as the moment and I was hoping to investigate what can be merged then deleted to keep our record count down.

I will be trying to check this over several different feeds as unfortunately most of these fields can be manually written so will have different formatting mistakes & spelling etc.

My data is like below:

OrgIDCompany NamePostcode
1BlahblahAB14 00T
2BladeblahAB14 00T
3RunningoutofNamesAB12 111
4MadeupNameAZ4 78T
5NotsharingTR4 9E9
6PenguinDanceXER 101

What I'd like to be able to do is Identify Companies with the same postcode that share a certain number of characters.

So Orgid 1 & 2 would match as they share 8 of the same characters in their company name.

Ideally the table would only show OrgID 1 & 2 now as the rest don't fit the criteria.

Anyone got any ideas on how I would enable this to display?

Please note - there may be more than one instance where a company name is duplicated under the same postcode.

Thank you,

1 Solution

Accepted Solutions
Anonymous
Not applicable

maybe like this:

(you donot Need to drop the origin table, but you need to use other fieldnames)

TEST:
loadinline [
OrgID,Company Name,Postcode
1, Blahblah, AB14 00T
2, Bladeblah, AB14 00T
3, RunningoutofNames, AB12 111
4, MadeupName, AZ4 78T
5, Notsharing, TR4 9E9
6, PenguinDance, XER 101
]
;

left join(TEST)
load count(OrgID) as Numb,
Postcode
Resident TEST
group by Postcode;

Final:
NoConcatenate load *
Resident TEST
where Numb > 1;

drop table TEST;

View solution in original post

3 Replies
Anonymous
Not applicable

maybe like this:

(you donot Need to drop the origin table, but you need to use other fieldnames)

TEST:
loadinline [
OrgID,Company Name,Postcode
1, Blahblah, AB14 00T
2, Bladeblah, AB14 00T
3, RunningoutofNames, AB12 111
4, MadeupName, AZ4 78T
5, Notsharing, TR4 9E9
6, PenguinDance, XER 101
]
;

left join(TEST)
load count(OrgID) as Numb,
Postcode
Resident TEST
group by Postcode;

Final:
NoConcatenate load *
Resident TEST
where Numb > 1;

drop table TEST;

jmvilaplanap
Specialist
Specialist

Hi,

You can use fuzzy logic to compare strings, I found this post. Maybe will be useful

Approximate String Matching in QlikView

MarcoWedel

maybe helpful:

String Percentage Comparison?

regards

Marco