Discussion board where members can get started with QlikView.
We have a couple hundred thousand customer records on our system as the moment and I was hoping to investigate what can be merged then deleted to keep our record count down.
I will be trying to check this over several different feeds as unfortunately most of these fields can be manually written so will have different formatting mistakes & spelling etc.
My data is like below:
What I'd like to be able to do is Identify Companies with the same postcode that share a certain number of characters.
So Orgid 1 & 2 would match as they share 8 of the same characters in their company name.
Ideally the table would only show OrgID 1 & 2 now as the rest don't fit the criteria.
Anyone got any ideas on how I would enable this to display?
Please note - there may be more than one instance where a company name is duplicated under the same postcode.
maybe like this:
(you donot Need to drop the origin table, but you need to use other fieldnames)
TEST: load * inline [ OrgID,Company Name,Postcode 1, Blahblah, AB14 00T 2, Bladeblah, AB14 00T 3, RunningoutofNames, AB12 111 4, MadeupName, AZ4 78T 5, Notsharing, TR4 9E9 6, PenguinDance, XER 101 ]; left join(TEST) load count(OrgID) as Numb, Postcode Resident TEST group by Postcode; Final: NoConcatenate load * Resident TEST where Numb > 1; drop table TEST;
View solution in original post
You can use fuzzy logic to compare strings, I found this post. Maybe will be useful
Approximate String Matching in QlikView
String Percentage Comparison?