Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Finding duplications or similar records

Hi all!

I am looking for a solution in order to be able to find similar or duplicate records in my tables. Does anyone know an expression that for example looks at a record, notes the given number of characters and looks for it in other records?

Any help would be appreciated!

Thx,


Br

T

1 Solution

Accepted Solutions
robert_mika
Master III
Master III

Firstly you need to have list of names that you wish to compare too.

Then try this link

qlikviewcookbook.com/list-recipes/

and look for Mapping With Wildcards

View solution in original post

9 Replies
robert_mika
Master III
Master III

Could you provide an example?

Not applicable
Author

Hi,

an example:

this name has been entered twice: 

Matik Zamad Albas
Zamad Albas Matim

Not only the order, but also the 'Malim' part is different, but I want an expression that recognizes them as similar names/records

amit_saini
Master III
Master III

Hi,

If requirement is just for 5-6 names than you can use this :

LOAD Name,

pick( WildMatch(Name,'*Zamad Albas Matim*','*Name2*','*Name3*'),'Matik Zamad Albas','P2','P3')

as Data

FROM

  1. Prcatice.xlsx

(ooxml, embedded labels, table is Sheet3);

Else for bigger data , I don't think so it is possible.

Let's wait for others opinion.

Thanks,

AS

Not applicable
Author

Hi amit,

the problem is, that I need this on more than 5 thousand names, so it is not an easy task to solve...

amit_saini
Master III
Master III

Tibor,

If this is the case than sorry bro , not possible.

Thanks,
AS

Not applicable
Author

Guys, does anyone know a solution that is just nearing to what I would like to achieve? There must be someone who has encountered a similar problem....

morganaaron
Specialist
Specialist

Are there any rules to the names reordering or changes?

For example, you've said number of characters - you can use len() to get the string length of your field value, so if the name was always the same length or had a threshold for change (1 to 2 characters for example) you can define that as your first 'rule'.

If one of the names (say the middle name) will always be consistently in the string somewhere, you can do a search for it using the index() function matching with the subfield() function or something similar.

If there aren't really any definitive rules, I think you're going to be out of luck. Your best bet is to look at fuzzy matching algorithms and see what an acceptable tolerance level for mistakes is.

robert_mika
Master III
Master III

Firstly you need to have list of names that you wish to compare too.

Then try this link

qlikviewcookbook.com/list-recipes/

and look for Mapping With Wildcards

Not applicable
Author

Thanks guys, I think you are right, no perfect solution exists to this problem yet. I managed to fabricate an expression that is somewhat generating the results I wanted (with a combination of len() and purgechar()) but it is far from perfect. Thanks anyways the comments, I really apprieciate your advices!

Br

Tibor