Qlik Community

QlikView App Development

Discussion Board for collaboration related to QlikView App Development.

Highlighted
Not applicable

Finding duplications or similar records

Hi all!

I am looking for a solution in order to be able to find similar or duplicate records in my tables. Does anyone know an expression that for example looks at a record, notes the given number of characters and looks for it in other records?

Any help would be appreciated!

Thx,


Br

T

1 Solution

Accepted Solutions

Re: Finding duplications or similar records

Firstly you need to have list of names that you wish to compare too.

Then try this link

qlikviewcookbook.com/list-recipes/

and look for Mapping With Wildcards

9 Replies

Re: Finding duplications or similar records

Could you provide an example?

Not applicable

Re: Finding duplications or similar records

Hi,

an example:

this name has been entered twice: 

Matik Zamad Albas
Zamad Albas Matim

Not only the order, but also the 'Malim' part is different, but I want an expression that recognizes them as similar names/records

amit_saini
Honored Contributor III

Re: Finding duplications or similar records

Hi,

If requirement is just for 5-6 names than you can use this :

LOAD Name,

pick( WildMatch(Name,'*Zamad Albas Matim*','*Name2*','*Name3*'),'Matik Zamad Albas','P2','P3')

as Data

FROM

  1. Prcatice.xlsx

(ooxml, embedded labels, table is Sheet3);

Else for bigger data , I don't think so it is possible.

Let's wait for others opinion.

Thanks,

AS

Not applicable

Re: Finding duplications or similar records

Hi amit,

the problem is, that I need this on more than 5 thousand names, so it is not an easy task to solve...

amit_saini
Honored Contributor III

Re: Finding duplications or similar records

Tibor,

If this is the case than sorry bro , not possible.

Thanks,
AS

Not applicable

Re: Finding duplications or similar records

Guys, does anyone know a solution that is just nearing to what I would like to achieve? There must be someone who has encountered a similar problem....

morganaaron
Valued Contributor

Re: Finding duplications or similar records

Are there any rules to the names reordering or changes?

For example, you've said number of characters - you can use len() to get the string length of your field value, so if the name was always the same length or had a threshold for change (1 to 2 characters for example) you can define that as your first 'rule'.

If one of the names (say the middle name) will always be consistently in the string somewhere, you can do a search for it using the index() function matching with the subfield() function or something similar.

If there aren't really any definitive rules, I think you're going to be out of luck. Your best bet is to look at fuzzy matching algorithms and see what an acceptable tolerance level for mistakes is.

Re: Finding duplications or similar records

Firstly you need to have list of names that you wish to compare too.

Then try this link

qlikviewcookbook.com/list-recipes/

and look for Mapping With Wildcards

Not applicable

Re: Finding duplications or similar records

Thanks guys, I think you are right, no perfect solution exists to this problem yet. I managed to fabricate an expression that is somewhat generating the results I wanted (with a combination of len() and purgechar()) but it is far from perfect. Thanks anyways the comments, I really apprieciate your advices!

Br

Tibor