Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Removing duplicate people with many fields

Hi!

I have a database with the following information:

Person.LastName

Person.FirstName

Person.BirthDate

Person.IdNumber (which consists of birthdate DDMMYY-NNNN and four characters in the end)

The case is that all people have all five, all people have something in Person.IdNumber but not in standard form and some people might have their first names or last names in the opposite fields. Birthdate is most likely a field which everybody has but there might be some duplicates because people might have been born on the same day. I'm thinking of doing this in the following way:

Rows are duplicates:

IF IdNumbers are same

or all of the following apply:

Person.IdNumber is not in a standard form MMDDYY-NNNN because NNNN might be missing

IF BirthDates are the same

IF LastNames are the same

IF FirstNames are the same

I'm having trouble starting to script this. Could someone help me out?

All the best,

Mikael

2 Replies
Not applicable
Author

Can you not use a  Load distinct   in the script for deduplication ?

If you data is not standardized, you'll have to do a cleanup as well. For instance formatting birthdate to a standard date.

hic
Former Employee
Former Employee

Load

          LastName,

          FirstName,

          BirthDate,

          IdNumber,

          BirthDate &'|'& FirstName &'|'& LastName as ID2

          From SourceFile

          Where not Exists(IdNumber) and Not Exists(ID2, BirthDate &'|'& FirstName &'|'& LastName);