Qlik Community

QlikView Creating Analytics

Discussion Board for collaboration related to Creating Analytics for QlikView.

Not applicable

Removing duplicate people with many fields

Hi!

I have a database with the following information:

Person.LastName

Person.FirstName

Person.BirthDate

Person.IdNumber (which consists of birthdate DDMMYY-NNNN and four characters in the end)

The case is that all people have all five, all people have something in Person.IdNumber but not in standard form and some people might have their first names or last names in the opposite fields. Birthdate is most likely a field which everybody has but there might be some duplicates because people might have been born on the same day. I'm thinking of doing this in the following way:

Rows are duplicates:

IF IdNumbers are same

or all of the following apply:

Person.IdNumber is not in a standard form MMDDYY-NNNN because NNNN might be missing

IF BirthDates are the same

IF LastNames are the same

IF FirstNames are the same

I'm having trouble starting to script this. Could someone help me out?

All the best,

Mikael

Tags (1)
2 Replies
Not applicable

Re: Removing duplicate people with many fields

Can you not use a  Load distinct   in the script for deduplication ?

If you data is not standardized, you'll have to do a cleanup as well. For instance formatting birthdate to a standard date.

Employee
Employee

Re: Removing duplicate people with many fields

Load

          LastName,

          FirstName,

          BirthDate,

          IdNumber,

          BirthDate &'|'& FirstName &'|'& LastName as ID2

          From SourceFile

          Where not Exists(IdNumber) and Not Exists(ID2, BirthDate &'|'& FirstName &'|'& LastName);

Community Browser