Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi!
I have a database with the following information:
Person.LastName
Person.FirstName
Person.BirthDate
Person.IdNumber (which consists of birthdate DDMMYY-NNNN and four characters in the end)
The case is that all people have all five, all people have something in Person.IdNumber but not in standard form and some people might have their first names or last names in the opposite fields. Birthdate is most likely a field which everybody has but there might be some duplicates because people might have been born on the same day. I'm thinking of doing this in the following way:
Rows are duplicates:
IF IdNumbers are same
or all of the following apply:
Person.IdNumber is not in a standard form MMDDYY-NNNN because NNNN might be missing
IF BirthDates are the same
IF LastNames are the same
IF FirstNames are the same
I'm having trouble starting to script this. Could someone help me out?
All the best,
Mikael
Can you not use a Load distinct in the script for deduplication ?
If you data is not standardized, you'll have to do a cleanup as well. For instance formatting birthdate to a standard date.
Load
LastName,
FirstName,
BirthDate,
IdNumber,
BirthDate &'|'& FirstName &'|'& LastName as ID2
From SourceFile
Where not Exists(IdNumber) and Not Exists(ID2, BirthDate &'|'& FirstName &'|'& LastName);