Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
kaygee28
Contributor III
Contributor III

Checking for Duplicated Records

Good day

I think I still haven't mastered this yet since I get results that don't seem to be correct, I want to check the number of duplicated records in my master file. I previously posted this question in a previous discussion however it seems like I am at step 1 again.

here is the link to the previous post.

Checking for Duplicates

here is my code, so if I get the same BPID, FirstName, Surname, Email and Model then the records will be classified as being duplicated. But I get a third graph of Null Values why is this the case and I don't think the number of duplicated records is correct as well. Thanks in advance


Range:

LOAD [BP ID],

     Org.Name,

     [First Name],

     [Last Name],

     Email,

     Model,

     Telephone,

    

if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided]

       

    

FROM

[Copy of Owners.xlsx]

(ooxml, embedded labels, table is Sheet1);

Dupli_Check:

LEFT JOIN (Range)

LOAD [BP ID], 

     [First Name],

     [Last Name],

      Email,

      Model,

     If( Count(([BP ID])) >1 and Count([First Name]) >1 and Count([Last Name]) >1 and Count([Email]) >1 and Count(Model) >1, 'Duplicated', 'Non Duplicated') as [Duplicates]

RESIDENT Range

GROUP BY [BP ID],[First Name],[Last Name],Email,Model;

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Try something like

Range:

LOAD [BP ID],

     Org.Name,

     [First Name],

     [Last Name],

     Email,

     Model,

     Telephone,

   

if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided],

     AutonumberHash256([BP ID],[First Name],[Last Name],Email,Model) as DuplicateKey

      

   

FROM

[Copy of Owners.xlsx]

(ooxml, embedded labels, table is Sheet1);

Dupli_Check:

LEFT JOIN (Range)

LOAD DuplicateKey,

         

     If( Count(DuplicateKey) >1 , 'Duplicated', 'Non Duplicated') as [Duplicates]

RESIDENT Range

GROUP BY DuplicateKey;

View solution in original post

8 Replies
swuehl
MVP
MVP

Do some of your key fields contain NULL?

florentina_doga
Partner - Creator III
Partner - Creator III

maybe or

If( Count(([BP ID])) >1 or Count([First Name]) >1 or Count([Last Name]) >1 or Count([Email]) >1 or Count(Model) >1, 'Duplicated', 'Non Duplicated') as [Duplicates]

or

LEFT JOIN (Range)

LOAD [BP ID], 

  Count(([BP ID])) ad no_bpid

RESIDENT Range

GROUP BY [BP ID];

LEFT JOIN (Range)

LOAD [First Name],

[Last Name],

  Count([First Name]&[Last Name]) ad no_first_name_last_name

RESIDENT Range

GROUP BY [First Name], [Last Name];

etc

kaygee28
Contributor III
Contributor III
Author

Hi Swuel

The BPID field doesn't contain Null values at all, while the other fields do contain null values however I get a third chart on the graph which resembles null values how correct is this?.

Please see picture attached.

Duplicates.png

swuehl
MVP
MVP

Try something like

Range:

LOAD [BP ID],

     Org.Name,

     [First Name],

     [Last Name],

     Email,

     Model,

     Telephone,

   

if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided],

     AutonumberHash256([BP ID],[First Name],[Last Name],Email,Model) as DuplicateKey

      

   

FROM

[Copy of Owners.xlsx]

(ooxml, embedded labels, table is Sheet1);

Dupli_Check:

LEFT JOIN (Range)

LOAD DuplicateKey,

         

     If( Count(DuplicateKey) >1 , 'Duplicated', 'Non Duplicated') as [Duplicates]

RESIDENT Range

GROUP BY DuplicateKey;

kaygee28
Contributor III
Contributor III
Author

Good day

I think it worked, cause this is the results I get.

Please see image below.

Duplicated2016.png

But what did this do?

  AutonumberHash256

kaygee28
Contributor III
Contributor III
Author

Unfortunately your code gave me the same results florentina.dogaru

but thanks for your efforts.

swuehl
MVP
MVP

It will create a unique value based on the arguments to the function.

kaygee28
Contributor III
Contributor III
Author

Thanks alot for your help swuel, I really appreciate it!!!