Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Good day
I think I still haven't mastered this yet since I get results that don't seem to be correct, I want to check the number of duplicated records in my master file. I previously posted this question in a previous discussion however it seems like I am at step 1 again.
here is the link to the previous post.
here is my code, so if I get the same BPID, FirstName, Surname, Email and Model then the records will be classified as being duplicated. But I get a third graph of Null Values why is this the case and I don't think the number of duplicated records is correct as well. Thanks in advance
Range:
LOAD [BP ID],
Org.Name,
[First Name],
[Last Name],
Email,
Model,
Telephone,
if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided]
FROM
[Copy of Owners.xlsx]
(ooxml, embedded labels, table is Sheet1);
Dupli_Check:
LEFT JOIN (Range)
LOAD [BP ID],
[First Name],
[Last Name],
Email,
Model,
If( Count(([BP ID])) >1 and Count([First Name]) >1 and Count([Last Name]) >1 and Count([Email]) >1 and Count(Model) >1, 'Duplicated', 'Non Duplicated') as [Duplicates]
RESIDENT Range
GROUP BY [BP ID],[First Name],[Last Name],Email,Model;
Try something like
Range:
LOAD [BP ID],
Org.Name,
[First Name],
[Last Name],
Email,
Model,
Telephone,
if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided],
AutonumberHash256([BP ID],[First Name],[Last Name],Email,Model) as DuplicateKey
FROM
[Copy of Owners.xlsx]
(ooxml, embedded labels, table is Sheet1);
Dupli_Check:
LEFT JOIN (Range)
LOAD DuplicateKey,
If( Count(DuplicateKey) >1 , 'Duplicated', 'Non Duplicated') as [Duplicates]
RESIDENT Range
GROUP BY DuplicateKey;
Do some of your key fields contain NULL?
maybe or
If( Count(([BP ID])) >1 or Count([First Name]) >1 or Count([Last Name]) >1 or Count([Email]) >1 or Count(Model) >1, 'Duplicated', 'Non Duplicated') as [Duplicates] |
or
LEFT JOIN (Range)
LOAD [BP ID],
Count(([BP ID])) ad no_bpid
RESIDENT Range
GROUP BY [BP ID];
LEFT JOIN (Range)
LOAD [First Name],
[Last Name],
Count([First Name]&[Last Name]) ad no_first_name_last_name
RESIDENT Range
GROUP BY [First Name], [Last Name];
etc
Hi Swuel
The BPID field doesn't contain Null values at all, while the other fields do contain null values however I get a third chart on the graph which resembles null values how correct is this?.
Please see picture attached.
Try something like
Range:
LOAD [BP ID],
Org.Name,
[First Name],
[Last Name],
Email,
Model,
Telephone,
if(len([Last Name]>0) and len([First Name])>0 and len([Email])>0 and len([Telephone])>=9,'Complete Info','Incomplete Info') AS [All Info Provided],
AutonumberHash256([BP ID],[First Name],[Last Name],Email,Model) as DuplicateKey
FROM
[Copy of Owners.xlsx]
(ooxml, embedded labels, table is Sheet1);
Dupli_Check:
LEFT JOIN (Range)
LOAD DuplicateKey,
If( Count(DuplicateKey) >1 , 'Duplicated', 'Non Duplicated') as [Duplicates]
RESIDENT Range
GROUP BY DuplicateKey;
Good day
I think it worked, cause this is the results I get.
Please see image below.
But what did this do?
AutonumberHash256
Unfortunately your code gave me the same results florentina.dogaru
but thanks for your efforts.
It will create a unique value based on the arguments to the function.
Thanks alot for your help swuel, I really appreciate it!!!