Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Good Day.
I am looking for any advise on how to resolve this issue. I have a dataset consisting of Groups, with Sites and Site Names. From time to time a new site can accidently be allocated the same Site Code or Site Name as an existing site.
I am looking for a way to identify duplicate Site Codes and Site Names, as per the included PNG file. It can also be that the Site Codes are different and the Site Names are the same.
The “# Sites” expression is “RowNo(Total) ” and currently the expression to Identify the duplicate Site code is “Rangecount(Above(Count(Sitecode), 0, RowNo()))”.
I would be appreciate any suggestions on the issue, whether in a straight table or by creating a separate table.
Thank you – much appreciated.
Hi, you can add an expression with just Count(SiteCode), and sorted descendant by this expression will set the duplicated codes in the first rows, and/or add a red color background "If(Count(SiteCode)>1, LightRed())"
To do the check by group the count could be Count(Group &'_'& SiteCode).
The same can be done by SiteName.
You can use a combination of the Count() function and a Group By clause in your script.
Create a new table where you'll flag the duplicates.
// Load your initial data
SitesTable:
LOAD
Group,
SiteCode,
Company
FROM [YourDataSource];
// Create a table to flag duplicate Site Codes
DuplicateSiteCodes:
LOAD
SiteCode,
If(Count(SiteCode) > 1, 'Duplicate', 'Unique') as DuplicateCodeFlag
RESIDENT SitesTable
GROUP BY SiteCode;
// Create a table to flag duplicate Site Names
DuplicateCompany:
LOAD
Company,
If(Count(Company) > 1, 'Duplicate', 'Unique') as DuplicateNameFlag
RESIDENT SitesTable
GROUP BY Company;
****Hope this resolve your issue.
If the issue is solved please mark the answer with Accept as Solution & like it.****
If you want to do this in the script you can use exists() to identify duplicates.
Load
Sitecode, Company, Group,
Exists(Sitecode) as duplicate_site
From Source;
You could also try to use the window() function that was released in September 2023. Here is two alternatives depending on what you want to count.
Load
Window(count(Sitecode),Sitecode) as no_of_sitecodes,
Window(count( distinct Company),Sitecode) as no_of_unique_companies_per_sitecode,
*
;
Load
Sitecode, Company, Group
From Source;