Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
kaygee28
Contributor III
Contributor III

Checking for Duplicates

Good Day

I just want to verify whether this if statement will return the correct results I have a set of data for all vehicles and some ref numbers and customer names appear more than once, so I want to distinguish between repurchases (customers who have bought vehicles before) and first time buyers so I wrote this if statement in my script:

If((Contract_Ref_no) and (Customer_name)>1,'Repurchases','First Time Buyers') AS [Repurchases]

the reason I am skeptical is that there are more repurchases than first time buyers so I was wondering whether it might be the code for some reason.

Thanks in advance.

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Seems to me that you want to aggregate record values ("Count duplicates"), so you would need a GROUP BY clause in your script:

LEFT JOIN (YourTable)

LOAD [Contract_Ref_No],

          [Customer_name],

          If( Count([Customer_name]) >1, 'Repurchases', 'First Time Buyers') as [Repurchases]

RESIDENT YourTable

GROUP BY [Contract_Ref_No], [Customer_name];

Maybe it's sufficient to only use [Contract_Ref_No] in above load instead of the two fields. And it may be better to check on something like OrderNo or OrderDate instead of name:

LEFT JOIN (YourTable)

LOAD [Contract_Ref_No],

          If( Count(DISTINCT OrderNo) >1, 'Repurchases', 'First Time Buyers') as [Repurchases]

RESIDENT YourTable

GROUP BY [Contract_Ref_No];

To make this work, all fields need to be part of YourTable, not of several distinct tables in your data model.

Hope this helps,

Stefan

View solution in original post

4 Replies
Gysbert_Wassenaar

If((Contract_Ref_no) and (Customer_name)>1,'Repurchases','First Time Buyers') AS [Repurchases]

I don't understand what you're trying to test. What is (Contract_Ref_no) and (Customer_name)>1 supposed to do? Contract_Ref_no is probably a number, so likely to be larger than 1, but are your Customer_name values numbers too?


talk is cheap, supply exceeds demand
swuehl
MVP
MVP

Seems to me that you want to aggregate record values ("Count duplicates"), so you would need a GROUP BY clause in your script:

LEFT JOIN (YourTable)

LOAD [Contract_Ref_No],

          [Customer_name],

          If( Count([Customer_name]) >1, 'Repurchases', 'First Time Buyers') as [Repurchases]

RESIDENT YourTable

GROUP BY [Contract_Ref_No], [Customer_name];

Maybe it's sufficient to only use [Contract_Ref_No] in above load instead of the two fields. And it may be better to check on something like OrderNo or OrderDate instead of name:

LEFT JOIN (YourTable)

LOAD [Contract_Ref_No],

          If( Count(DISTINCT OrderNo) >1, 'Repurchases', 'First Time Buyers') as [Repurchases]

RESIDENT YourTable

GROUP BY [Contract_Ref_No];

To make this work, all fields need to be part of YourTable, not of several distinct tables in your data model.

Hope this helps,

Stefan

kaygee28
Contributor III
Contributor III
Author

Hi Gysbert Wassenaar 

Contract_Ref_no is a unique Id each customer is given when they open an account and the customer_name is a string dependent on the name of the customer.

All i wanted to do is to check for duplicates if there is a contract_ref_no and a customer name that appears more than once then it will be regarded as a repurchase.

kaygee28
Contributor III
Contributor III
Author

Hi Swuel

I think it worked it looks more or less correct now first time buyer stats are 97% and Repurchases are at 3%, I would rather remain safe and keep the Contr_Ref_no as well as the Customer name as the key identifiers.

Thanks for your help.