Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Good morning.
I would like to identify potential duplicate records, excluding certain "Merchants". I already have a pretty good ETL process in place but would like to confirm there are no duplicate 'Transaction Numbers'. This becomes tricky because there are 2 Merchants I would like to always show their 'Transaction Numbers' to be marked as 'N' because they are from a internal source and I can trust them.
Looking at the below script I reference Resident t2. This is a preceding table. What I am trying to get to is, if the Merchant is 'FLEET SERVICES' or 'FIN', their 'Duplicate Record' must always be 'N'. For all other Merchants check if the current 'Transaction Number' is the same as the Previous Transaction Number. If it is the same, mark as 'Y' else 'N'.
If am using the below logic and it is giving me what I need, BUT, I would like to know if this the correct way or if there is a more correct way.
DAILY_DATA:
LOAD *,
If(Merchant = 'FLEET SERVICES' OR Merchant = 'FIN' ,'N',
If([Transaction Number]=Previous([Transaction Number]),'Y','N') ) as [Duplicate Record]
Resident t2
Order by [Transaction Number] ASC;
DROP Table t2;
Thank you for your insights.
Why don't you split the load so you only do the check on the merchents you don't trust with a where clause then join with the merchants you do trust at the end. This should process faster as you no longer need the if function to filter out the merchants.
Regards
Andy
Thanks for the input. That could work. I will give this a try.
Regards,
John
No problem hope this helps