Duplicate data removal

2016-10-04

I have the following scenario:

Initially the user loads data from an Excel file that has one key field: OrderID. Each row has a unique OrderID. From time to time the user loads an Excel file again. This file can either be completely new (containing no data from the previous load) or it can contain previously loaded data with some rows added or updated.

So I would like to check for duplicates on the unique key OrderID. The other fields may or may not be identical. I want to keep exactly one record for each OrderID. If the same OrderID appears with different values in some of the other fields, I want to keep the new record (not the old one that was loaded previously).

I want to do something similar to:

Remove Duplicates

But in that case all other fields are identical.

I also want to do the same with another Excel file, but in that case the key is a combination of two fields: OrderID & ActNumber.

How can I get rid of these duplicates?

sunny_talwar · ‎2016-10-04

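I think you would need a Where Not Exists() clause to do what you have described. The same approach is used in incremental loads: load the new Excel file first, then append the previously loaded data only for keys that do not already exist, so that for any repeated OrderID the new record is the one that survives. It may be worth checking a few incremental load examples to see how Exists() is used there.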
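To make the logic concrete, here is a minimal sketch in plain Python (not Qlik script) of the "new rows first, then old rows whose key does not exist yet" idea. The function name merge_keep_new, the Status field, and the sample rows are made up for illustration; only OrderID and ActNumber come from the question.

```python
def merge_keep_new(old_rows, new_rows, key_fields):
    """Return the new rows plus every old row whose key is absent from the new rows.

    Rows are plain dicts; key_fields lists the field names that form the key.
    """
    def key_of(row):
        # Build a tuple so single-field and multi-field keys work the same way.
        return tuple(row[field] for field in key_fields)

    merged = {}
    # Step 1: take the new load first. If the new file repeats a key,
    # the last occurrence overwrites the earlier one.
    for row in new_rows:
        merged[key_of(row)] = row
    # Step 2: append old rows only when their key has not been seen yet
    # (the "where not exists" condition).
    for row in old_rows:
        merged.setdefault(key_of(row), row)
    return list(merged.values())


# Hypothetical sample data: OrderID 2 was updated in the new file.
old = [{"OrderID": 1, "Status": "Open"}, {"OrderID": 2, "Status": "Open"}]
new = [{"OrderID": 2, "Status": "Closed"}, {"OrderID": 3, "Status": "Open"}]
result = merge_keep_new(old, new, ["OrderID"])

# Same idea with the two-field key OrderID & ActNumber.
old_acts = [{"OrderID": 1, "ActNumber": "A", "Qty": 5}]
new_acts = [{"OrderID": 1, "ActNumber": "A", "Qty": 7},
            {"OrderID": 1, "ActNumber": "B", "Qty": 2}]
result_acts = merge_keep_new(old_acts, new_acts, ["OrderID", "ActNumber"])
```

Here result keeps the updated record for OrderID 2 and the untouched old record for OrderID 1. In Qlik itself, Exists() checks a single field, so the two-field case would usually need a combined key field built from OrderID and ActNumber before the same check can be applied.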
