2 Replies Latest reply: Oct 28, 2013 5:48 PM by Celambarasan Adhimulam

    Delete duplicate records

      How would I go about deleting duplicate records in a table based on SSN and hire date?

      The delete operator doesn't seem to work as I would expect it to.

       

      Thanks, Traci

        • Re: Delete duplicate records
          Hugo Sheng

          Are the duplicates exact duplicates?  Or is there a unique identifier column? 

           

          If there's a unique identifier, you will need to set up a dataflow that extracts all records, runs them through the Unique operator to identify the duplicates, and then uses the Write Table operator in delete mode, keyed on that unique identifier.
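
          To illustrate the logic of that dataflow outside the tool, here is a minimal Python sketch using sqlite3. It does not use the Unique or Write Table operators; it only mimics the steps (extract, identify duplicates on ssn and hire date, delete by identifier). The table name `employees` and the column names `id`, `ssn` and `hire_date` are assumptions made for the example.

```python
import sqlite3

def delete_duplicates_by_id(conn):
    """Keep the lowest id per (ssn, hire_date) and delete the other rows by id."""
    # Extract all records in a stable order (hypothetical table and columns).
    rows = conn.execute(
        "SELECT id, ssn, hire_date FROM employees ORDER BY id"
    ).fetchall()

    # Identify duplicates: any row whose key was already seen.
    seen = set()
    duplicate_ids = []
    for row_id, ssn, hire_date in rows:
        key = (ssn, hire_date)
        if key in seen:
            duplicate_ids.append(row_id)
        else:
            seen.add(key)

    # Delete only the duplicate rows, addressed by their unique identifier.
    conn.executemany(
        "DELETE FROM employees WHERE id = ?", [(i,) for i in duplicate_ids]
    )
    conn.commit()
    return duplicate_ids

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, ssn TEXT, hire_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            (1, "111-11-1111", "2010-01-15"),
            (2, "111-11-1111", "2010-01-15"),  # duplicate of id 1
            (3, "222-22-2222", "2011-06-01"),
            (4, "111-11-1111", "2012-03-20"),  # same ssn, different hire date
        ],
    )
    deleted = delete_duplicates_by_id(conn)
    remaining = [r[0] for r in conn.execute("SELECT id FROM employees ORDER BY id")]
    return deleted, remaining
```

          Which row of each duplicate group survives (here, the lowest id) is a choice made for the sketch; the original question does not say which copy should be kept.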

           

          If the records are entirely identical (all columns have the same value), then your best bet is to make a copy of the original table, extract all records, sort them and pass them through the Unique operator, then truncate the original table and load the deduplicated records back into it.
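
          The copy, deduplicate, truncate and reload sequence can be sketched in Python with sqlite3 as follows. This is only an illustration of the steps, not the tool's own operators; the table `employees`, the backup table `employees_backup` and the columns `ssn`, `hire_date` and `name` are hypothetical. SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it.

```python
import sqlite3

def rebuild_without_exact_duplicates(conn):
    """Back up the table, then reload it with one copy of each distinct row."""
    # 1. Make a copy of the original table (hypothetical backup name).
    conn.execute("CREATE TABLE employees_backup AS SELECT * FROM employees")

    # 2. Extract all records, then sort and keep one copy of each.
    all_rows = conn.execute(
        "SELECT ssn, hire_date, name FROM employees_backup"
    ).fetchall()
    unique_rows = sorted(set(all_rows))

    # 3. Empty the original table (stand-in for TRUNCATE) and load it back.
    conn.execute("DELETE FROM employees")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", unique_rows)
    conn.commit()
    return unique_rows

def demo_rebuild():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (ssn TEXT, hire_date TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("222-22-2222", "2011-06-01", "Lee"),
            ("111-11-1111", "2010-01-15", "Kim"),
            ("111-11-1111", "2010-01-15", "Kim"),  # exact duplicate
        ],
    )
    rebuild_without_exact_duplicates(conn)
    reloaded = conn.execute(
        "SELECT ssn, hire_date, name FROM employees ORDER BY ssn"
    ).fetchall()
    backup_count = conn.execute(
        "SELECT COUNT(*) FROM employees_backup"
    ).fetchone()[0]
    return reloaded, backup_count
```

          Keeping the backup table until the reload has been checked means the original rows can still be restored if something goes wrong.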

          • Re: Delete duplicate records
            Celambarasan Adhimulam

            You could use distinct.

            Create a flag using the Previous or Peek function (for example, marking a row whose ssn and hire date match those of the previous row in sorted order), and then filter on that flag.
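
            As a rough illustration of the flag idea, the Python sketch below sorts the records, compares each row's key with the previous row's key, and filters on the resulting flag. It imitates what a Previous or Peek style function would do and is not that function's actual syntax; the field names `ssn`, `hire_date` and `is_duplicate` are assumptions.

```python
def flag_duplicates(records):
    """Sort by (ssn, hire_date) and flag rows whose key equals the previous row's key."""
    ordered = sorted(records, key=lambda r: (r["ssn"], r["hire_date"]))
    flagged = []
    previous_key = None
    for record in ordered:
        key = (record["ssn"], record["hire_date"])
        # The first row of each key group gets False; later rows get True.
        flagged.append({**record, "is_duplicate": key == previous_key})
        previous_key = key
    return flagged

def keep_first_occurrences(records):
    """Filter on the flag, dropping it from the rows that are kept."""
    return [
        {k: v for k, v in row.items() if k != "is_duplicate"}
        for row in flag_duplicates(records)
        if not row["is_duplicate"]
    ]

sample = [
    {"ssn": "222-22-2222", "hire_date": "2011-06-01", "name": "Lee"},
    {"ssn": "111-11-1111", "hire_date": "2010-01-15", "name": "Kim"},
    {"ssn": "111-11-1111", "hire_date": "2010-01-15", "name": "Kim (re-entered)"},
]
deduplicated = keep_first_occurrences(sample)
```

            The comparison only works if the data is sorted on the key first, since the flag looks at the immediately preceding row.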