9 Replies Latest reply: Apr 23, 2014 1:02 PM by Justin Kelly

    Remove Duplicates in Script (cannot LOAD DISTINCT)

    Justin Kelly

      Hi Everybody!

       

      I have a problem which seems like it should have a simple fix, but I can't figure it out.

       

      I have some bad data in which some records are almost, but not quite, unique. What I need to do is get all distinct Full_Job_Numbers, and if a Full_Job_Number repeats, keep only the last record for it. For example, if there is only one record for a Full_Job_Number, my Expected Result table would include it. If there are three records with the same Full_Job_Number, I would include only the third one in my Expected Result table.

       

      I hope it's clear what I'm trying to do. If the records were exactly the same, I know I could just do a LOAD DISTINCT, but it's not quite that simple.

       

      Can anyone help?

       

      Thanks,

       

      Justin

       

       

      Data:

      Full_Job_Number  Job_Number  Job Line  Name     Sales
      10000000-001     10000000    001       Person1  $ 138,606
      20000000-001     20000000    001       Person1  $ 147,589
      30000000-001     30000000    001       Person1  $ 148,382
      40000000-001     40000000    001       Person1  $ 112,908
      10000000-001     10000000    001       Person2  $ 138,606
      30000000-001     30000000    001       Person2  $ 148,382

       

      Expected Result:

      Full_Job_Number  Job_Number  Job Line  Name     Sales
      10000000001      10000000    001       Person2  $ 138,606
      20000000001      20000000    001       Person1  $ 147,589
      30000000001      30000000    001       Person2  $ 148,382
      40000000001      40000000    001       Person1  $ 112,908
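
      To make the rule concrete, here is a minimal sketch in Python (not QlikView script) of the logic I'm after: walk the records in load order and let each later record overwrite any earlier one with the same Full_Job_Number. The function name keep_last_per_key and the dictionary layout are just made up for illustration, and the Sales values are typed as plain integers.

```python
# Sample rows copied from the Data table above, in load order.
rows = [
    {"Full_Job_Number": "10000000-001", "Job_Number": "10000000", "Job Line": "001", "Name": "Person1", "Sales": 138606},
    {"Full_Job_Number": "20000000-001", "Job_Number": "20000000", "Job Line": "001", "Name": "Person1", "Sales": 147589},
    {"Full_Job_Number": "30000000-001", "Job_Number": "30000000", "Job Line": "001", "Name": "Person1", "Sales": 148382},
    {"Full_Job_Number": "40000000-001", "Job_Number": "40000000", "Job Line": "001", "Name": "Person1", "Sales": 112908},
    {"Full_Job_Number": "10000000-001", "Job_Number": "10000000", "Job Line": "001", "Name": "Person2", "Sales": 138606},
    {"Full_Job_Number": "30000000-001", "Job_Number": "30000000", "Job Line": "001", "Name": "Person2", "Sales": 148382},
]


def keep_last_per_key(records, key):
    """Return one record per distinct key value, keeping the last one seen.

    Hypothetical helper for illustration. A dict keeps each key in the
    position where it first appeared, while a later assignment replaces
    the stored record, so the output is ordered by first appearance.
    """
    last_seen = {}
    for record in records:
        last_seen[record[key]] = record  # later rows overwrite earlier ones
    return list(last_seen.values())


result = keep_last_per_key(rows, "Full_Job_Number")
for r in result:
    print(r["Full_Job_Number"], r["Name"], r["Sales"])
# 10000000-001 Person2 138606
# 20000000-001 Person1 147589
# 30000000-001 Person2 148382
# 40000000-001 Person1 112908
```

      This only works because "last" means last in load order; if the source has no reliable order, some other field would be needed to decide which record wins.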