3 Replies Latest reply: Jul 31, 2017 3:54 AM by Paloma Mura RSS

    delete duplicate rows

    Paloma Mura

      Good afternoon,

       

      I want to compare rows from my dataset to delete duplicate rows from particular columns.

      I've already had an answer from a previous question (named : compare rows of dataset)

       

      Below what I wanted to do and the answer I had :

      {

      For example, I have the table below :

      A    B    C                              D

      1    1B    23/03/2017 12:12    344

      2    2B    23/03/2017 11:50    20

      3    1B    23/03/2017 12:12    344

      4    3B    23/03/2017 09:11    32

      The expected result would be to add a new column E like this :

      A    B    C                              D      E

      1    1B    23/03/2017 12:12    344    1

      2    2B    23/03/2017 11:50    20      1

      3    1B    23/03/2017 12:12    344    0

      4    3B    23/03/2017 09:11    32        1


      The solution :

      Table:

      LOAD *,

        AutoNumber(B&D) as Key;

      LOAD * INLINE [

          A, B, C, D

          1, 1B, 23/03/2017 12:12, 344

          2, 2B, 23/03/2017 11:50, 20

          3, 1B, 23/03/2017 18:00, 344

          4, 3B, 23/03/2017 09:11, 32

      ];

       

      FinalTable:

      LOAD *,

        If(Key = Previous(Key), 0, 1) as E

      Resident Table

      Order By Key, C desc;

       

      DROP Table Table;

      }


      Or now, I have a new column E.

      Here the table I have at the beginning :

      A    B    C                              D               E

      1    1B    23/03/2017 12:12    344     ACCEPTED

      2    2B    23/03/2017 11:50    20     ACCEPTED

      3    1B    23/03/2017 14:15   344     ACCEPTED

      4    3B    23/03/2017 09:11    32      REFUSED

      5     4B   23/03/2017 15:00   344     REFUSED

      6     4B   23/03/2017 15:01  344     ERROR

      7     4B   23/03/2017 15:02   344     ACCEPTED


      With the previous solution I will have :

      A    B    C                              D               E                    F

      1    1B    23/03/2017 12:12    344     ACCEPTED     0

      2    2B    23/03/2017 11:50    20     ACCEPTED     0

      3    1B    23/03/2017 14:15   344     ACCEPTED     0

      4    3B    23/03/2017 09:11    32      REFUSED     1

      5     4B   23/03/2017 15:00   344     REFUSED     0

      6     4B   23/03/2017 15:01   344     ERROR     0

      7     4B   23/03/2017 15:02   344     ACCEPTED     1


      Or I would like :

      A    B    C                              D               E                    F

      1    1B    23/03/2017 12:12    344     ACCEPTED     1

      2    2B    23/03/2017 11:50    20     ACCEPTED     1

      3    1B    23/03/2017 14:15   344     ACCEPTED     1

      4    3B    23/03/2017 09:11    32      REFUSED     1

      5     4B   23/03/2017 15:00   344     REFUSED     0

      6     4B   23/03/2017 15:01   344     ERROR     0

      7     4B   23/03/2017 15:02   344     ACCEPTED          1

      Indeed, I want to put 0 only if there was a first row with ERROR or REFUSED in column E.


      I don't know if my explanations are very clear.


      Thank you for your help.

      Paloma

        • Re: delete duplicate rows
          Chris Wong

          Table:

          LOAD *,

            AutoNumber(B&D) as Key;

          LOAD * INLINE [

          A,B,C,D,E

          1,1B,23/03/201712:12,344,ACCEPTED

          2,2B,23/03/201711:50,20,ACCEPTED

          3,1B,23/03/201714:15,344,ACCEPTED

          4,3B,23/03/201709:11,32,REFUSED

          5,4B,23/03/201715:00,344,REFUSED

          6,4B,23/03/201715:01,344,ERROR

          7,4B,23/03/201715:02,344,ACCEPTED

          ];

           

          FinalTable:

          LOAD *,

            If(Key = Previous(Key) and ([E] = 'ERROR' or [E] = 'REFUSED'), 0, 1) as F

          Resident Table

          Order By Key, C desc;

           

          DROP Table Table;

           

          1.JPG

          • Re: delete duplicate rows
            Andrey Khoronenko

            Hi Paloma,

             

            May be like this?

             

            Load*,

            If(E='ACCEPTED', 1, 0);

            Load*Inline

            [A,    B,    C,                              D,              E

            1,    1B,    23/03/2017 12:12,    344,    ACCEPTED

            2,    2B,    23/03/2017 11:50,    20,    ACCEPTED

            3,    1B,    23/03/2017 14:15,  344,    ACCEPTED

            4,    3B,    23/03/2017 09:11,    32,      REFUSED

            5,    4B,  23/03/2017 15:00,  344,    REFUSED

            6,    4B,  23/03/2017 15:01,  344,    ERROR

            7,    4B,  23/03/2017 15:02,  344,    ACCEPTED

            ];

             

            Confuses only that in your example this line should have a value of 0 if the condition is important for you - 'Indeed, I want to put 0 only if there was a first row with ERROR or REFUSED in column E.'.

             

            4    3B    23/03/2017 09:11    32      REFUSED  0


            Regards,

            Andrey

            • Re: delete duplicate rows
              Paloma Mura

              Thank you very much for your help ! This is what I want to do.