21 Replies Latest reply: Jan 6, 2017 6:15 AM by Sébastien Fatoux RSS

    How to remove row if two fields are the same

    jade wind

      I am dealing with a table with running records where one person should only appear once on a day. There are records where one person appear twice on a day. I need to remove these "duplicates" from the table before I can do any analysis. The other fields of the records might not be the same. I plan to keep only the records that have the latest Transaction ID. Eg. if a person has two records on the same day, one Transaction ID is 9011 the other is 9064, I want to keep 9064.

       

      How can I do it? Can I do it in the LOAD step or in expressions?

       

      Thanks in advance!

        • Re: How to remove row if two fields are the same
          Lech Miszkiewicz

          Hi

           

          i would do it in 2 smal steps during load script:

           

          1 step is to select transactions by aggregating data only to max Transaction Id for each person and then as i assume transaction id is unique you just left joining all transactions to it which will keep only initially loaded records.

           

          remember that in second step you have to use the same field name for transaction Id as in the first step so the join can work properly.

           

          Transactions:

          Load

               max(transId) as TransactionId

               person

          From

               Your Source

          Group by

               person

          ;

           

          Left Join (Transactions)

          Load

               *

          From

               Your Source

          ;

           

          kind regards

          Lech

            • Re: How to remove row if two fields are the same
              jade wind

              Thanks Lech.

               

              Yes the transaction IDs are unique. But I think your method will remove all the other dates for the same person as well?

               

              An example of my table would look like below (of course there are many other fields apart from these three fields). As you can see, one person can attend multiple times over different dates. What I want to remove is only the ones where they attend on the same day. So in below example, I only want to remove the record with TransID 9601, because on that day the person attended twice, and I only want to keep the latest record.

                 

              TransIDDatePerson ID
              961629/11/201641500
              960129/11/201641500
              715001/03/201641500
              957008/11/201644626
              817408/06/201644626
              674406/11/201544626
              392031/03/201544626
              197118/11/201444626
              846614/07/20162004647
              821320/06/20162004647
              770502/05/20162004647
              618607/10/20152004647
              561624/08/20152004647
              845427/07/20162005542
              824721/06/20162005542
              800517/05/20162005542
              624920/10/20152005542
              140330/10/20142005542

               

              How do I remove this record without removing all the other records with different dates?

               

              And is it possible to do it in expression instead of LOAD (as I might want to keep the original total number of records)?

               

              Thanks!

            • Re: How to remove row if two fields are the same
              Lech Miszkiewicz

              Hi, then include date in your first load. Just ot make sure it is not timestamp i would use floor(date to convert timestamps to dates) and also use Floor(Date on your second table) or don't use date at all on second one.

               

              Transactions:

              Load

                   max(transId)      as TransactionId,

                   Floor(Date)         as Date,

                   person

              From

                   Your Source

              Group by

                   person,

                   Floor(Date)

              ;

               

              Left Join (Transactions)

              Load

                   *

              From

                   Your Source

              ;

                • Re: How to remove row if two fields are the same
                  jade wind

                  Thanks Lech.

                   

                  I have tested your method and the record 9601 does seem to disappear from the TransactionId field. However this record doesn't seem to have been removed, but replaced by 9616, which means that there are two records marked as 9616. And the measures will look at both record instead of the original 9616 one. Eg. the data table is like this (I added an amount field so you can see how measures are done):

                    

                  TransIDDatePerson IDAmount
                  961629/11/2016415002000
                  960129/11/201641500600
                  715001/03/201641500700
                  957008/11/201644626800
                  817408/06/201644626900
                  674406/11/2015446261000
                  392031/03/2015446261100
                  197118/11/2014446261200
                  846614/07/201620046471300
                  821320/06/201620046471400
                  770502/05/201620046471500
                  618607/10/201520046471600
                  561624/08/201520046471700
                  845427/07/201620055421800
                  824721/06/201620055421900
                  800517/05/201620055422000
                  624920/10/201520055422100
                  140330/10/201420055422200

                   

                  Then when I look at average, Person ID 41500 would show the average of the three figures ((2000+600+700)/3=1100) instead of two figures ((2000+700)/2=1350). Please see below:

                   

                  1.JPG

                   

                  And also other measures show the same issue:

                  2.JPG

                   

                  I need record 9601 to be completely removed from all calculations. How do I do it?

                   

                  Thanks!

                • Re: How to remove row if two fields are the same
                  Lech Miszkiewicz

                  Hi, record 9601 should be removed. with this method, because join is done on Transaction id, person and date. As you know in first step we are removing that record.

                   

                   

                  please post your script so i can check where you is mistake.

                    • Re: How to remove row if two fields are the same
                      jade wind

                      I must have done something wrong then...Below are my LOAD script:

                       

                      Transactions:

                      Load

                           max(TransID) as TransactionId,

                           Date(Date,'DD/MM/YYYY') as Date,

                           [Person ID]

                      FROM [lib://test/test data.xlsx]

                      (ooxml, embedded labels, table is Sheet1)

                      Group by

                           [Person ID],

                           Date

                      ;

                       

                      Left Join (Transactions)

                      Load

                           *

                      FROM [lib://test/test data.xlsx]

                      (ooxml, embedded labels, table is Sheet1);

                        • Re: How to remove row if two fields are the same
                          Lech Miszkiewicz

                          just use this,

                           

                          will work:)

                           

                          Transactions:

                          Load

                               max(TransID) as TransactionId,

                              Floor(Date)      as FloorDate,

                               [Person ID]

                          FROM [lib://test/test data.xlsx]

                          (ooxml, embedded labels, table is Sheet1)

                          Group by

                               [Person ID],

                               Floor(Date) 

                          ;

                           

                          Left Join (Transactions)

                          Load

                               TransID as TransactionId  ,

                                Floor(Date)      as FloorDate,  

                          *

                          FROM [lib://test/test data.xlsx]

                          (ooxml, embedded labels, table is Sheet1);


                          Drop fields TransactionId , FloorDate;

                      • Re: How to remove row if two fields are the same
                        Tamil Nagaraj

                        Something like below,

                         

                        Data:
                        LOAD TransID,
                        Date(Date,'DD/MM/YYYY') as Date,
                        [Person ID],
                        Amount
                        FROM [lib://test/test data.xlsx]
                        (
                        ooxml, embedded labels, table is Sheet1);

                        Inner Join
                        LOAD [Person ID],
                        Date,
                        Max(TransID) as TransID
                        Resident Data Group by [Person ID], Date
                        ;


                        Below link might be helpful for you to understand about joins.


                        Understanding Join, Keep and Concatenate

                        • Re: How to remove row if two fields are the same
                          Andrey Khoronenko
                          Hi
                          For example, the source table
                          TransIDDatePersonID
                          961629/11/201641500
                          960129/11/201641500
                          715001/03/201641500
                          957008/11/201644626
                          817408/06/201644626
                          674406/11/201544626
                          392031/03/201544626
                          197118/11/201444626
                          846614/07/20162004647
                          821320/06/20162004647
                          770502/05/20162004647
                          618607/10/20152004647
                          561624/08/20152004647
                          845427/07/20162005542
                          824721/06/20162005542
                          800517/05/20162005542
                          624920/10/20152005542
                          140330/10/20142005542

                           

                          Below is a script code, which excludes TransID lower in the first two rows

                           

                          LOAD Date,

                               PersonID,

                               Max(TransID)

                          FROM

                            (the path to your table)

                          Group By Date, PersonID;


                          Regards


                          Andrey Kh

                          • Re: How to remove row if two fields are the same
                            Dipak Sawant

                            Duplicate_row.jpg

                            Hi Jade,

                             

                            You can refer below script. You will get Amount against latest Transid. Above screenshot is from Qlikview and In above screenshot we can see single amount against TransId and not the average of Amount based on Person.

                             

                            Data:

                             

                            LOAD Max(Transid) as Transaction,

                                Date,

                                Person

                            FROM

                            Duplicate_rows_removal.xlsx

                            (ooxml, embedded labels, table is Sheet1)

                            group by Person,Date;

                             

                             

                            Data2:

                             

                            Left Join(Data)

                             

                            LOAD  Transid as Transaction,

                                 Date,

                                 Amount

                            FROM

                            Duplicate_rows_removal.xlsx

                            (ooxml, embedded labels, table is Sheet1);

                            • Re: How to remove row if two fields are the same
                              Shubham Singh

                              Try this:

                               

                              //First load the table like a normal load script

                               

                              Stage1:

                              LOAD * FROM [lib://Folder/File.qvd];

                               

                              //Then do a order by something like this

                              Noconcatenate

                              Stage2:

                              LOAD * Resident Stage1

                              order by PersonID, Date, TransID Desc; //Descending because you want TransID with higher value

                               

                              Drop Table Stage1;

                               

                              //Here you flag your undesired records

                              Noconcatenate

                              Stage3:

                              LOAD

                              *,

                              if(peek(PersonID)=PersonID and peek(Date)=Date,1,0) as Flag

                              Resident Stage2;

                               

                              Drop Table Stage2:

                               

                              //Now remove those undesired records

                              Noconcatenate

                              Stage4:

                              LAOD * Resident Stage3

                              where Flag<>1;

                               

                              Drop Table Stage3;

                              Drop Field Flag;