Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

How to remove row if two fields are the same

I am dealing with a table with running records where one person should only appear once on a day. There are records where one person appear twice on a day. I need to remove these "duplicates" from the table before I can do any analysis. The other fields of the records might not be the same. I plan to keep only the records that have the latest Transaction ID. Eg. if a person has two records on the same day, one Transaction ID is 9011 the other is 9064, I want to keep 9064.

How can I do it? Can I do it in the LOAD step or in expressions?

Thanks in advance!

21 Replies
tamilarasu
Champion
Champion

Great Jade. Did you manage to get the correct results now?

Lech_Miszkiewicz
Partner Ambassador/MVP
Partner Ambassador/MVP

just use this,

will work:)

Transactions:

Load

     max(TransID) as TransactionId,

    Floor(Date)      as FloorDate,

     [Person ID]

FROM [lib://test/test data.xlsx]

(ooxml, embedded labels, table is Sheet1)

Group by

     [Person ID],

     Floor(Date) 

;

Left Join (Transactions)

Load

     TransID as TransactionId  ,

      Floor(Date)      as FloorDate,  

*

FROM [lib://test/test data.xlsx]

(ooxml, embedded labels, table is Sheet1);


Drop fields TransactionId , FloorDate;

cheers Lech, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful to the problem.
Lech_Miszkiewicz
Partner Ambassador/MVP
Partner Ambassador/MVP

remember that if Date field is timestamp using Date(Date,'DD/MM/YYYY') as Date, wil not work, you have to use Floor(Date) as Date

cheers guys

cheers Lech, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful to the problem.
tamilarasu
Champion
Champion

Hi Lech,


I agree with your point. I assumed Jade's file has only the date field i.e DD/MM/YYYY (No timestamp) and provided the solution. So this is purely based on my assumption. Thanks for pointing out though. Have a good day.

Lech_Miszkiewicz
Partner Ambassador/MVP
Partner Ambassador/MVP

No worries.

Hopefuly it works for Jade finaly.

Cheers guys-i am starting weekend here Down Under

cheers Lech, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful to the problem.
tamilarasu
Champion
Champion

Ohhh ok Lech . Have a nice evening and a happy weekend.

ahaahaaha
Partner - Master
Partner - Master

Hi
For example, the source table
TransIDDatePersonID
961629/11/201641500
960129/11/201641500
715001/03/201641500
957008/11/201644626
817408/06/201644626
674406/11/201544626
392031/03/201544626
197118/11/201444626
846614/07/20162004647
821320/06/20162004647
770502/05/20162004647
618607/10/20152004647
561624/08/20152004647
845427/07/20162005542
824721/06/20162005542
800517/05/20162005542
624920/10/20152005542
140330/10/20142005542

Below is a script code, which excludes TransID lower in the first two rows

LOAD Date,

     PersonID,

     Max(TransID)

FROM

  (the path to your table)

Group By Date, PersonID;


Regards


Andrey Kh

Not applicable
Author

Duplicate_row.jpg

Hi Jade,

You can refer below script. You will get Amount against latest Transid. Above screenshot is from Qlikview and In above screenshot we can see single amount against TransId and not the average of Amount based on Person.

Data:

LOAD Max(Transid) as Transaction,

    Date,

    Person

FROM

Duplicate_rows_removal.xlsx

(ooxml, embedded labels, table is Sheet1)

group by Person,Date;

Data2:

Left Join(Data)

LOAD  Transid as Transaction,

     Date,

     Amount

FROM

Duplicate_rows_removal.xlsx

(ooxml, embedded labels, table is Sheet1);

sfatoux72
Partner - Specialist
Partner - Specialist

‌HI,

if you want to keep all your original data and filter them on expressions, you need to add a flag.

Transactions:

Load *

FROM [lib://test/test data.xlsx]

(ooxml, embedded labels, table is Sheet1);


Left Join (Transactions)

Load

     max(TransID) as TransID,

     Date(Date,'DD/MM/YYYY') as Date,

     [Person ID],

      1 as _LDT

Resident Transactions

Group by

     [Person ID],

     Date

;

I use _LDT for Last Daily Transaction and you need to use it in your expressions with set analysis.

Like that for example:

Sum({$<_LDT={1}>} Amount)

Lech_Miszkiewicz
Partner Ambassador/MVP
Partner Ambassador/MVP

Yes - good point. that would give you full picture how your data is working.

Lets hear from Jade whether he has finaly got it working.

regards

Lech

-------------------------------------------------------------------------------------------------------

Please mark CORRECT answer to help others an close already solved topics

cheers Lech, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful to the problem.