Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
simonwithey
Contributor
Contributor

identifying Duplicates

As a Qlik virgin I need the help of you experts. I need to identify and set a flag against a duplicate record in a load script. Example ogf the table below. I need to check if the Id's are the same and if so i need to set a flag against the oldest date only and set the later date as OK. If the dates are equal i just need to set the first record as a duplicate. many thanks

  

IdAuth Date
C001833719-Jul-17
C001833727-Jul-17
CX21179502-Jun-17
CX21179502-Jun-17
CX21219808-Jun-17
CX21219809-Jun-17
N000248808-Jun-17
N000248814-Jun-17
P000074012-Jun-17
P000074016-Jun-17
R000024906-Jul-17
R000024925-Jul-17
19 Replies
sunny_talwar

May be this

SET DateFormat='DD-MMM-YY';

Table:

LOAD * INLINE [

    Id, Auth Date

    C0018337, 19-Jul-17

    C0018337, 27-Jul-17

    CX211795, 02-Jun-17

    CX211795, 02-Jun-17

    CX212198, 08-Jun-17

    CX212198, 09-Jun-17

    N0002488, 08-Jun-17

    N0002488, 14-Jun-17

    P0000740, 12-Jun-17

    P0000740, 16-Jun-17

    R0000249, 06-Jul-17

    R0000249, 25-Jul-17

];

FinalTable:

LOAD Id,

[Auth Date],

If(Id = Previous(Id), 'Duplicate') as Flag

Resident Table

Order By Id, [Auth Date] desc;

DROP Table Table;

Capture.PNG

dan_sullivan
Creator II
Creator II

This is good advice from Sunny.  you can also do a load distinct to just eliminate the duplicates where they have the same dates and Id.

effinty2112
Master
Master

Hi Simon,

Another way:

Data:

LOAD * Inline [

Id, Auth Date

C0018337, 19-Jul-17

C0018337, 27-Jul-17

CX211795, 02-Jun-17

CX211795, 02-Jun-17

CX212198, 08-Jun-17

CX212198, 09-Jun-17

N0002488, 08-Jun-17

N0002488, 14-Jun-17

P0000740, 12-Jun-17

P0000740, 16-Jun-17

R0000249, 06-Jul-17

R0000249, 25-Jul-17

];

Left Join(Data)

LOAD

Id,

min([Auth Date],2) as [Auth Date],

'OK' as Flag

Resident Data Group by Id;

gives:

Id Auth Date Flag
C001833727-Jul-17OK
C001833719-Jul-17 
CX21179502-Jun-17 
CX21219809-Jun-17OK
CX21219808-Jun-17 
N000248814-Jun-17OK
N000248808-Jun-17 
P000074016-Jun-17OK
P000074012-Jun-17 
R000024925-Jul-17OK
R000024906-Jul-17

cheers

Andrew

dan_sullivan
Creator II
Creator II

Simon can you mark the answer as correct and close thread if you are satisfied with the results?

simonwithey
Contributor
Contributor
Author

I will once I've had chance to have a play. I am a complete novice so I'm a bit slow at the moment. What I'm not sure about the responses is that the table will contain over 20,000 records (of which only about 12 will be duplicates) and the the two fields in question will not be sorted so I'm not sure if this will be an issue with the above solutions.

simonwithey
Contributor
Contributor
Author

Thank you Sunny. Apologies for not making myself clear. I won't be typing the data in, it will be part of a non sorted data table/file.

simonwithey
Contributor
Contributor
Author

Thank you Sunny. Apologies for not making myself clear. I won't be typing the data in, it will be part of a non sorted data table/file.

sunny_talwar

We know that.... the above was just to show how it can be done.... in your case, you will do something like this

SET DateFormat='DD-MMM-YY';

Table:

LOAD *

FROM .....;

FinalTable:

LOAD Id,

[Auth Date],

If(Id = Previous(Id), 'Duplicate') as Flag,

....

Resident Table

Order By Id, [Auth Date] desc;

DROP Table Table;

Red part is where you will load the data from whatever you are loading it from

simonwithey
Contributor
Contributor
Author

Thanks Sunny, I'm still having issues. I've tried copying your syntax but I'm obviously doing something wrong.

The table is already loaded (called REP). Can I ask you to spell out the red text in idiot terms for me please. As soon as I type FROM [REP] it doesn't like it. Sorry to be a pain