Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
hkg_qlik
Creator III
Creator III

Remove Duplicates

Hi,

I have a table which holds two key fields: MemberID and CustomerID.

Before moving forward I want to check for any duplicates in table with a Key combination of MemberID + CustomerID. (Both reside in the same table)

Any thoughts?

Thanks,

H

1 Solution

Accepted Solutions
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Try aggregating the resident table and joining to itself:

Data:

LOAD

   MemberID,

   CustomerID,

   1 AS Count

From.....;

JOIN (Data)

LOAD

   MemberID,

   CustomerID,

   Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

View solution in original post

13 Replies
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Try aggregating the resident table and joining to itself:

Data:

LOAD

   MemberID,

   CustomerID,

   1 AS Count

From.....;

JOIN (Data)

LOAD

   MemberID,

   CustomerID,

   Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

hkg_qlik
Creator III
Creator III
Author

I want to flag the duplicates and remove them from the table.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

I assume you want to leave one of each duplicate record behind? Will all other fields also be identical? If not, how do you decide which record to keep?

hkg_qlik
Creator III
Creator III
Author

Yes all other fields would be identical. So I can keep any one record of the two or more.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

Then maybe just LOAD DISTINCT in the first place....

hkg_qlik
Creator III
Creator III
Author

So something like this:

Load Distinct MemberID&' '&CustomerID as Key,

MemberID,

CustomerID,

.

.

.

From

ABC.qvd

I want to check when the combination of MemberID and CustomerID is repeating in the table.

manojkvrajan
Luminary
Luminary

Script mentioned  below brings only the data that has the combination > 1. I hope you can take it forward from here to avoid data duplication using the timestamp or anything specific to your requirements. Does it help?

Temp:


LOAD DISTINCT MemberID, CustomerID,
COUNT(MemberID&'_'&CustomerID) AS Count
FROM
(SourceFile.xls )
GROUP BY MemberID, CustomerID;

Key:


LOAD MemberID&'_'&CustomerID AS Key
RESIDENT Temp
WHERE Count >1;


DROP TABLE Temp;

ActualTable:

LOAD *
FROM
(SourceFile)
WHERE EXISTS (Key,MemberID&'_'&CustomerID);

DROP TABLE Key;

Jason_Michaelides
Luminary Alumni
Luminary Alumni

I meant simply:

LOAD DISTINCT * FROM Data.qvd(qvd);

Jason

hkg_qlik
Creator III
Creator III
Author

Thank you all for your inputs.