Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
hkg_qlik
Creator III
Creator III

Remove Duplicates

Hi,

I have a table which holds two key fields: MemberID and CustomerID.

Before moving forward I want to check for any duplicates in table with a Key combination of MemberID + CustomerID. (Both reside in the same table)

Any thoughts?

Thanks,

H

1 Solution

Accepted Solutions
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Try aggregating the resident table and joining to itself:

Data:

LOAD

   MemberID,

   CustomerID,

   1 AS Count

From.....;

JOIN (Data)

LOAD

   MemberID,

   CustomerID,

   Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

View solution in original post

13 Replies
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Try aggregating the resident table and joining to itself:

Data:

LOAD

   MemberID,

   CustomerID,

   1 AS Count

From.....;

JOIN (Data)

LOAD

   MemberID,

   CustomerID,

   Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

hkg_qlik
Creator III
Creator III
Author

I want to flag the duplicates and remove them from the table.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

I assume you want to leave one of each duplicate record behind? Will all other fields also be identical? If not, how do you decide which record to keep?

hkg_qlik
Creator III
Creator III
Author

Yes all other fields would be identical. So I can keep any one record of the two or more.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

Then maybe just LOAD DISTINCT in the first place....

hkg_qlik
Creator III
Creator III
Author

So something like this:

Load Distinct MemberID&' '&CustomerID as Key,

MemberID,

CustomerID,

.

.

.

From

ABC.qvd

I want to check when the combination of MemberID and CustomerID is repeating in the table.

manojkvrajan
Luminary
Luminary

Script mentioned  below brings only the data that has the combination > 1. I hope you can take it forward from here to avoid data duplication using the timestamp or anything specific to your requirements. Does it help?

Temp:


LOAD DISTINCT MemberID, CustomerID,
COUNT(MemberID&'_'&CustomerID) AS Count
FROM
(SourceFile.xls )
GROUP BY MemberID, CustomerID;

Key:


LOAD MemberID&'_'&CustomerID AS Key
RESIDENT Temp
WHERE Count >1;


DROP TABLE Temp;

ActualTable:

LOAD *
FROM
(SourceFile)
WHERE EXISTS (Key,MemberID&'_'&CustomerID);

DROP TABLE Key;

Jason_Michaelides
Luminary Alumni
Luminary Alumni

I meant simply:

LOAD DISTINCT * FROM Data.qvd(qvd);

Jason

hkg_qlik
Creator III
Creator III
Author

Thank you all for your inputs.