Solved: Remove Duplicates - Qlik Community

hkg_qlik · ‎2012-08-16

Hi,

I have a table which holds two key fields: MemberID and CustomerID.

Before moving forward I want to check for any duplicates in table with a Key combination of MemberID + CustomerID. (Both reside in the same table)

Any thoughts?

Thanks,

H

Jason_Michaelides · ‎2012-08-16

Try aggregating the resident table and joining to itself:

Data:

LOAD

MemberID,

CustomerID,

1 AS Count

From.....;

JOIN (Data)

LOAD

MemberID,

CustomerID,

Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

View solution in original post

Jason_Michaelides · ‎2012-08-16

Try aggregating the resident table and joining to itself:

Data:

LOAD

MemberID,

CustomerID,

1 AS Count

From.....;

JOIN (Data)

LOAD

MemberID,

CustomerID,

Sum(Count) AS Freq

Resident Data

Group by MemberID,CustomerID;

That will show you the frequency of each row.

Hope this helps,

Jason

hkg_qlik · ‎2012-08-16

I want to flag the duplicates and remove them from the table.

Jason_Michaelides · ‎2012-08-16

I assume you want to leave one of each duplicate record behind? Will all other fields also be identical? If not, how do you decide which record to keep?

hkg_qlik · ‎2012-08-16

Yes all other fields would be identical. So I can keep any one record of the two or more.

Jason_Michaelides · ‎2012-08-16

Then maybe just LOAD DISTINCT in the first place....

hkg_qlik · ‎2012-08-16

So something like this:

Load Distinct MemberID&' '&CustomerID as Key,

MemberID,

CustomerID,

.

From

ABC.qvd

I want to check when the combination of MemberID and CustomerID is repeating in the table.

manojkvrajan · ‎2012-08-16

Script mentioned below brings only the data that has the combination > 1. I hope you can take it forward from here to avoid data duplication using the timestamp or anything specific to your requirements. Does it help?

Temp:

LOAD DISTINCT MemberID, CustomerID,
COUNT(MemberID&'_'&CustomerID) AS Count
FROM
(SourceFile.xls )
GROUP BY MemberID, CustomerID;

Key:

LOAD MemberID&'_'&CustomerID AS Key
RESIDENT Temp
WHERE Count >1;

DROP TABLE Temp;

ActualTable:

LOAD *
FROM
(SourceFile)
WHERE EXISTS (Key,MemberID&'_'&CustomerID);

DROP TABLE Key;

Jason_Michaelides · ‎2012-08-16

I meant simply:

LOAD DISTINCT * FROM Data.qvd(qvd);

Jason

hkg_qlik · ‎2012-08-17

Thank you all for your inputs.

Remove Duplicates

Related Topics