Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I need to identify duplicate (matching) records, preferably in the backend. A record should be treated as a duplicate when the following fields have the same values:
Date
CustomerID
AccountID
Reference
Narration
Amount
All these fields exist within a single large fact table. In the scenario I’m dealing with, the records are identical across all the listed fields, including the reference, and should therefore be considered duplicates.
What is the recommended approach or best practice in QlikView to identify or flag such duplicates?
| Date | Customerid | Account Number | Ref | Amount | ||
| 14/03/2025 | a001 | 1000000000000000001 | 154FT333 | ppt | 5000 | |
| 14/03/2025 | a002 | 1000000000000000001 | 154FT335 | ppt | 5000 |
@LoKi_asterix Hello, first give names to all your columns (use AS for example)
then use count(*) as following :
// First, load your data
FactTable:
LOAD *,
Date & '|' & CustomerID & '|' & [Account Number] & '|' & Ref & '|' & Amount as DuplicateKey
FROM [your_source];
// Count occurrences of each combination
DuplicateCheck:
LOAD
DuplicateKey,
Count(*) as DuplicateCount
RESIDENT FactTable
GROUP BY DuplicateKey;
// Join back to mark duplicates
LEFT JOIN (FactTable)
LOAD
DuplicateKey,
If(DuplicateCount > 1, 1, 0) as IsDuplicate,
DuplicateCount
RESIDENT DuplicateCheck;
DROP TABLE DuplicateCheck;
If you want to include/exclude the duplicates completely you could apply an approach like suggested from @rami_naassaoui. But if you want to keep one of the duplicates records you need additional measurements or another approach. Therefore you may try an ordering with interrecord-function, for example something like this:
load *,
if(Date =previous(Date) and CustomerID = previous(CustomerID) and
[Account Number] = previous([Account Number]) and
Ref = previous(Ref) and Amount = previous(Amount),
peek('Nr') + 1, 1) as Nr
resident X order by Date, CustomerID, [Account Number], Ref, Amount;
@LoKi_asterix Hello, first give names to all your columns (use AS for example)
then use count(*) as following :
// First, load your data
FactTable:
LOAD *,
Date & '|' & CustomerID & '|' & [Account Number] & '|' & Ref & '|' & Amount as DuplicateKey
FROM [your_source];
// Count occurrences of each combination
DuplicateCheck:
LOAD
DuplicateKey,
Count(*) as DuplicateCount
RESIDENT FactTable
GROUP BY DuplicateKey;
// Join back to mark duplicates
LEFT JOIN (FactTable)
LOAD
DuplicateKey,
If(DuplicateCount > 1, 1, 0) as IsDuplicate,
DuplicateCount
RESIDENT DuplicateCheck;
DROP TABLE DuplicateCheck;
If you want to include/exclude the duplicates completely you could apply an approach like suggested from @rami_naassaoui. But if you want to keep one of the duplicates records you need additional measurements or another approach. Therefore you may try an ordering with interrecord-function, for example something like this:
load *,
if(Date =previous(Date) and CustomerID = previous(CustomerID) and
[Account Number] = previous([Account Number]) and
Ref = previous(Ref) and Amount = previous(Amount),
peek('Nr') + 1, 1) as Nr
resident X order by Date, CustomerID, [Account Number], Ref, Amount;