Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Streamlining user types in Qlik Cloud capacity-based subscriptions: Read the Details
cancel
Showing results for 
Search instead for 
Did you mean: 
LoKi_asterix
Contributor III
Contributor III

Identify matching records across multiple Fields

Hi all,

I need to identify duplicate (matching) records, preferably in the backend. A record should be treated as a duplicate when the following fields have the same values:

Date

CustomerID

AccountID

Reference

Narration

Amount

All these fields exist within a single large fact table. In the scenario I’m dealing with, the records are identical across all the listed fields, including the reference, and should therefore be considered duplicates.

What is the recommended approach or best practice in QlikView to identify or flag such duplicates?

 DateCustomeridAccount NumberRef  Amount
14/03/2025a0011000000000000000001154FT333ppt 5000
14/03/2025a0021000000000000000001154FT335ppt 5000
Labels (1)
2 Solutions

Accepted Solutions
rami_naassaoui
Contributor III
Contributor III

@LoKi_asterix  Hello, first give names to all your columns (use AS for example)

then use count(*) as following : 


// First, load your data
FactTable:
LOAD *,
Date & '|' & CustomerID & '|' & [Account Number] & '|' & Ref & '|' & Amount as DuplicateKey
FROM [your_source];

// Count occurrences of each combination
DuplicateCheck:
LOAD
DuplicateKey,
Count(*) as DuplicateCount
RESIDENT FactTable
GROUP BY DuplicateKey;

// Join back to mark duplicates
LEFT JOIN (FactTable)
LOAD
DuplicateKey,
If(DuplicateCount > 1, 1, 0) as IsDuplicate,
DuplicateCount
RESIDENT DuplicateCheck;

DROP TABLE DuplicateCheck;

View solution in original post

marcus_sommer

If you want to include/exclude the duplicates completely you could apply an approach like suggested from @rami_naassaoui. But if you want to keep one of the duplicates records you need additional measurements or another approach. Therefore you may try an ordering with interrecord-function, for example something like this:

load *,
        if(Date =previous(Date) and CustomerID = previous(CustomerID) and
          [Account Number] = previous([Account Number]) and
          Ref = previous(Ref) and Amount = previous(Amount),
        peek('Nr') + 1, 1) as Nr
resident X order by Date, CustomerID, [Account Number], Ref, Amount;

View solution in original post

2 Replies
rami_naassaoui
Contributor III
Contributor III

@LoKi_asterix  Hello, first give names to all your columns (use AS for example)

then use count(*) as following : 


// First, load your data
FactTable:
LOAD *,
Date & '|' & CustomerID & '|' & [Account Number] & '|' & Ref & '|' & Amount as DuplicateKey
FROM [your_source];

// Count occurrences of each combination
DuplicateCheck:
LOAD
DuplicateKey,
Count(*) as DuplicateCount
RESIDENT FactTable
GROUP BY DuplicateKey;

// Join back to mark duplicates
LEFT JOIN (FactTable)
LOAD
DuplicateKey,
If(DuplicateCount > 1, 1, 0) as IsDuplicate,
DuplicateCount
RESIDENT DuplicateCheck;

DROP TABLE DuplicateCheck;

marcus_sommer

If you want to include/exclude the duplicates completely you could apply an approach like suggested from @rami_naassaoui. But if you want to keep one of the duplicates records you need additional measurements or another approach. Therefore you may try an ordering with interrecord-function, for example something like this:

load *,
        if(Date =previous(Date) and CustomerID = previous(CustomerID) and
          [Account Number] = previous([Account Number]) and
          Ref = previous(Ref) and Amount = previous(Amount),
        peek('Nr') + 1, 1) as Nr
resident X order by Date, CustomerID, [Account Number], Ref, Amount;