Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
nwilliams
Contributor
Contributor

Numbering "Duplicate" Records

Evening,

I'm trying to identify "duplicate" records in my dataset.  I am concatenating 3 QVD's into 1 table.  The script I'm using is below:

AllActiveAccounts.account,
    
AllActiveAccounts.agentversion,
    
AllActiveAccounts.backupdate,
    
if(AllActiveAccounts.Email='','None', AllActiveAccounts.Email) AS AllActiveAccounts.Email,
    
if(AllActiveAccounts.Email='','None', AllActiveAccounts.Email) AS lnk_user_email,
    
AllActiveAccounts.startdate,
    
AllActiveAccounts.value,
    
AllActiveAccounts.value  AS lnk_asset_name,
     'UK'
AS AllActiveAccounts.c_Country,
     '8.5'
AS AllActiveAccounts.c_CBVersion,
    
Num((Date(Today()))-AllActiveAccounts.backupdate,'#.') AS AllActiveAccounts.c_DaysSinceBackup

I want to identify a duplicate as a record that has the same "AllActiveAccounts.Email" and the same "AllActiveAccounts.value,"

For instance, if the combination of the email aa@bb.com and value "12234" appeared twice in the data set, the number wold be "2"

If there was another record with the email aa@bb.com and the value of 98765 that would appear as "1" despite there being another record with the same email address.

I have tried "Autonumber" but not having much luck. 

Can anyone offer any assistance?

Thanks

1 Solution

Accepted Solutions
maxgro
MVP
MVP


- load your qvd in a table (I call the table t)

- group by and join in this way to add the dupl field


left join (t)

load

AllActiveAccounts.Email,

AllActiveAccounts.value,

count(AllActiveAccounts.account) as dupl

Resident t

group by

     AllActiveAccounts.Email,

     AllActiveAccounts.value;


View solution in original post

4 Replies
maxgro
MVP
MVP

maybe like this?

1.png

t:

load rowno() as id, * inline [

email, value, f3, f4

aa@bb.com,1234,1,1

aa@bb.com,1,2,2

aa@bb.com,2,3,3

aa@bb.com,3,4,4

aa@bb.com,4,5,5

cc@dd.com,56,1,1

aa@bb.com,1234,11,11

aa@bb.com,1234,12,12

cc@dd.com,56,1,1

];

left join (t)

load email,value,count(f3) as dupl

Resident t

group by email,value;

nwilliams
Contributor
Contributor
Author

How would I do this without specifying the "email" and "value"?  I have about 19,000 records I'm trying to process.

maxgro
MVP
MVP


- load your qvd in a table (I call the table t)

- group by and join in this way to add the dupl field


left join (t)

load

AllActiveAccounts.Email,

AllActiveAccounts.value,

count(AllActiveAccounts.account) as dupl

Resident t

group by

     AllActiveAccounts.Email,

     AllActiveAccounts.value;


nwilliams
Contributor
Contributor
Author

Brilliant!!!! Thank you