Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
fietzn
Contributor III
Contributor III

Remove Duplicates That Aren't Actually Duplicates

Hi,

I'm trying to figure out how to remove duplicates that aren't truly duplicates. In the attached data set, I only want to return the row with the latest 'Action Date Time' for a given Login Name, Case ID, Alert ID and Action Date combination. I can't use the Load Distinct command because I need to keep the Action Text field, which is unique for each record.

I thought I could:

  1. Create an index field based on Login Name, Case ID, Alert ID and Action Date
  2. Keep only the min index value based on those combos and
  3. Join that back to all the records to return the rest of the fields based on index

However, with both the FirstSortedValue and Min functions, Qlik is still returning all index values as shown in the attached image. Any help would be appreciated.

Thanks.

Labels (1)
1 Solution

Accepted Solutions
fietzn
Contributor III
Contributor III
Author

I was able to find an alternate solution by generating record numbers, and then right joining to a table where I took only the max record number for each respective key. Below is the code.

TEMP_DATA:

LOAD distinct
    RecNo() as REC_NBR,
	"Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
	
    //Create Key
    Date(Floor(Timestamp#("Action Date*", 'MM/DD/YY hh:mm')), 'YYYY/MM/DD') &'|'& "Alert ID" &'|'& "Case ID" &'|'& "Login Name" as KEY

FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input)


NoConcatenate
//Take MAX REC_NBR for KEY

[MAX_REC_NBR]:
LOAD
KEY,
max(REC_NBR) as MAX_REC_NBR

RESIDENT TEMP_DATA
Group By KEY;

Left Join (MAX_REC_NBR)

LOAD *,
REC_NBR as MAX_REC_NBR,
KEY as MAX_REC_KEY

RESIDENT TEMP_DATA;

DROP fields REC_NBR,KEY ;
DROP table TEMP_DATA;

exit script;

View solution in original post

5 Replies
vinieme12
Champion III
Champion III

using your sampledata

LOAD
    "Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
    AutoNumber("Case ID"&"Alert ID"&"Action Text","Case ID"&"Alert ID") as Index
FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input);


NoConcatenate
Load * Resident temp 
Where Index=1;
drop table temp;


exit Script;

 

vinieme12_0-1671503812771.png

 

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
fietzn
Contributor III
Contributor III
Author

Thank you @vinieme12. I believe this gets me 90% there. When doing some quality checks on my larger data set, I noticed an example where I would have expected the index to increment but it didn't, as shown in the image below. 

I thought it would be a matter of changing the AutoID argument of the AutoNumber function, but making the tweak shown below didn't fix it. I've been struggling to understand the AutoID argument, so any advice on this is appreciated.

Thanks.

AutoNumber("Action Date*"&LOGIN_NAME&ALERT_ID&ACTION_TEXT, caseId&ALERT_ID&ACTION_DATE) as "Index*"


2022-12-21_9-14-07.png

 

 

vinieme12
Champion III
Champion III

Remove actiondate from autoid param

 

AutoNumber("Action Date*"&LOGIN_NAME&ALERT_ID&ACTION_TEXT, caseId&ALERT_ID) as "Index*"

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
fietzn
Contributor III
Contributor III
Author

The output is actually the same if I have the AutoId parameter as caseId&ALERT_ID or caseId&ALERT_ID&ACTION_DATE. Both index the 2 line items as a "1". 

fietzn
Contributor III
Contributor III
Author

I was able to find an alternate solution by generating record numbers, and then right joining to a table where I took only the max record number for each respective key. Below is the code.

TEMP_DATA:

LOAD distinct
    RecNo() as REC_NBR,
	"Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
	
    //Create Key
    Date(Floor(Timestamp#("Action Date*", 'MM/DD/YY hh:mm')), 'YYYY/MM/DD') &'|'& "Alert ID" &'|'& "Case ID" &'|'& "Login Name" as KEY

FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input)


NoConcatenate
//Take MAX REC_NBR for KEY

[MAX_REC_NBR]:
LOAD
KEY,
max(REC_NBR) as MAX_REC_NBR

RESIDENT TEMP_DATA
Group By KEY;

Left Join (MAX_REC_NBR)

LOAD *,
REC_NBR as MAX_REC_NBR,
KEY as MAX_REC_KEY

RESIDENT TEMP_DATA;

DROP fields REC_NBR,KEY ;
DROP table TEMP_DATA;

exit script;