Solved: Remove Duplicates That Aren't Actually Duplicates - Qlik Community

fietzn · ‎2022-12-19

Hi,

I'm trying to figure out how to remove duplicates that aren't truly duplicates. In the attached data set, I only want to return the row with the latest 'Action Date Time' for a given Login Name, Case ID, Alert ID and Action Date combination. I can't use the Load Distinct command because I need to keep the Action Text field, which is unique for each record.

I thought I could:

Create an index field based on Login Name, Case ID, Alert ID and Action Date
Keep only the min index value based on those combos and
Join that back to all the records to return the rest of the fields based on index

However, with both the FirstSortedValue and Min functions, Qlik is still returning all index values as shown in the attached image. Any help would be appreciated.

Thanks.

fietzn · ‎2023-01-05

I was able to find an alternate solution by generating record numbers, and then right joining to a table where I took only the max record number for each respective key. Below is the code.

TEMP_DATA:

LOAD distinct
    RecNo() as REC_NBR,
	"Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
	
    //Create Key
    Date(Floor(Timestamp#("Action Date*", 'MM/DD/YY hh:mm')), 'YYYY/MM/DD') &'|'& "Alert ID" &'|'& "Case ID" &'|'& "Login Name" as KEY

FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input)


NoConcatenate
//Take MAX REC_NBR for KEY

[MAX_REC_NBR]:
LOAD
KEY,
max(REC_NBR) as MAX_REC_NBR

RESIDENT TEMP_DATA
Group By KEY;

Left Join (MAX_REC_NBR)

LOAD *,
REC_NBR as MAX_REC_NBR,
KEY as MAX_REC_KEY

RESIDENT TEMP_DATA;

DROP fields REC_NBR,KEY ;
DROP table TEMP_DATA;

exit script;

View solution in original post

vinieme12 · ‎2022-12-19

using your sampledata

LOAD
    "Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
    AutoNumber("Case ID"&"Alert ID"&"Action Text","Case ID"&"Alert ID") as Index
FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input);


NoConcatenate
Load * Resident temp 
Where Index=1;
drop table temp;


exit Script;

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.

fietzn · ‎2022-12-21

Thank you @vinieme12. I believe this gets me 90% there. When doing some quality checks on my larger data set, I noticed an example where I would have expected the index to increment but it didn't, as shown in the image below.

I thought it would be a matter of changing the AutoID argument of the AutoNumber function, but making the tweak shown below didn't fix it. I've been struggling to understand the AutoID argument, so any advice on this is appreciated.

Thanks.

AutoNumber("Action Date*"&LOGIN_NAME&ALERT_ID&ACTION_TEXT, caseId&ALERT_ID&ACTION_DATE) as "Index*"

vinieme12 · ‎2022-12-21

Remove actiondate from autoid param

AutoNumber("Action Date*"&LOGIN_NAME&ALERT_ID&ACTION_TEXT, caseId&ALERT_ID) as "Index*"

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.

fietzn · ‎2022-12-22

The output is actually the same if I have the AutoId parameter as caseId&ALERT_ID or caseId&ALERT_ID&ACTION_DATE. Both index the 2 line items as a "1".

fietzn · ‎2023-01-05

I was able to find an alternate solution by generating record numbers, and then right joining to a table where I took only the max record number for each respective key. Below is the code.

TEMP_DATA:

LOAD distinct
    RecNo() as REC_NBR,
	"Case ID",
    "Alert ID",
    "Action Date Time",
    "Create Date*",
    "Login Name",
    "FI Name",
    "Create Date Time",
    "Action Text",
    "Action Date*",
    "Close or Pend*",
	
    //Create Key
    Date(Floor(Timestamp#("Action Date*", 'MM/DD/YY hh:mm')), 'YYYY/MM/DD') &'|'& "Alert ID" &'|'& "Case ID" &'|'& "Login Name" as KEY

FROM [lib://AttachedFiles/SampleData (2).xlsx]
(ooxml, embedded labels, table is Input)


NoConcatenate
//Take MAX REC_NBR for KEY

[MAX_REC_NBR]:
LOAD
KEY,
max(REC_NBR) as MAX_REC_NBR

RESIDENT TEMP_DATA
Group By KEY;

Left Join (MAX_REC_NBR)

LOAD *,
REC_NBR as MAX_REC_NBR,
KEY as MAX_REC_KEY

RESIDENT TEMP_DATA;

DROP fields REC_NBR,KEY ;
DROP table TEMP_DATA;

exit script;

Remove Duplicates That Aren't Actually Duplicates

Data Load Editor