Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in Bucharest on Sept 18th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
RoyBatty
Contributor III
Contributor III

How to remove duplicates (rows that have the same id, keep the most recent) ("Distinct" does not work)?

Hi,

We have the following table that we want to save as a .qvd file:

[tickets]:
LOAD
[id] AS [tickets.id],
[created_at] AS [tickets.created_at],
[updated_at] AS [tickets.updated_at],
[fr_escalated] AS [tickets.fr_escalated],
[spam] AS [tickets.spam],
[email_config_id] AS [tickets.email_config_id],
[group_id] AS [tickets.group_id],
[priority] AS [tickets.priority],
[requester_id] AS [tickets.requester_id],
[responder_id] AS [tickets.responder_id],
[source] AS [tickets.source],
[company_id] AS [tickets.company_id],
[status] AS [tickets.status],
[subject] AS [tickets.subject],
[association_type] AS [tickets.association_type],
[support_email] AS [tickets.support_email],
[product_id] AS [tickets.product_id],
[type] AS [tickets.type],
[due_by] AS [tickets.due_by],
[fr_due_by] AS [tickets.fr_due_by],
[is_escalated] AS [tickets.is_escalated],
[associated_tickets_count]AS [tickets.associated_tickets_count],
[internal_agent_id] AS [tickets.internal_agent_id],
[internal_group_id] AS [tickets.internal_group_id],
[nr_due_by] AS [tickets.nr_due_by],
[nr_escalated] AS [tickets.nr_escalated],
[to_emails] AS [tickets.to_emails],
[form_id] AS [tickets.form_id]
RESIDENT RestConnectorMasterTable
WHERE NOT IsNull([__KEY_root]);

Store [tickets] into [lib://QVDFOLDER/Test/tickets.qvd] (qvd);

 

But before saving we want to remove duplicates: rows that have the same tickets.id (tickets.id should be unique). Also, the record with most recent tickets.updated_at should be kept.

We tried using "Distinct" - but that doesn't remove duplicates at all (because, as far as we understand, it applies to entire rows, not just a specific field like tickets.id).

Do you have any suggestions on how we can resolve this?

Labels (1)
1 Solution

Accepted Solutions
diegozecchini
Specialist
Specialist

Hi!
To achieve the goal of keeping only the most recent record for each tickets.id based on the tickets.updated_at field, you can use Qlik's aggregation capabilities.

Group by tickets.id and determine the maximum tickets.updated_at value for each group.
Use this maximum value to filter and keep only the most recent record for each tickets.id.

Below is an example script to accomplish this:


// Step 1: Load the data with an additional column for the maximum updated_at per tickets.id
TicketsWithMaxDate:
LOAD
[tickets.id],
[tickets.updated_at],
Max([tickets.updated_at]) AS MaxUpdatedAt
RESIDENT tickets
GROUP BY [tickets.id];

// Step 2: Join this information back with the original table to filter only the most recent rows
MostRecentTickets:
LOAD
*
RESIDENT tickets
WHERE Exists([tickets.id]) AND Exists([tickets.updated_at], MaxUpdatedAt);

// Step 3: Drop intermediate tables to clean up memory
DROP TABLE TicketsWithMaxDate;

// Step 4: Store the deduplicated data into a .qvd file
STORE MostRecentTickets INTO [lib://QVDFOLDER/Test/tickets.qvd] (qvd);

// Step 5: Drop temporary table
DROP TABLE MostRecentTickets;

View solution in original post

1 Reply
diegozecchini
Specialist
Specialist

Hi!
To achieve the goal of keeping only the most recent record for each tickets.id based on the tickets.updated_at field, you can use Qlik's aggregation capabilities.

Group by tickets.id and determine the maximum tickets.updated_at value for each group.
Use this maximum value to filter and keep only the most recent record for each tickets.id.

Below is an example script to accomplish this:


// Step 1: Load the data with an additional column for the maximum updated_at per tickets.id
TicketsWithMaxDate:
LOAD
[tickets.id],
[tickets.updated_at],
Max([tickets.updated_at]) AS MaxUpdatedAt
RESIDENT tickets
GROUP BY [tickets.id];

// Step 2: Join this information back with the original table to filter only the most recent rows
MostRecentTickets:
LOAD
*
RESIDENT tickets
WHERE Exists([tickets.id]) AND Exists([tickets.updated_at], MaxUpdatedAt);

// Step 3: Drop intermediate tables to clean up memory
DROP TABLE TicketsWithMaxDate;

// Step 4: Store the deduplicated data into a .qvd file
STORE MostRecentTickets INTO [lib://QVDFOLDER/Test/tickets.qvd] (qvd);

// Step 5: Drop temporary table
DROP TABLE MostRecentTickets;