Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi!
Thought about asking if anyone can think of a solution for my problem here, since I'm new to Qlik I'm not sure if there is a function that can do that job.
I have a dataset with many duplicated records, but every record has a different Id. I'd like to delete the duplicates and maintain just the first of the duplicated records.
In this example, rows 2-6 are duplicated.
Row_number | Id | Start Station | Departure Time | Final Station | Arrival Time | DepartureTime_Retur | ArrivalTime_Retur | Count Changes | score |
1 | bb945103 | City1 | 07:09:00 | City2 | 12:38:00 | 16:07:00 | 21:46:00 | 1 | -16860 |
2 | 3d63b7b4 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 06:45:00 | 2 | -48240 |
3 | 658a5c90 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 06:45:00 | 2 | -48240 |
4 | e12f4755 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 06:45:00 | 2 | -48240 |
5 | f0770ed2 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 06:45:00 | 2 | -48240 |
6 | a107b4e0 | City1 | 07:09:00 | City2 | 12:38:00 | 20:07:00 | 06:45:00 | 2 | -48240 |
7 | fc6fea72 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 05:42:00 | 2 | -44460 |
What I'd like to see instead is:
Row_number | Id | Start Station | Departure Time | Final Station | Arrival Time | DepartureTime_Retur | ArrivalTime_Retur | Count Changes | score |
1 | bb945103 | City1 | 07:09:00 | City2 | 12:38:00 | 16:07:00 | 21:46:00 | 1 | -16860 |
2 | 3d63b7b4 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 06:45:00 | 2 | -48240 |
7 | fc6fea72 | City1 | 07:09:00 | City2 | 12:38:00 | 21:07:00 | 05:42:00 | 2 | -44460 |
Is there a way to achieve this? I'm attaching a small sample of data as well if there is someone who woul'd like to try a solution here.
I tried Load Distinct, but it doesn't work as I have distinct ID for every row.
I tried this one first and it didn't work. Got the solution @stefanmereuta suggested. But thanks for the contribution.
Data quality is fine. The fact that there are different id's exists for a reason, which I'm not opening here because it does not matter which one will be deleted - for the results I'm looking for.
The idea was to try to find a solution to just delete them from the visualization tool, and let the ETL chain as it is in case we need to go back and need all the records as they are.
In the "Tablename" table, load data with aggregated functions like Min and FirstSortedValue. Group the data by specific fields such as start and final stations, departure and arrival times, and other relevant parameters. This aggregation process helps organize and summarize the data effectively. Ensure the source data is correctly specified to populate the table with the desired information