Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
Jérôme5625
Creator
Creator

Deleting duplicate data

Hello, I would like some advice on how to modify a database from QLIK.

Indeed, some records are made in duplicate which creates inconsistencies in my sheets in my applications.

Is there a simple way (function?) to delete in the script the lines of my database in duplicate on a field. .

In my case, the field that should not be duplicated is the "ID_repair" field;

if "ID_repair" is duplicated, delete the records after the date of the first declaration.

attached file with the current data (ARUAP2_temp) and the desired result (ARUAP2_final)

Thank you in advance and have a nice day.

Jérôme

Labels (2)
1 Solution

Accepted Solutions
Brian_C
Contributor III
Contributor III

You could try adding a resident load and amending the record order to ensure they are in ID_repair order.

ARUAP2_temp:
LOAD *
FROM <SourceTab;

ARUAP2_final

LOAD *,

If(ID_repair = previous(ID_repair),'Y','N')as Duplicate_Flag

resident ARUAP2_temp

order by ID_repair;

drop table ARUAP2_temp;

Then you could use the duplicate flag in set analysis to exclude duplicates or simply load where the flag is set to N.

 

View solution in original post

8 Replies
BrunPierre
Partner - Master
Partner - Master

One approach would be as ff

ARUAP2_temp:
LOAD *
FROM <SourceTable>
Where ID_repair <> Previous(ID_repair);

Jérôme5625
Creator
Creator
Author

thanks for your feedback but the result is wrong.

Do you have another idea? 

thanks for advance.

regards.

Fabiano_Martino_Intelco
Partner - Creator II
Partner - Creator II

Hi @Jérôme5625 

I tested the solution proposed by @BrunPierre and got the expected result.

Since you got a wrong result could you please give more information on what you got?

Regards

Fabiano

Jérôme5625
Creator
Creator
Author

I tested again with my complete database and the result is not well. 

if i test with my last excel file that i have sent you, it is ok with the expression of @BrunPierre .

The problem is that my database is not sorted on the ID_repair field.In case the duplicate records do not follow each other, the result is not good.
What is the expression to sort my database on this field?

thanks

BrunPierre
Partner - Master
Partner - Master

Order by only works with resident Load.

LOAD * Resident ARUAP2_temp
Where ID_repair <> Previous(ID_repair)
Order By ID_repair, Date;

Alternative approach

ARUAP2_final:
LOAD * FROM  ...;

Inner Join

LOAD ID_repair,
Min(Date) as Date

Resident ARUAP2_temp
Group By ID_repair;

Fabiano_Martino_Intelco
Partner - Creator II
Partner - Creator II

Indeed, @BrunPierre solution depends on the Source Table being ordered.

To sort the values you can use the Sort By clause, followed by the list of columns you want to order by.

ARUAP2_temp:
LOAD *
FROM <SourceTable>
Where ID_repair <> Previous(ID_repair)

Order By ID_Repair, Date;

 

Remember that the Order By clause can be used only on resident tables (see online help https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/Scripting/ScriptRegularS...

so make sure <SourceTable> is a resident table, loaded with data from your database.

Regards

Fabiano

Brian_C
Contributor III
Contributor III

You could try adding a resident load and amending the record order to ensure they are in ID_repair order.

ARUAP2_temp:
LOAD *
FROM <SourceTab;

ARUAP2_final

LOAD *,

If(ID_repair = previous(ID_repair),'Y','N')as Duplicate_Flag

resident ARUAP2_temp

order by ID_repair;

drop table ARUAP2_temp;

Then you could use the duplicate flag in set analysis to exclude duplicates or simply load where the flag is set to N.

 

Jérôme5625
Creator
Creator
Author

Hello all, 

Thanks for your help.

The solution of Brian works correclty.

Have a nice day.

Jérôme