Solved: Re: Deleting duplicate data - Qlik Community

Jérôme5625 · ‎2024-03-26

Hello, I would like some advice on how to modify a database from QLIK.

Indeed, some records are made in duplicate which creates inconsistencies in my sheets in my applications.

Is there a simple way (function?) to delete in the script the lines of my database in duplicate on a field. .

In my case, the field that should not be duplicated is the "ID_repair" field;

if "ID_repair" is duplicated, delete the records after the date of the first declaration.

attached file with the current data (ARUAP2_temp) and the desired result (ARUAP2_final)

Thank you in advance and have a nice day.

Jérôme

Brian_C · ‎2024-03-26

You could try adding a resident load and amending the record order to ensure they are in ID_repair order.

ARUAP2_temp:
LOAD *
FROM <SourceTab;

ARUAP2_final

LOAD *,

If(ID_repair = previous(ID_repair),'Y','N')as Duplicate_Flag

resident ARUAP2_temp

order by ID_repair;

drop table ARUAP2_temp;

Then you could use the duplicate flag in set analysis to exclude duplicates or simply load where the flag is set to N.

View solution in original post

BrunPierre · ‎2024-03-26

One approach would be as ff

ARUAP2_temp:
LOAD *
FROM <SourceTable>
Where ID_repair <> Previous(ID_repair);

Jérôme5625 · ‎2024-03-26

thanks for your feedback but the result is wrong.

Do you have another idea?

thanks for advance.

regards.

Fabiano_Martino_Intelco · ‎2024-03-26

Hi @Jérôme5625

I tested the solution proposed by @BrunPierre and got the expected result.

Since you got a wrong result could you please give more information on what you got?

Regards

Fabiano

Jérôme5625 · ‎2024-03-26

I tested again with my complete database and the result is not well.

if i test with my last excel file that i have sent you, it is ok with the expression of @BrunPierre .

The problem is that my database is not sorted on the ID_repair field.In case the duplicate records do not follow each other, the result is not good.
What is the expression to sort my database on this field?

thanks

BrunPierre · ‎2024-03-26

Order by only works with resident Load.

LOAD * Resident ARUAP2_temp
Where ID_repair <> Previous(ID_repair)
Order By ID_repair, Date;

Alternative approach

ARUAP2_final:
LOAD * FROM ...;

Inner Join

LOAD ID_repair,
Min(Date) as Date

Resident ARUAP2_temp
Group By ID_repair;

Fabiano_Martino_Intelco · ‎2024-03-26

Indeed, @BrunPierre solution depends on the Source Table being ordered.

To sort the values you can use the Sort By clause, followed by the list of columns you want to order by.

ARUAP2_temp:
LOAD *
FROM <SourceTable>
Where ID_repair <> Previous(ID_repair)

Order By ID_Repair, Date;

Remember that the Order By clause can be used only on resident tables (see online help https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/Scripting/ScriptRegularS...

so make sure <SourceTable> is a resident table, loaded with data from your database.

Regards

Fabiano

Brian_C · ‎2024-03-26

You could try adding a resident load and amending the record order to ensure they are in ID_repair order.

ARUAP2_temp:
LOAD *
FROM <SourceTab;

ARUAP2_final

LOAD *,

If(ID_repair = previous(ID_repair),'Y','N')as Duplicate_Flag

resident ARUAP2_temp

order by ID_repair;

drop table ARUAP2_temp;

Then you could use the duplicate flag in set analysis to exclude duplicates or simply load where the flag is set to N.

Jérôme5625 · ‎2024-03-27

Hello all,

Thanks for your help.

The solution of Brian works correclty.

Have a nice day.

Jérôme

Deleting duplicate data

Qlik Sense

Script