Solved: Re: Removing duplicates on Load Script - Qlik Community

mckay9999 · ‎2020-10-12

Hi,

I'm looking to remove duplicates in my load script however this is based on a number of columns.
So for example I have a table as below:

ID    Status Month Day
1    Success    Jan 01
1    Fail Jan 01
2    Partial Jan 01
3    Fail Jan 01
3    Success    Jan 02

And I want to remove the duplicate if the status for ID is Success and Fail while also the month and Day are equal.
However all other data outside this condition would be unaffected.
So then my table would look like this:

ID    Status Month Day
1    Success    Jan 01
2    Partial Jan 01
3    Fail Jan 01
3    Success    Jan 02

Does anyone know how to do this?

Any help is much appreciated!
Thanks in advance

MayilVahanan · ‎2020-10-12

HI McKay,

Try like below

Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;

Hope it helps

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.

View solution in original post

Taoufiq_Zarra · ‎2020-10-12

@mckay9999 One Option:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

output:

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

View solution in original post

Taoufiq_Zarra · ‎2020-10-12

@mckay9999 or:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

View solution in original post

MayilVahanan · ‎2020-10-12

HI McKay,

Try like below

Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;

Hope it helps

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.

Taoufiq_Zarra · ‎2020-10-12

@mckay9999 One Option:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

output:

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

Taoufiq_Zarra · ‎2020-10-12

@mckay9999 or:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

mckay9999 · ‎2020-10-12

Hi guys,

Thanks for the quick response - much appreciated.

The data set that I am using has 100s of rows so will the INLINE function work with these potential solutions?
I had used those 5 rows as a small example

Taoufiq_Zarra · ‎2020-10-12

@mckay9999 you can change the Inline by your data and check the solution

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

MayilVahanan · ‎2020-10-12

HI McKay,

You can use any data source like excel, qvd.

Solution is just sample for your verification.

You need to take the logic from community & change accordingly based on your original data.

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.

Removing duplicates on Load Script

General Question

Script

Scripting