Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
mckay9999
Contributor
Contributor

Removing duplicates on Load Script

Hi,

I'm looking to remove duplicates in my load script however this is based on a number of columns. 
So for example I have a table as below:

ID    Status   Month Day
1      Success    Jan     01
1      Fail             Jan     01
2      Partial       Jan     01
3      Fail             Jan     01
3      Success    Jan     02

And I want to remove the duplicate if the status for ID is Success and Fail while also the month and Day are equal.
However all other data outside this condition would be unaffected.
So then my table would look like this:

ID    Status   Month Day
1      Success    Jan     01
2      Partial       Jan     01
3      Fail             Jan     01
3      Success    Jan     02

Does anyone know how to do this?

Any help is much appreciated!
Thanks in advance

Labels (5)
3 Solutions

Accepted Solutions
MayilVahanan

HI McKay,

Try like below

Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;

Hope it helps

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.

View solution in original post

Taoufiq_Zarra

@mckay9999  One Option:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

 

output:

Capture.PNG

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

View solution in original post

Taoufiq_Zarra

@mckay9999  or:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;
Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉

View solution in original post

6 Replies
MayilVahanan

HI McKay,

Try like below

Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;

Hope it helps

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
Taoufiq_Zarra

@mckay9999  One Option:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;

 

output:

Capture.PNG

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉
Taoufiq_Zarra

@mckay9999  or:

Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];

output:
noconcatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;

concatenate

load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);

drop table Data;
drop fields IDtmp,Idtmp2;
Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉
mckay9999
Contributor
Contributor
Author

Hi guys,

Thanks for the quick response - much appreciated.

The data set that I am using has 100s of rows so will the INLINE function work with these potential solutions?
I had used those 5 rows as a small example

Taoufiq_Zarra

@mckay9999  you can change the Inline by your data and check the solution

Regards,
Taoufiq ZARRA

"Please LIKE posts and "Accept as Solution" if the provided solution is helpful "

(you can mark up to 3 "solutions") 😉
MayilVahanan

HI McKay,

You can use any data source like excel, qvd. 

Solution is just sample for your verification.

You need to take the logic from community & change accordingly based on your original data.

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.