Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I'm looking to remove duplicates in my load script however this is based on a number of columns.
So for example I have a table as below:
ID Status Month Day
1 Success Jan 01
1 Fail Jan 01
2 Partial Jan 01
3 Fail Jan 01
3 Success Jan 02
And I want to remove the duplicate if the status for ID is Success and Fail while also the month and Day are equal.
However all other data outside this condition would be unaffected.
So then my table would look like this:
ID Status Month Day
1 Success Jan 01
2 Partial Jan 01
3 Fail Jan 01
3 Success Jan 02
Does anyone know how to do this?
Any help is much appreciated!
Thanks in advance
HI McKay,
Try like below
Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;
Hope it helps
@mckay9999 One Option:
Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
output:
noconcatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';
concatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);
drop table Data;
drop fields IDtmp,Idtmp2;
output:
@mckay9999 or:
Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
output:
noconcatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;
concatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);
drop table Data;
drop fields IDtmp,Idtmp2;
HI McKay,
Try like below
Temp:
LOAD * INLINE [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
Load * where Peek('RowNo')<>RowNo;
Load AutoNumber(ID&Month&Day) as RowNo, * Resident Temp
Order by ID, Month, Day;
Hope it helps
@mckay9999 One Option:
Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
output:
noconcatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Status='Success';
concatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);
drop table Data;
drop fields IDtmp,Idtmp2;
output:
@mckay9999 or:
Data:
load *,ID&'_'&Month&'_'&Day as IDtmp inline [
ID,Status,Month,Day
1,Success,Jan,01
1,Fail,Jan,01
2,Partial,Jan,01
3,Fail,Jan,01
3,Success,Jan,02
];
output:
noconcatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where Match(Status,'Fail')=0;
concatenate
load *,ID&'_'&Month&'_'&Day as Idtmp2 resident Data where not Exists(Idtmp2,IDtmp);
drop table Data;
drop fields IDtmp,Idtmp2;
Hi guys,
Thanks for the quick response - much appreciated.
The data set that I am using has 100s of rows so will the INLINE function work with these potential solutions?
I had used those 5 rows as a small example
@mckay9999 you can change the Inline by your data and check the solution
HI McKay,
You can use any data source like excel, qvd.
Solution is just sample for your verification.
You need to take the logic from community & change accordingly based on your original data.