Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Friends
I have 2 columns - ID and Status.
For any ID's which are repeating and also having Status ='Completed' or 'Retained' , need to be removed.
Example: In attached Excel, ID =1 and ID=45 is satisfying above condition and should be removed.
I want to handle in scripting part.
Thanks in advance for your help.
Thanks & Regards
Chintan
ID29 has statuses 'Completed' and 'Cancelled' - that one should not be removed? In that case, the condition you're looking for should be:
For any ID's which are repeating and also having Status ='Completed' AND 'Retained'
?
EDIT:
If this is true you need to do it in couple of steps:
A:
LOAD
ID,
Status
FROM [lib://dt/Test_Logic.xlsx]
(ooxml, embedded labels, table is Sheet1)
;
Left Join (A)
LOAD
ID,
count(distinct(Status)) as idc
Resident A
Group by ID
;
Now you identified those ID's which have more than 1 status. If you need to remove those which have 'Completed' AND 'Retained' but keep with status 'Cancelled' you need to do this:
B:
Load
ID,
Status
Resident A
Where idc=1;
First you load those which have only 1 ID.
Outer join (B)
Load
ID,
Status
Resident A
Where Status='Cancelled' and idc>1;
Drop Table A;
Now you join those with two ID's but one of those has status 'Cancelled' (note: it will join with Status cancelled and only one line). If you need to remove all with more than one status you don't need the second part, you can do it with inner join and where idc=1.
ID29 has statuses 'Completed' and 'Cancelled' - that one should not be removed? In that case, the condition you're looking for should be:
For any ID's which are repeating and also having Status ='Completed' AND 'Retained'
?
EDIT:
If this is true you need to do it in couple of steps:
A:
LOAD
ID,
Status
FROM [lib://dt/Test_Logic.xlsx]
(ooxml, embedded labels, table is Sheet1)
;
Left Join (A)
LOAD
ID,
count(distinct(Status)) as idc
Resident A
Group by ID
;
Now you identified those ID's which have more than 1 status. If you need to remove those which have 'Completed' AND 'Retained' but keep with status 'Cancelled' you need to do this:
B:
Load
ID,
Status
Resident A
Where idc=1;
First you load those which have only 1 ID.
Outer join (B)
Load
ID,
Status
Resident A
Where Status='Cancelled' and idc>1;
Drop Table A;
Now you join those with two ID's but one of those has status 'Cancelled' (note: it will join with Status cancelled and only one line). If you need to remove all with more than one status you don't need the second part, you can do it with inner join and where idc=1.
maybe someting as below:
tab:
LOAD
*
where ret > 1;
LOAD
ID,
count(Status) as ret
from ............
where ( Status = 'Completed' or Status = 'Retained' )
group by ID;
I hope it can helps.
Hello David
Thanks for replying.
Need to remove only the ones which are Cancelled and Retained.
Thanks & Regards
Chintan Gala
Thanks David, its working fine.