Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have below dataset.
Load * Inline [
ID,Category,Date,Flag
11a1,a,01/01/2024,1
2a11,a,10/01/2024,1
3b23,a,15/01/2024,1
4a5b,a,16/01/2024,1
4j5b,a,17/01/2024,0
1aa1,b,01/01/2024,1
2ab1,b,02/01/2024,1
3bbc,b,03/01/2024,0
11a1,c,01/01/2024,1
2211,c,10/01/2024,1
3bd3,c,15/01/2024,1
aa5b,c,16/01/2024,1
ah5b,c,17/01/2024,1
1ca1,d,01/01/2024,1
2b11,d,10/01/2024,1
3g23,d,15/01/2024,1
4b5b,d,16/01/2024,1
110c,e,14/12/2023,0
1ca1,e,01/01/2024,1
2b11,e,10/01/2024,1
33ba,e,11,01/2024,0
3g23,e,15/01/2024,1
4b5b,e,16/01/2024,1 ]
I want to keep only highlighted records from above table where Flag =1 is consecutively repeating only 4 times, it might have 0 afterwards which should be fine. If Flag=1 is repeated less than 4 or greater than 4 then discard those category records.
@rob_vander try below
Data:
Load * Inline [
ID,Category,Date,Flag
11a1,a,01/01/2024,1
2a11,a,10/01/2024,1
3b23,a,15/01/2024,1
4a5b,a,16/01/2024,1
4j5b,a,16/01/2024,0
1aa1,b,01/01/2024,1
2ab1,b,02/01/2024,1
3bbc,b,03/01/2024,0
11a1,c,01/01/2024,1
2211,c,10/01/2024,1
3bd3,c,15/01/2024,1
aa5b,c,16/01/2024,1
ah5b,c,17/01/2024,1
110c,d,14/12/2023,0
1ca1,d,01/01/2024,1
2b11,d,10/01/2024,1
3g23,d,15/01/2024,1
4b5b,d,16/01/2024,1
110c,e,14/12/2023,0
1ca1,e,01/01/2024,1
2b11,e,10/01/2024,1
33ba,e,11/01/2024,0
3g23,e,15/01/2024,1
4b5b,e,16/01/2024,1
a10c,f,14/12/2023,1
1aa1,f,01/01/2024,1
2v11,f,10/01/2024,1
3sba,f,11/01/2024,0
3a23,f,15/01/2024,1
4b4b,f,16/01/2024,1
3g13,f,17/01/2024,1
4b1b,f,18/01/2024,1];
New:
Load *,
if(Category<>Previous(Category) and Flag=1 or (Category=Previous(Category) and Previous(Flag)=0 and Flag=1),1,
if(Category=Previous(Category) and (Flag=1 and Previous(Flag)=1), rangesum(Peek('sequence'),Flag))) as sequence
Resident Data
Order by Category,Date;
Drop Table Data;
Left Join(New)
Load Category,
max(sequence) as max_sequence
Resident New
Group by Category;
Final:
NoConcatenate
Load *
Resident New
where max_sequence=4;
drop Table New;
@rob_vander what if we have below data, Last 4 records repeated 4 times, do you still want all the records for that category?
110c,e,14/12/2023,0
1ca1,e,01/01/2024,1
2b11,e,10/01/2024,1
33ba,e,11,01/2024,0
3g23,e,15/01/2024,1
4b5b,e,16/01/2024,1
ff22,e,17/01/2024,1
22ds,3,18/01/2024,1
This is an interesting challenge. Here is a script I quickly wrote that might help you:
@Kushal_Chawda In this case we need to keep all the records for that category. There won't be any scenario where Flag is repeating 4 times and more than 4 time at same time for that category
Hi @rob_vander , here another option, creating 2 filters fields :
@igoralcantara It is not working. Giving 0 records at end of the script
@QFabian It's not giving expected output.
@rob_vander try below
Data:
Load * Inline [
ID,Category,Date,Flag
11a1,a,01/01/2024,1
2a11,a,10/01/2024,1
3b23,a,15/01/2024,1
4a5b,a,16/01/2024,1
4j5b,a,16/01/2024,0
1aa1,b,01/01/2024,1
2ab1,b,02/01/2024,1
3bbc,b,03/01/2024,0
11a1,c,01/01/2024,1
2211,c,10/01/2024,1
3bd3,c,15/01/2024,1
aa5b,c,16/01/2024,1
ah5b,c,17/01/2024,1
110c,d,14/12/2023,0
1ca1,d,01/01/2024,1
2b11,d,10/01/2024,1
3g23,d,15/01/2024,1
4b5b,d,16/01/2024,1
110c,e,14/12/2023,0
1ca1,e,01/01/2024,1
2b11,e,10/01/2024,1
33ba,e,11/01/2024,0
3g23,e,15/01/2024,1
4b5b,e,16/01/2024,1
a10c,f,14/12/2023,1
1aa1,f,01/01/2024,1
2v11,f,10/01/2024,1
3sba,f,11/01/2024,0
3a23,f,15/01/2024,1
4b4b,f,16/01/2024,1
3g13,f,17/01/2024,1
4b1b,f,18/01/2024,1];
New:
Load *,
if(Category<>Previous(Category) and Flag=1 or (Category=Previous(Category) and Previous(Flag)=0 and Flag=1),1,
if(Category=Previous(Category) and (Flag=1 and Previous(Flag)=1), rangesum(Peek('sequence'),Flag))) as sequence
Resident Data
Order by Category,Date;
Drop Table Data;
Left Join(New)
Load Category,
max(sequence) as max_sequence
Resident New
Group by Category;
Final:
NoConcatenate
Load *
Resident New
where max_sequence=4;
drop Table New;