Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Record filtering problem

Hello,

I have a table with records ordered by ID and StartDate. Each record has also an EndDate. I need to delete records whose EndDate is prior to the EndDate of any previous record with the same ID. But I have to count these deleted records and save the result in the first one of the group. An example would be easier to understand, I hope :

This is the starting table (marked the records that should be deleted):

IDStartDateEndDate
101/10/200730/06/2008
123/10/200730/06/2011
120/05/200813/06/2008
101/10/200830/06/2009
101/07/200924/07/2009
101/10/200930/06/2010
101/10/201030/06/2012
125/05/201123/06/2011
210/02/200810/02/2008
225/03/200915/06/2013
201/01/201031/12/2010
210/11/201015/06/2013
210/01/201330/05/2014
220/06/201320/06/2013

And this should be the output:


IDStartDateEndDateNofRecords
101/10/200730/06/20081
123/10/200730/06/20115
101/10/201030/06/20122
210/02/200810/02/20081
225/03/200915/06/20133
210/01/201330/05/20142


I've solved it with an iteration approach, but it's slow (more than 600 iterations to clean up a bunch of 1.2 million records)

Any ideas?

Thank you.

5 Replies
Not applicable
Author

Hi Juan,

Your example isn't clear to me. With the logic said, there will be only Distinct ID's retained in final table right as its End Date will greater than all other dates with the same ID. Please provide some more info on this.

Thanks,

Prabhu

Not applicable
Author

Thanks for your interes Prabhu, and sorry if my example is not clear (it's hard for me to explain this in English).

As you can see in the first table, records are ordered by ID and StartDate but there are records with StartDate greater than the previous StartDate but with EndDate prior to the previous record's EndDate. These should be added to counter (NofRecords) and delete from the output table.

The records are periods of service and I need to summarize services given without a time gap between them in a single record.

I can't explain it better... Hope you understand now.

martinpedersen
Creator
Creator

Something like this perhaps:

Load ID, StartDate, EndDate from Dates.qvd(qvd) where EndDate>If(ID=Peek(ID), Peek(EndDate), 0);

This will load the desired lines (if sorted in IDs) but not make a count of NoOfRecords - so it still need some work...

anbu1984
Master III
Master III

Tab:

Load ID,StartDate,EndDate,RecNo() as RecNo where ID <> Peek(ID) Or (ID = Peek(ID) And EndDate > Peek(EndDate));

Load ID,Date#(StartDate,'D/M/YYYY') As StartDate,Date#(EndDate,'D/M/YYYY') As EndDate Inline [

ID,StartDate,EndDate

1,01/10/2007,30/06/2008

1,23/10/2007,30/06/2011

1,20/05/2008,13/06/2008

1,01/10/2008,30/06/2009

1,01/07/2009,24/07/2009

1,01/10/2009,30/06/2010

1,01/10/2010,30/06/2012

1,25/05/2011,23/06/2011

2,10/02/2008,10/02/2008

2,25/03/2009,15/06/2013

2,01/01/2010,31/12/2010

2,10/11/2010,15/06/2013

2,10/01/2013,30/05/2014

2,20/06/2013,20/06/2013

3,20/06/2013,20/06/2013 //Add an extra row

];

NoConcatenate

Final:

Load * Where ID <> 3; //Remove extra row

Load ID,StartDate,EndDate,If(IsNull(Previous(RecNo)),RecNo,Previous(RecNo) - RecNo ) Resident Tab Order by RecNo desc;

Drop table Tab;

Not applicable
Author

Thanks Martin and anbu,

This helps, but it's not exactly what I need cause there are more than one record to remove following a given valid record and the proposed 'WHERE' clause only removes the first one. I'm using your solution combined with a loop, but as my file has 1,2 million records, it takes very long to complete the proccess (>600 loops an 30 mins ).

Hope someone could give me a hint to solve it