5 Replies Latest reply: May 15, 2014 3:52 AM by Juan Antonio Amunategui RSS

    Record filtering problem

    Juan Antonio Amunategui

      Hello,

       

      I have a table with records ordered by ID and StartDate. Each record has also an EndDate. I need to delete records whose EndDate is prior to the EndDate of any previous record with the same ID. But I have to count these deleted records and save the result in the first one of the group. An example would be easier to understand, I hope :

       

      This is the starting table (marked the records that should be deleted):

       

      IDStartDateEndDate
      101/10/200730/06/2008
      123/10/200730/06/2011
      120/05/200813/06/2008
      101/10/200830/06/2009
      101/07/200924/07/2009
      101/10/200930/06/2010
      101/10/201030/06/2012
      125/05/201123/06/2011
      210/02/200810/02/2008
      225/03/200915/06/2013
      201/01/201031/12/2010
      210/11/201015/06/2013
      210/01/201330/05/2014
      220/06/201320/06/2013

       

      And this should be the output:


      IDStartDateEndDateNofRecords
      101/10/200730/06/20081
      123/10/200730/06/20115
      101/10/201030/06/20122
      210/02/200810/02/20081
      225/03/200915/06/20133
      210/01/201330/05/20142


      I've solved it with an iteration approach, but it's slow (more than 600 iterations to clean up a bunch of 1.2 million records)

       

      Any ideas?

       

      Thank you.

        • Re: Record filtering problem

          Hi Juan,

           

          Your example isn't clear to me. With the logic said, there will be only Distinct ID's retained in final table right as its End Date will greater than all other dates with the same ID. Please provide some more info on this.

           

          Thanks,

          Prabhu

          • Re: Record filtering problem
            Juan Antonio Amunategui

            Thanks for your interes Prabhu, and sorry if my example is not clear (it's hard for me to explain this in English).

             

            As you can see in the first table, records are ordered by ID and StartDate but there are records with StartDate greater than the previous StartDate but with EndDate prior to the previous record's EndDate. These should be added to counter (NofRecords) and delete from the output table.

             

            The records are periods of service and I need to summarize services given without a time gap between them in a single record.

             

            I can't explain it better... Hope you understand now.

            • Re: Record filtering problem
              Martin Pedersen

              Something like this perhaps:

               

              Load ID, StartDate, EndDate from Dates.qvd(qvd) where EndDate>If(ID=Peek(ID), Peek(EndDate), 0);

               

              This will load the desired lines (if sorted in IDs) but not make a count of NoOfRecords - so it still need some work...

                • Re: Record filtering problem
                  Juan Antonio Amunategui

                  Thanks Martin and anbu,

                   

                  This helps, but it's not exactly what I need cause there are more than one record to remove following a given valid record and the proposed 'WHERE' clause only removes the first one. I'm using your solution combined with a loop, but as my file has 1,2 million records, it takes very long to complete the proccess (>600 loops an 30 mins ).

                   

                  Hope someone could give me a hint to solve it

                • Re: Record filtering problem
                  anbu cheliyan

                  Tab:

                  Load ID,StartDate,EndDate,RecNo() as RecNo where ID <> Peek(ID) Or (ID = Peek(ID) And EndDate > Peek(EndDate));

                  Load ID,Date#(StartDate,'D/M/YYYY') As StartDate,Date#(EndDate,'D/M/YYYY') As EndDate Inline [

                  ID,StartDate,EndDate

                  1,01/10/2007,30/06/2008

                  1,23/10/2007,30/06/2011

                  1,20/05/2008,13/06/2008

                  1,01/10/2008,30/06/2009

                  1,01/07/2009,24/07/2009

                  1,01/10/2009,30/06/2010

                  1,01/10/2010,30/06/2012

                  1,25/05/2011,23/06/2011

                  2,10/02/2008,10/02/2008

                  2,25/03/2009,15/06/2013

                  2,01/01/2010,31/12/2010

                  2,10/11/2010,15/06/2013

                  2,10/01/2013,30/05/2014

                  2,20/06/2013,20/06/2013

                  3,20/06/2013,20/06/2013 //Add an extra row

                  ];

                   

                   

                  NoConcatenate

                  Final:

                  Load * Where ID <> 3; //Remove extra row

                  Load ID,StartDate,EndDate,If(IsNull(Previous(RecNo)),RecNo,Previous(RecNo) - RecNo ) Resident Tab Order by RecNo desc;

                   

                   

                  Drop table Tab;