3 Replies Latest reply: Feb 24, 2014 1:19 PM by Bill Markham RSS

    Aggregate to Start & End Dates

    Bill Markham

      Hi

       

      I have source data of which this Load Inline is a small sample :

      SourceData :
      LOAD * INLINE [
          DimA, Date, DimB
          18524, 01/02/2010, MAALC
          18524, 13/02/2010, MAALC
          18524, 25/02/2010, MAALC
          18524, 26/02/2010, MAALC
          18524, 02/07/2012, MABCN
          18524, 02/07/2012, MABCN
          18524, 27/02/2013, MABCN
          18524, 28/02/2013, MABCN
          18524, 01/05/2013, MAIBZ
          18524, 02/05/2013, MAIBZ
          18524, 02/11/2013, MAIBZ
          18524, 03/11/2013, MAIBZ
          18524, 04/11/2013, MABCN
          18524, 05/11/2013, MABCN
          18524, 22/02/2014, MABCN
          18524, 23/02/2014, MABCN
          12345, 01/02/2010, MAALC
          12345, 13/02/2010, MAALC
          12345, 25/02/2010, MAALC
          12345, 26/02/2010, MAALC
          12345, 02/07/2012, MAIBZ
          12345, 02/07/2012, MAIBZ
          12345, 27/02/2013, MAIBZ
          12345, 28/02/2013, MAIBZ
          12345, 01/05/2013, MAALC
          12345, 02/05/2013, MAALC
          12345, 02/11/2013, MAALC
          12345, 03/11/2013, MAALC
          12345, 04/11/2013, MAIBZ
          12345, 05/11/2013, MAIBZ
          12345, 22/02/2014, MAIBZ
          12345, 23/02/2014, MAIBZ
      ];

       

      From this I need to aggregate such that for each DimA value, it outputs a row for each time the DimB value changes that shows its StartDate and EndDate.  The joker is that DimB may be repeated for a DimA for a second date range.

       

      I am sure I am explaining it badly, so below is the output I am after.

       

      DimADimBStartDateEndDate
      18524MAALC01/02/201026/02/2010
      18524MABCN02/07/201228/02/2013
      18524MAIBZ01/05/201303/11/2013
      18524MABCN04/11/201323/02/2014
      12345MAALC01/02/201026/02/2010
      12345MAIBZ02/07/201228/02/2013
      12345MAALC01/05/201302/11/2013
      12345MAIBZ04/11/201323/02/2014

       

      This has been driving me bananas all day and I have gone snow blind with sorting, grouping, peeking, min'ing, max'ing

       

      So any suggestions would be very welcome.

       

       

       

      Best Regards,     Bill

        • Re: Aggregate to Start & End Dates
          Gysbert Wassenaar

          Try this:

           

          SourceData :
          LOAD DimA, Date, DimB,
          if(previous(DimB) = DimB, peek('Counter'),rangesum(1,peek('Counter'))) as Counter
          INLINE [
              DimA, Date, DimB
              18524, 01/02/2010, MAALC
              18524, 13/02/2010, MAALC
          ...etc
              12345, 22/02/2014, MAIBZ
              12345, 23/02/2014, MAIBZ
          ];
          
          Result:
          load DimA, DimB, Counter, min(Date) as StartDate, max(Date) as EndDate
          Resident SourceData
          group by DimA, DimB, Counter;
          
          //optionally:
          // Drop field Counter;
          
          

           

          This does assume that the data is already ordered correctly. If it isn't you'll need an extra resident load to fix the order first.