Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi
I have source data of which this Load Inline is a small sample :
SourceData :
LOAD * INLINE [
DimA, Date, DimB
18524, 01/02/2010, MAALC
18524, 13/02/2010, MAALC
18524, 25/02/2010, MAALC
18524, 26/02/2010, MAALC
18524, 02/07/2012, MABCN
18524, 02/07/2012, MABCN
18524, 27/02/2013, MABCN
18524, 28/02/2013, MABCN
18524, 01/05/2013, MAIBZ
18524, 02/05/2013, MAIBZ
18524, 02/11/2013, MAIBZ
18524, 03/11/2013, MAIBZ
18524, 04/11/2013, MABCN
18524, 05/11/2013, MABCN
18524, 22/02/2014, MABCN
18524, 23/02/2014, MABCN
12345, 01/02/2010, MAALC
12345, 13/02/2010, MAALC
12345, 25/02/2010, MAALC
12345, 26/02/2010, MAALC
12345, 02/07/2012, MAIBZ
12345, 02/07/2012, MAIBZ
12345, 27/02/2013, MAIBZ
12345, 28/02/2013, MAIBZ
12345, 01/05/2013, MAALC
12345, 02/05/2013, MAALC
12345, 02/11/2013, MAALC
12345, 03/11/2013, MAALC
12345, 04/11/2013, MAIBZ
12345, 05/11/2013, MAIBZ
12345, 22/02/2014, MAIBZ
12345, 23/02/2014, MAIBZ
];
From this I need to aggregate such that for each DimA value, it outputs a row for each time the DimB value changes that shows its StartDate and EndDate. The joker is that DimB may be repeated for a DimA for a second date range.
I am sure I am explaining it badly, so below is the output I am after.
DimA | DimB | StartDate | EndDate |
18524 | MAALC | 01/02/2010 | 26/02/2010 |
18524 | MABCN | 02/07/2012 | 28/02/2013 |
18524 | MAIBZ | 01/05/2013 | 03/11/2013 |
18524 | MABCN | 04/11/2013 | 23/02/2014 |
12345 | MAALC | 01/02/2010 | 26/02/2010 |
12345 | MAIBZ | 02/07/2012 | 28/02/2013 |
12345 | MAALC | 01/05/2013 | 02/11/2013 |
12345 | MAIBZ | 04/11/2013 | 23/02/2014 |
This has been driving me bananas all day and I have gone snow blind with sorting, grouping, peeking, min'ing, max'ing
So any suggestions would be very welcome.
Best Regards, Bill
Try this:
SourceData :
LOAD DimA, Date, DimB,
if(previous(DimB) = DimB, peek('Counter'),rangesum(1,peek('Counter'))) as Counter
INLINE [
DimA, Date, DimB
18524, 01/02/2010, MAALC
18524, 13/02/2010, MAALC
...etc
12345, 22/02/2014, MAIBZ
12345, 23/02/2014, MAIBZ
];
Result:
load DimA, DimB, Counter, min(Date) as StartDate, max(Date) as EndDate
Resident SourceData
group by DimA, DimB, Counter;
//optionally:
// Drop field Counter;
This does assume that the data is already ordered correctly. If it isn't you'll need an extra resident load to fix the order first.
Try this:
SourceData :
LOAD DimA, Date, DimB,
if(previous(DimB) = DimB, peek('Counter'),rangesum(1,peek('Counter'))) as Counter
INLINE [
DimA, Date, DimB
18524, 01/02/2010, MAALC
18524, 13/02/2010, MAALC
...etc
12345, 22/02/2014, MAIBZ
12345, 23/02/2014, MAIBZ
];
Result:
load DimA, DimB, Counter, min(Date) as StartDate, max(Date) as EndDate
Resident SourceData
group by DimA, DimB, Counter;
//optionally:
// Drop field Counter;
This does assume that the data is already ordered correctly. If it isn't you'll need an extra resident load to fix the order first.
Gysbert
Many thanks. Your script line
if(previous(DimB) = DimB, peek('Counter'),rangesum(1,peek('Counter'))) as Counter
is indeed cunning and I am pretty sure is what I needed.
I'll adapt your suggestion to my real data and advise, but I am optimistic.
Many Thanks, Bill
Gysbert
Yup, works perfectly.
Many, Many Thanks, Bill