Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I am trying to use set analysis and range sum to compute the total number of records that were active at a some point in the month
For example, if these are the records:
id state created_date modified_date
1 expired 01/12/2014 10/12/2014
2 expired 01/12/2014 10/03/2015
3 active 01/12/2014 01/12/2014
4 expired 10/01/2015 12/01/2015
5 expired 10/01/2015 11/03/2015
6 active 10/02/2015 10/02/2015
7 expired 10/03/2015 11/03/2015
The expected o/p is
Dec-14 3
Jan-15 4
Feb-15 4
Mar-15 5
E.g: For Jan-2015 the result should be 4 -
1 active record from Dec 2014 + 2 created in Jan + 1 from Dec which actually expired in March 2015
The modified_date is updated when a record is expired
I tried this:
rangesum(above(Count({$<[state] = {'active'} >} id), 0,12))
+ Count({1<[state] ={'expired'}, modified_date.Calendar.Month ={">=$(created_date.Calendar.Month)"}>} id)
I couldn't figure out how to do the second count statement. How can I aggregate the records which are currently not active but were active at some point during the month.Is it possible to achieve this without using separate calendar varables ?
Thanks in advance!
You can do it like HIC demonstrated in
Creating Reference Dates for Intervals
Then you just need to add a master calendar:
INPUT:
LOAD *,
if(state = 'active', today(), modified_date) as end_date
INLINE [
id, state, created_date, modified_date
1, expired, 01/12/2014, 10/12/2014
2, expired, 01/12/2014, 10/03/2015
3, active, 01/12/2014 ,01/12/2014
4, expired, 10/01/2015, 12/01/2015
5, expired, 10/01/2015, 11/03/2015
6, active, 10/02/2015, 10/02/2015
7, expired, 10/03/2015, 11/03/2015
];
id_x_Dates:
Load id,
Date( created_date + IterNo() -1 ) as ReferenceDate
Resident INPUT
While IterNo() <= end_date - created_date + 1 ;
Load Date,Date as ReferenceDate, Month(Date) as Month, Monthname(Date) as YearMonth ;
Load Date(MinDate+iterno()) as Date While iterno() <= MaxDate - MinDate ;
Load Min(ReferenceDate)-1 as MinDate, Max(ReferenceDate) as MaxDate resident id_x_Dates;
Then create a straight table with dimension YearMonth and expression
=Count(Distinct id)
YearMonth | Count(Distinct id) |
---|---|
7 | |
Dez 2014 | 3 |
Jan 2015 | 4 |
Feb 2015 | 4 |
Mrz 2015 | 5 |
Apr 2015 | 2 |
Mai 2015 | 2 |
Jun 2015 | 2 |
Jul 2015 | 2 |
Aug 2015 | 2 |
Sep 2015 | 2 |
Okt 2015 | 2 |
Nov 2015 | 2 |
You can do it like HIC demonstrated in
Creating Reference Dates for Intervals
Then you just need to add a master calendar:
INPUT:
LOAD *,
if(state = 'active', today(), modified_date) as end_date
INLINE [
id, state, created_date, modified_date
1, expired, 01/12/2014, 10/12/2014
2, expired, 01/12/2014, 10/03/2015
3, active, 01/12/2014 ,01/12/2014
4, expired, 10/01/2015, 12/01/2015
5, expired, 10/01/2015, 11/03/2015
6, active, 10/02/2015, 10/02/2015
7, expired, 10/03/2015, 11/03/2015
];
id_x_Dates:
Load id,
Date( created_date + IterNo() -1 ) as ReferenceDate
Resident INPUT
While IterNo() <= end_date - created_date + 1 ;
Load Date,Date as ReferenceDate, Month(Date) as Month, Monthname(Date) as YearMonth ;
Load Date(MinDate+iterno()) as Date While iterno() <= MaxDate - MinDate ;
Load Min(ReferenceDate)-1 as MinDate, Max(ReferenceDate) as MaxDate resident id_x_Dates;
Then create a straight table with dimension YearMonth and expression
=Count(Distinct id)
YearMonth | Count(Distinct id) |
---|---|
7 | |
Dez 2014 | 3 |
Jan 2015 | 4 |
Feb 2015 | 4 |
Mrz 2015 | 5 |
Apr 2015 | 2 |
Mai 2015 | 2 |
Jun 2015 | 2 |
Jul 2015 | 2 |
Aug 2015 | 2 |
Sep 2015 | 2 |
Okt 2015 | 2 |
Nov 2015 | 2 |
If you don't need the reference dates on a daily basis (e.g. you don't want to drill down to week or dates), you can even simplify the script to
INPUT:
LOAD *,
if(state = 'active', Date(today()), modified_date) as end_date
INLINE [
id, state, created_date, modified_date
1, expired, 01/12/2014, 10/12/2014
2, expired, 01/12/2014, 10/03/2015
3, active, 01/12/2014 ,01/12/2014
4, expired, 10/01/2015, 12/01/2015
5, expired, 10/01/2015, 11/03/2015
6, active, 10/02/2015, 10/02/2015
7, expired, 10/03/2015, 11/03/2015
];
id_x_Dates:
Load id,
MonthName(AddMonths( created_date, IterNo() -1 ) )as YearMonth
Resident INPUT
While MonthStart(created_date, IterNo() -1 ) <= Monthstart(end_date);
Thanks a lot. This is perfect!
Do you reckon there would be a performance hit when the number of records, average life span and date range grows?
Also wondering if this approach would work for incremental load (insert & updates).