Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I'm trying to relate two tables where one include months the other periods in a way that result shows number of days from a given period that fall into a given month. Intervalmatch on its own doesn't work here.
Input1
---------
Emp, Month
John, 1
David, 2
Input2
---------
Emp, StartDate, EndDate
John, 2011-01-08, 2011-01-18
David, 2011-01-28, 2011-02-07
Result
---------
Emp, Month, EndDate - StartDate
John, 1, 10
David, 1, 3
David, 1, 7
Regards,
Przemek
Hi Przemek
I was not sure if you were looking for day counts including or excluding the end date, but here is another way (for both)
EmployeeDates:
LOAD * Inline
[
Emp, StartDate, EndDate
John, 2011/01/08, 2011/01/18
David, 2011/01/28, 2011/02/07
];
//------------------------------------------------------------------------------------------
// Build calendar
//------------------------------------------------------------------------------------------
tmpRange:
LOAD Min(StartDate) AS MinStart,
Max(EndDate) AS MaxEnd
Resident EmployeeDates;
Let vMinDate = Peek('MinStart');
Let vMaxDate = Peek('MaxEnd');
Drop Table tmpRange;
Calendar:
LOAD
CalDate,
Month(CalDate) AS CalMonth,
Year(CalDate) AS CalYear;
LOAD
Date($(vMinDate) + IterNo() - 1) AS CalDate
AutoGenerate 1
While $(vMinDate) + IterNo() - 1 <= $(vMaxDate)
;
//------------------------------------------------------------------------------------------
// Join date range by employee to the calendar (one calendar per employee)
//------------------------------------------------------------------------------------------
Join (Calendar)
LOAD Distinct
Emp,
StartDate AS Cal.StartDate,
EndDate AS Cal.EndDate
Resident EmployeeDates;
//------------------------------------------------------------------------------------------
// EITHER --- Get the day count (exclusive of end date)
//------------------------------------------------------------------------------------------
Left Join (EmployeeDates)
LOAD Emp,
CalMonth AS EmpMonth,
Count(CalDate) - 1 AS DayCountEx
Resident Calendar
Where CalDate >= Cal.StartDate And CalDate <= Cal.EndDate And CalDate < MonthEnd(CalDate)
Group By Emp, CalMonth
;
//------------------------------------------------------------------------------------------
// OR --- Get the day count (inclusive of end date)
//------------------------------------------------------------------------------------------
Left Join (EmployeeDates)
LOAD Emp,
CalMonth AS EmpMonth,
Count(CalDate) AS DayCountInc
Resident Calendar
Where CalDate >= Cal.StartDate And CalDate <= Cal.EndDate And CalDate <= MonthEnd(CalDate)
Group By Emp, CalMonth
;
Hope that helps
Jonathan
Hi przemek,
I think you could use interval match if you were using a master calendar.
If you don't want to do that, I think this could calculate the days per month and Emp:
SET DateFormat='YYYY-MM-DD';
INPUT:
LOAD * INLINE [
Emp, StartDate, EndDate
John, 2011-01-08, 2011-01-18
David, 2011-01-28, 2011-02-07
Stefan, 2011-05-03, 2011-09-11
Thomas, 2011-01-01, 2011-12-31
];
OUTPUT:
LOAD Emp,
//StartDate, EndDate,
// IterNo(),
if(IterNo()=1,
if(month(EndDate)=month(StartDate), EndDate - StartDate, MonthEnd(StartDate) - StartDate-1),
if(IterNo() = (month(EndDate)-month(StartDate)+1), Day(EndDate),
Day(MonthEnd(AddMonths(StartDate,IterNo()-1)))))
as Days,
Month(StartDate) +IterNo()-1 as Month
Resident INPUT while IterNo() <= (month(EndDate)-month(StartDate)+1);
Then you maybe want to restrict the data to INPUT1, but I don't understood how month affected your result table here: In Input1, Month for David is 2 (February?), but then in your result table, David has two entries, both showing Month = 1 where I assumed you only want the second record with Month = 2.
Regards,
Stefan
Hi Przemek
I was not sure if you were looking for day counts including or excluding the end date, but here is another way (for both)
EmployeeDates:
LOAD * Inline
[
Emp, StartDate, EndDate
John, 2011/01/08, 2011/01/18
David, 2011/01/28, 2011/02/07
];
//------------------------------------------------------------------------------------------
// Build calendar
//------------------------------------------------------------------------------------------
tmpRange:
LOAD Min(StartDate) AS MinStart,
Max(EndDate) AS MaxEnd
Resident EmployeeDates;
Let vMinDate = Peek('MinStart');
Let vMaxDate = Peek('MaxEnd');
Drop Table tmpRange;
Calendar:
LOAD
CalDate,
Month(CalDate) AS CalMonth,
Year(CalDate) AS CalYear;
LOAD
Date($(vMinDate) + IterNo() - 1) AS CalDate
AutoGenerate 1
While $(vMinDate) + IterNo() - 1 <= $(vMaxDate)
;
//------------------------------------------------------------------------------------------
// Join date range by employee to the calendar (one calendar per employee)
//------------------------------------------------------------------------------------------
Join (Calendar)
LOAD Distinct
Emp,
StartDate AS Cal.StartDate,
EndDate AS Cal.EndDate
Resident EmployeeDates;
//------------------------------------------------------------------------------------------
// EITHER --- Get the day count (exclusive of end date)
//------------------------------------------------------------------------------------------
Left Join (EmployeeDates)
LOAD Emp,
CalMonth AS EmpMonth,
Count(CalDate) - 1 AS DayCountEx
Resident Calendar
Where CalDate >= Cal.StartDate And CalDate <= Cal.EndDate And CalDate < MonthEnd(CalDate)
Group By Emp, CalMonth
;
//------------------------------------------------------------------------------------------
// OR --- Get the day count (inclusive of end date)
//------------------------------------------------------------------------------------------
Left Join (EmployeeDates)
LOAD Emp,
CalMonth AS EmpMonth,
Count(CalDate) AS DayCountInc
Resident Calendar
Where CalDate >= Cal.StartDate And CalDate <= Cal.EndDate And CalDate <= MonthEnd(CalDate)
Group By Emp, CalMonth
;
Hope that helps
Jonathan
Hi Przemek
Combining Stefan's and my approach, I also came up with a simpler script:
StartEnd:
LOAD * Inline
[
Emp, StartDate, EndDate
John, 2011/01/08, 2011/02/01
David, 2011/01/28, 2011/02/07
];
Output:
LOAD Emp,
Count(DISTINCT CalDate) AS DayCountInc,
Count(DISTINCT CalDate) - 1 AS DayCountEx,
Month(CalDate) AS EmpMonth,
Year(CalDate) AS EmpYear
Group By Emp, Month(CalDate), Year(CalDate)
;
LOAD
Emp,
Date(StartDate + IterNo() - 1) AS CalDate
Resident StartEnd
While StartDate + IterNo() - 1 <= EndDate
;
Jonathan
Count(DISTINCT CalDate) AS DayCountInc,
Hi Jonathan,
nice scripting ![]()
I noticed a (possible) problem with your DayCountEx in both scripts: You are removing one day from the count for every month, so if the Emps time period span more than 2 months, the count will be incorrect (and for 2 month they are only correct if you want to exclude start and end date, which don't match the sample result. I assumed from the example results, that one want only the start date to be excluded from the count.)
Regards,
Stefan
Thanks a lot both of you, guys. Very helpful answers those were. I'm also now thinking of creating a little bit easier model related to the above one i.e.
Input
-------
Emp, StartDate, EndDate
John, 2011-01-31, 2011-02-01
David, 2011-02-08, 2011-02-08
Result
---------
Emp, CalendarDate
John, 2011-01-31
John, 2011-02-01
David, 2011-02-08
How would you implement the CalendarDate to represent each day from the period?
Regards,
Przemek
So you want a record for each day of the period (I don't see this in your Result table, but that's how I read your last sentence)?
Just reuse Jonathans last Load:
LOAD
Emp,
Date(StartDate + IterNo() - 1) AS CalDate
Resident Input
While StartDate + IterNo() - 1 <= EndDate
;
(Or think about a master calendar and interval match).
Bye,
Stefan
You did not specify that employees could have multiple entries in your original post
, bur here goes. The main thing is to identify which employee record is being processed, so I add a RowKey field as the first step. Pretty much unchanged after that. The script is now:
StartEnd:
LOAD * Inline
[
Emp, StartDate, EndDate
John, 2011/01/08, 2011/01/12
John, 2011/02/04, 2011/02/09
John, 2011/02/11, 2011/02/18
David, 2011/01/28, 2011/02/07
];
Join (StartEnd)
LOAD *, RowNo() AS RowKey Resident StartEnd;
Output:
LOAD Emp,
CalDate,
RowKey,
Month(CalDate) AS CalMonth,
Year(CalDate) AS CalYear
;
LOAD
Emp,
RowKey,
Date(StartDate + IterNo() - 1) AS CalDate
Resident StartEnd
While StartDate + IterNo() - 1 <= EndDate
;
Jonathan
Stefan
swuehl wrote:
I noticed a (possible) problem with your DayCountEx in both scripts: You are removing one day from the count for every month, so if the Emps time period span more than 2 months, the count will be incorrect (and for 2 month they are only correct if you want to exclude start and end date, which don't match the sample result. I assumed from the example results, that one want only the start date to be excluded from the count.)
You are quite right - I did not test this adequately. I added a check for the month of the CalDate and the month of the EndDate, and only when they are equal then subtract 1 from the count - that fixed the problem.
Jonathan
I've added StartDate and End Date to the final load and it does the same. Anyway, many thanks for the reply.