Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello, I wonder if anyone can help? In the load script I need to calculate the number of absences per employee that have occurred within 1 year of a new absence. So, if an employee had an absence today, and they have had 2 others absences within 1 year of the new absence, the number of absences for the year to today would be 3. Example of the output I am looking for:
Employee Ref |
Absence Start Date |
Absences with 12 months |
1203 |
01/05/2018 |
1 |
1203 |
06/06/2018 |
2 |
1203 |
19/06/2018 |
3 |
1203 |
14/11/2018 |
4 |
1203 |
05/05/2019 |
4 |
1203 |
08/07/2019 |
3 |
1203 |
05/11/2019 |
4 |
1203 |
01/12/2019 |
4 |
1203 |
03/01/2020 |
5 |
1203 |
05/03/2020 |
5 |
1203 |
14/11/2020 |
4 |
1203 |
16/01/2021 |
3 |
Is this possible. I have tried a few approaches!
it is possible to do a loop but my preference is to perform table operations which is faster if your volume grows:
SET DateFormat='DD/MM/YYYY';
data:
load * inline [
Employee Ref,Absence Start Date
1203,01/05/2018
1203,06/06/2018
1203,19/06/2018
1203,14/11/2018
1203,05/05/2019
1203,08/07/2019
1203,05/11/2019
1203,01/12/2019
1203,03/01/2020
1203,05/03/2020
1203,14/11/2020
1203,16/01/2021
];
//the following will create full outer join to allow determination of which dates are within 1 year
NoConcatenate
tmpData:
load *
resident data;
inner join (tmpData)
load [Employee Ref],[Absence Start Date] as Date
Resident tmpData;
//this filters out dates outside of 1 year
NoConcatenate
newData:
load [Employee Ref],[Absence Start Date], Date
Resident tmpData
where Date<=[Absence Start Date] and Date >=AddYears([Absence Start Date],-1)
;
//this aggregates the dates and gets the count
aggrData:
load [Employee Ref],[Absence Start Date], count(Date) as [Absences with 12 months]
resident newData
group by [Employee Ref],[Absence Start Date];
//add it back to the original data
inner join (data)
load [Employee Ref],[Absence Start Date], [Absences with 12 months]
Resident aggrData;
drop table aggrData, newData, tmpData;
it is possible to do a loop but my preference is to perform table operations which is faster if your volume grows:
SET DateFormat='DD/MM/YYYY';
data:
load * inline [
Employee Ref,Absence Start Date
1203,01/05/2018
1203,06/06/2018
1203,19/06/2018
1203,14/11/2018
1203,05/05/2019
1203,08/07/2019
1203,05/11/2019
1203,01/12/2019
1203,03/01/2020
1203,05/03/2020
1203,14/11/2020
1203,16/01/2021
];
//the following will create full outer join to allow determination of which dates are within 1 year
NoConcatenate
tmpData:
load *
resident data;
inner join (tmpData)
load [Employee Ref],[Absence Start Date] as Date
Resident tmpData;
//this filters out dates outside of 1 year
NoConcatenate
newData:
load [Employee Ref],[Absence Start Date], Date
Resident tmpData
where Date<=[Absence Start Date] and Date >=AddYears([Absence Start Date],-1)
;
//this aggregates the dates and gets the count
aggrData:
load [Employee Ref],[Absence Start Date], count(Date) as [Absences with 12 months]
resident newData
group by [Employee Ref],[Absence Start Date];
//add it back to the original data
inner join (data)
load [Employee Ref],[Absence Start Date], [Absences with 12 months]
Resident aggrData;
drop table aggrData, newData, tmpData;
hope that helps
Great thank you. It works as really well and is so obvious when you know how!!
yw