Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I am trying to import working hours data from csv file as shown below :
column a - employee number
column b - date
column c - time
column d - 100 stands for IN and 200 stands for OUT
IN order to calculate working hours per employee, i thought of creating two tables as shown below:
Table containing only IN's:
and second table containing the OUT's:
Note that beside morning and evening shifts (that starts and end within the same day) , there are unique types of working shifts as follows:
Double Shift - as shown below (in yellow) first shift started at between 07:00 - 10:00 and second shift between 14:00 - 18:00 (start and end may change)
Night Shift - usually starts at between 22:00-23:00 and ends at 06:00-07:00 the next day.
Any ideas on how to calculate working hours?
Many thanks
Sharbel
here is one solution. i separated the source tables and the transformed table so you can see them at the same time but you can optimize this for sure. you need to transform your times into the acceptable format - that should be easy enough. you then inner join them by employee
then filter out the records where time out < time in
then group by emp and timein and get the min of the timeout.
empIN:
load *, date(date(date_in) + Timestamp(time_in, 'M/D/YYYY hh:mm:ss[.fff]'),'M/D/YYYY hh:mm:ss[.fff]') as date_time_in inline [
emp_in, date_in, time_in
1995, 1/5/2023, 19:08:02
1995, 1/4/2023, 09:08:02
];
empOUT:
load *, date(date(date_out) + Timestamp(time_out, 'M/D/YYYY hh:mm:ss[.fff]'),'M/D/YYYY hh:mm:ss[.fff]') as date_time_out inline [
emp_out, date_out, time_out
1995, 1/6/2023, 07:08:02
1995, 1/4/2023, 23:08:02
];
NoConcatenate
tmp:
load
emp_in as emp, date_time_in as datetimein
Resident
empIN;
inner join (tmp)
load
emp_out as emp, date_time_out as datetimeout
resident
empOUT;
NoConcatenate
EMP:
load
emp, datetimein,
date(min(datetimeout),'M/D/YYYY hh:mm:ss[.fff]') as datetimeout
Resident
tmp
where datetimeout>=datetimein
group by emp, datetimein
;
drop table tmp;
here is how it looks:
result is in hours, you can always floor it after deciding if you will show your duration in hrs, mins, etc
i should have added that the actual in and out should be date time format so the load does that first. this way it address concerns where in and out are not on the same DAY
here is one solution. i separated the source tables and the transformed table so you can see them at the same time but you can optimize this for sure. you need to transform your times into the acceptable format - that should be easy enough. you then inner join them by employee
then filter out the records where time out < time in
then group by emp and timein and get the min of the timeout.
empIN:
load *, date(date(date_in) + Timestamp(time_in, 'M/D/YYYY hh:mm:ss[.fff]'),'M/D/YYYY hh:mm:ss[.fff]') as date_time_in inline [
emp_in, date_in, time_in
1995, 1/5/2023, 19:08:02
1995, 1/4/2023, 09:08:02
];
empOUT:
load *, date(date(date_out) + Timestamp(time_out, 'M/D/YYYY hh:mm:ss[.fff]'),'M/D/YYYY hh:mm:ss[.fff]') as date_time_out inline [
emp_out, date_out, time_out
1995, 1/6/2023, 07:08:02
1995, 1/4/2023, 23:08:02
];
NoConcatenate
tmp:
load
emp_in as emp, date_time_in as datetimein
Resident
empIN;
inner join (tmp)
load
emp_out as emp, date_time_out as datetimeout
resident
empOUT;
NoConcatenate
EMP:
load
emp, datetimein,
date(min(datetimeout),'M/D/YYYY hh:mm:ss[.fff]') as datetimeout
Resident
tmp
where datetimeout>=datetimein
group by emp, datetimein
;
drop table tmp;
here is how it looks:
result is in hours, you can always floor it after deciding if you will show your duration in hrs, mins, etc
i should have added that the actual in and out should be date time format so the load does that first. this way it address concerns where in and out are not on the same DAY
Works Perfect !
Thanks a lot