Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi - I am trying to analyse the duration between starting and ending times. This is all around measuring the duration of a web session.
I have loaded data and its created the following table which I want to work with:
USER_SESSIONS:
LOAD
USER_ID,
SESSION_ID,
Year(Timestamp#(DATE_CREATED,'')) as [Session Year],
Date(Floor(DATE_CREATED), 'DD/MM/YYYY') as [Session Date],
Time(Time(DATE_CREATED,''),'hh:mm:ss') as [Session Time],
MonthName(Floor(DATE_CREATED)) as [Session Month],
Dual('Q' & Ceil(Month([DATE_CREATED])/3),QuarterStart([DATE_CREATED])) as [Session Quarter],
WeekName(Timestamp#(DATE_CREATED,'')) as [Session Week],
KEY_DATA,
EVENT_TYPE,
DESCRIPTION
I am looking to take the first and the last record related to each SESSION_ID to then determine the duration into a differing table e.g.
ID,Date,SessionStartTime,SessionEndTime,Duration
-2ekF7TyywRVZRgMwAz9aXk, 31/01/2017, 11:24:32,11:34:32,00:10:00
How would I go about doing this?
If the columns were in the initial file I would probably just select the MIN and MAX of the session_id but not sure how I would do it when i am deriving the columns required as part of the load?
thanks
Martin
Hi Martin,
If your table contains a field of DATE_CREATED in format [MM/DD/YYYY hh:mm:ss], why do not you want to use the timestamp? For example
USER_SESSIONS:
LOAD
USER_ID,
SESSION_ID,
Year(Timestamp#(DATE_CREATED,'')) as [Session Year],
Date(Floor(DATE_CREATED), 'DD/MM/YYYY') as [Session Date],
Time(Time(DATE_CREATED,''),'hh:mm:ss') as [Session Time],
MonthName(Floor(DATE_CREATED)) as [Session Month],
Dual('Q' & Ceil(Month([DATE_CREATED])/3),QuarterStart([DATE_CREATED])) as [Session Quarter],
WeekName(Timestamp#(DATE_CREATED,'')) as [Session Week],
Timestamp(DATE_CREATED) as [Session Timestamp],
KEY_DATA,
EVENT_TYPE,
DESCRIPTION
.....
Then the beginning of the session will be Min([Session Timestamp]), the end of the session will be Max([Session Timestamp]),
duration - Max([Session Timestamp]) - Min([Session Timestamp]).
Regards,
Andrey
Hi Andrey
I already do have the session time in the line:
Time(Time(DATE_CREATED,''),'hh:mm:ss') as [Session Time]
Do you suggest running another load across the same data source and selecting MIN and MAX records associated to each unique SESSION_ID?
My data looks like the following:
SESSION_ID | Session Time |
-2ekF7TyywRVZRgMwAz9aXk | 11:24:32 |
-2ekF7TyywRVZRgMwAz9aXk | 11:26:52 |
-2ekF7TyywRVZRgMwAz9aXk | 11:30:24 |
-2ekF7TyywRVZRgMwAz9aXk | 11:31:41 |
-2ekF7TyywRVZRgMwAz9aXk | 11:32:54 |
-2ekF7TyywRVZRgMwAz9aXk | 11:34:00 |
-2ekF7TyywRVZRgMwAz9aXk | 11:34:32 |
I still want to retain all other records in the load and not drop any.
So ideally i would have another table that looks like the following:
SESSION_ID | Session Start | Session End | Duration |
-2ekF7TyywRVZRgMwAz9aXk | 11:24:32 | 11:34:42 | 00:10:00 |
Hi Andrey
Thanks for making me think about this.
I have actually come up with this which i think works, i just have to calculate the duration between the 2 times and i think i will be good to go.
MIN_MAX:
LOAD
SESSION_ID,
SESSION_ID AS ID,
Year(Timestamp#(DATE_CREATED,'')) as [Year],
Date(Floor(DATE_CREATED), 'DD/MM/YYYY') as [Date],
Time(Time(DATE_CREATED,''),'hh:mm:ss') as [Time],
MonthName(Floor(DATE_CREATED)) as [Month],
Dual('Q' & Ceil(Month([DATE_CREATED])/3),QuarterStart([DATE_CREATED])) as [Quarter],
WeekName(Timestamp#(DATE_CREATED,'')) as [Week]
FROM [lib://DataDirectory/eir_audit_log_2017.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);
Right Join (MIN_MAX)
LOAD ID,
Max(Time(Time(Time,''))) as EndTime,
Min(Time(Time(Time,''))) as StartTime
Resident MIN_MAX
Group by ID;