I would like to ask the experts here, what would the best approach be to solve this:
I have a file with about half a million records. Each task (record in the file) has two timestamps (task start, task end).
I would like to be able to tell, at any point in time, whether there was any activity (tasks in progress), how many unique tasks were in progress, total idle time (no tasks in progress), etc.
If you can point me to some reading material on how to do this, I will be very thankful!
Create one more filed where in you diduct task end time and task start time..Hope with this u can indetify the status..
Again to go specific ,u can put if condn to create tht extra columns something like:
If (taskend-taskstart>=so n so,'status1',if(follow other if condn to bring all status))
To get an idea about how to prepare for this kind of analysis, you should read the QV Desktop help article on the INTERVALMATCH script prefix. IIRC it includes an example with employee work start/stop times and manages to deduce stuff like "how many employees were at work at any time of day".
Thank you for these very helpful links!
I like the Creating Reference Dates for Intervals approach very much, however, I am concerned with the number of records I would need to generate.
I typically deal with a datasets of around 250,000 records (policies in HIC's example; radiology equipment activity logs in my case), covering 12 months. Each record contains two timestamps (Scan Start Time and Scan End Time, along with scanner identifier and many other, non-critical fields).
These timestamps need to be analyzed using 1-minute intervals.
Using HIC's approach, I would need to generate 250,000 * 365 * 1440 records in the "Policies_x_ Dates" table. That's 131,400,000,000 records! I think this is beyond my memory and CPU limitations.
Is there another, less memory and CPU intensive, way to figure out which machines were on at any point in time, total idle time, total busy time, etc. (all based on master calendar selections)?