Thinking about this more, I would be interested in getting your views on the easiest model for holding the data. I can obviously have the central fact table as listed above, but is this the best format for querying the data?
I assume I would have a date dimension linking to the data . . . but should I be splitting the data? And then what is the easiest way of querying if an end date falls within a boundary, say May-2011? I am a little lost here.
I have posted this simplistic model as I would really appreciate your guidance - there being no immediate help available in Dubai. I have captured the staff listing and DOJ(oining) and DOL(eaving) and their departments, but I am finding that with this model, all questions that are being asked, need very convoluted calculation steps.
As such, I believe that transform / load steps may be missing something.
Simple questions like department size per month, staff turnover per month, rolling average, staff stability, all need multiple calculated dimensions.
I am sure that I am just seeing this through beginner eyes and am missing the fundamentals.
In particular, have a look at the data model in the example I posted there (reposted below). There are lots of notes in the script and some on the tab explaining what's going on. I suspect you'll be able to adapt that data model to your situation fairly readily.
I don't actually know the definition of turnover, but if it's a count of hires and fires, just add fire information to my extra table the way I added hire information to it (you probably want either the flag or the fired employee, but not both). Your breakdowns by department are like my breakdowns by gender. Department size for a month is a simple count of distinct employees. Not sure what rolling average you want, or how exactly you're defining staff stability and other things you need. Still, perhaps with the right data model, you'll be able to figure these things out.