7 Replies Latest reply: Jul 11, 2014 3:29 PM by David Taylor

# Statistical question

Hi,

I have a new app on personnel data (fluctuation in particular)

and currently I display the MEDIAN of the number of days that employees are with us (from their joining_date to their leaving_date or until today, whichever is earlier)

<=> That is quite high. My assumption, however, is that there is a percentage - might be like 30% or so - of employees who stay with us only for a very short time.

=> Can I use another statistical method to show that - taking, for instance, the 50% below the MEDIAN and calculating another MEDIAN for them or something like it? Or is there something like the MEDIAN where 25% are below and 75% above?

Thanks a lot!

Best regards,

DataNibbler

• ###### Re: Statistical question

Yes, it's called a fractile, and you can calculate it with the fractile() function.
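
To make the idea concrete outside QlikView, here is a minimal Python sketch of a fractile: the value below which a given share of the data falls. The function name `fractile`, the sample data, and the linear interpolation rule are assumptions for illustration; QlikView's own fractile() may interpolate differently in detail.

```python
def fractile(values, p):
    """Return the p-fractile (0 <= p <= 1) of values, interpolating linearly."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # Fractional position of the requested fractile in the sorted list.
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    weight = pos - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Hypothetical tenures in days for nine employees.
days = [12, 20, 45, 400, 800, 1200, 1500, 2000, 2500]
median_days = fractile(days, 0.5)      # half of the employees are below this
quarter_days = fractile(days, 0.25)    # a quarter of the employees are below this
```

With p = 0.5 this is the MEDIAN; with p = 0.25 it is the "MEDIAN where 25% are below and 75% above" asked about in the original post.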

• ###### Re: Statistical question

Hi Gysbert,

many thanks!

Would you mind helping me with the syntax here - the help_file is definitely not one of the USPs of QlikView - I already hate the fact that often a whole bunch of functions is on one page, and then the explanations are in a mixture of German and English (we have the whole thing set to German of course) and not always overly helpful ;-)

I guess it would be something like

>> fractile(time_in_company, 0.1) <<

Right? (I want to separate the "lowest" 10% of employees regarding that parameter).

I can then do what I have already tried with the MEDIAN and calculate the average time_with_us among those 10% whose time_with_us is lower than that fractile, no?
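
As a rough illustration of that two-step idea (find the 10% fractile, then average only the employees below it), here is a small Python sketch. The data and variable names are made up, and `statistics.quantiles` with `method="inclusive"` is used as a stand-in for the fractile; it is not claimed to match QlikView's result exactly.

```python
from statistics import mean, quantiles

# Hypothetical time_in_company values in days for 21 employees.
time_in_company = [3, 7, 14, 21, 30, 60, 90, 180, 365, 400, 500,
                   600, 700, 800, 900, 1000, 1200, 1500, 1800, 2000, 2500]

# The first of the nine decile cut points is the 10% fractile.
cutoff = quantiles(time_in_company, n=10, method="inclusive")[0]

# Keep only employees strictly below the cutoff (ties with the cutoff are excluded).
lowest_group = [d for d in time_in_company if d < cutoff]

# Average tenure within that lowest group.
avg_lowest = mean(lowest_group)
```

Whether employees exactly at the cutoff belong to the group is a choice; swapping `<` for `<=` includes them.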

• ###### Re: Statistical question

Hi Gysbert,

I just talked to my supervisor and he thinks the question should be slightly different:

=> We should not be asking "How many days are the 'lowest' 10% of emps with us?",

<=> but rather "How many % of emps are with us for a time shorter than X days?"

- which is actually just the same thing turned around.

=> I guess it would be easy to have a variable and implement a slider_object or so to manipulate the kind of fractile,

but is it also possible to do it the other way round?

Thanks a lot!

Best regards,

DataNibbler

P.S.: Oh - I'm just thinking, I recently read a blog on the CLASS function, to categorize the dataset into buckets - couldn't I use that to identify the emps in, e.g., the first (smallest) 30d-bucket?

I don't know whether that would make what I'm aiming at any easier ...
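
For what it's worth, the bucket idea can be sketched in a few lines of Python. The helper `bucket_label` and the sample data are hypothetical, and the label format only imitates the kind of range label a class-style function produces.

```python
from collections import Counter


def bucket_label(days, width=30):
    """Return a range label for the fixed-width bucket that contains days."""
    lower = (days // width) * width
    return f"{lower} <= x < {lower + width}"


# Hypothetical tenures in days.
tenures = [5, 12, 29, 30, 45, 400]

# Count how many employees fall into each 30-day bucket.
counts = Counter(bucket_label(d) for d in tenures)

# Share of employees in the first (smallest) bucket.
first_bucket_share = counts["0 <= x < 30"] / len(tenures)
```

This answers "how many are in the first 30 days" directly, but the bucket width is fixed up front, so it is less flexible than a threshold that can be changed with a slider.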

• ###### Re: Statistical question

Maybe this blog post gives you some ideas: www.qlikfix.com/2010/10/08/decile-analysis/

"How many % of emps are with us for a time shorter than X days?"

That sounds more like count({<TimeWithUs={'<$(vNoOfDays)'}>} distinct EmployeeID)/count(total distinct EmployeeID). You would need a TimeWithUs field, but you should be able to calculate that in the script if you don't have such a field already.
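
The logic of that expression (distinct employees below the threshold divided by all distinct employees) can be checked on a small sample with a Python sketch like the one below. The function `share_below`, the employee IDs, and the numbers are assumptions for illustration only.

```python
def share_below(days_by_employee, threshold_days):
    """Return the fraction of employees whose tenure is below threshold_days."""
    if not days_by_employee:
        raise ValueError("need at least one employee")
    # Dictionary keys are unique, so each employee is counted once.
    short = sum(1 for d in days_by_employee.values() if d < threshold_days)
    return short / len(days_by_employee)


# Hypothetical TimeWithUs values in days, keyed by EmployeeID.
time_with_us = {"E1": 12, "E2": 25, "E3": 400, "E4": 900}

# Plays the role of the vNoOfDays variable.
no_of_days = 30
share = share_below(time_with_us, no_of_days)
```

Here two of the four employees have fewer than 30 days, so the share is 0.5, that is, 50%.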

• ###### Re: Statistical question

Perfect!

Thanks a lot!

Once again, the solution is actually something I should have known about, there is nothing extraordinary about it - I just didn't think of it because I thought the solution must be a more complicated one ;-)

That has always been a problem of mine - I can usually think out complicated solutions, but once I start down that road, I often miss alternative easy ones ;-)

Thanks!

Best regards,

DataNibbler

• ###### Re: Statistical question

Maybe I'm just reading this the wrong way, but the way I'm looking at this, it won't give you the information you want. The original post says each employee is assigned a tenure based on the time between hire date and either when he quit, or now, whichever is earlier. So, if the company hired 30 people last week, the tenure of those employees would be only one week. They could be very happy and productive and be looking forward to next week's work, but in this quartile framework they would be grouped in with people who quit immediately for whatever reason. Perhaps there is not enough employee turnover here for it to matter, but for a large organization, it could seriously impact the interpretation of the data.
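
One possible way to keep the two groups apart, sketched in Python with made-up dates: calculate the tenure as described in the original post, but report employees who have actually left separately from those who are still employed. The names and the sample data are assumptions, and this is only one of several ways to handle the issue.

```python
from datetime import date


def tenure_days(joined, left, today):
    """Days from joining until leaving, or until today if still employed."""
    end = left if left is not None else today
    return (end - joined).days


# Hypothetical reference date and (joining_date, leaving_date) pairs;
# a leaving_date of None means the employee is still with the company.
today = date(2014, 7, 8)
employees = [
    (date(2014, 7, 1), None),               # hired last week, still employed
    (date(2014, 6, 1), date(2014, 6, 8)),   # quit after one week
    (date(2010, 1, 1), None),               # long-serving, still employed
    (date(2012, 1, 1), date(2013, 1, 1)),   # left after one year
]

# Tenures of people who left versus people who are still here.
leavers = [tenure_days(j, l, today) for j, l in employees if l is not None]
still_here = [tenure_days(j, l, today) for j, l in employees if l is None]
```

In this sample the new hire and the person who quit both have a tenure of 7 days, which is exactly the ambiguity described above; only the `leavers` list says anything about people leaving early.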

• ###### Re: Statistical question

Hi,

I have made it to a point - but that approach won't take me any further:

=> I can use set_analysis to take the 50% of employees whose time_with_the_company is below the MEDIAN and

then calculate another MEDIAN - so then I have a line with only 25% of personnel below.

<=> I cannot take that any further because that would then require me to nest one set_expression inside another and

I guess that won't work.

=> So I need a different statistical figure, something like the MEDIAN with only 10% below or so - what I am trying to

get at (my assumption) is that the "lowest" 10% or so of employees have a time_with_the_company of only