Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

Statistical question


Hi,

I have a new app on personell data (fluctuation in particular)

and currently I display the MEDIAN of the nr. of days that employees are with us (from their joining_date to their leaving_date or till today, whatever is earlier)

<=> That is quite high. My assumption, however, is that there is a percentage - might be like 30% or so - of employees

         who stay with us only for a very short time.

=> Can I use another statistical method to show that - taking, for instance, the 50% below the MEDIAN and calculating another

      MEDIAN for them or something like it? Or is there something like the MEDIAN where 25% are below and 75% above?

Thanks a lot!

Best regards,

DataNibbler

1 Solution

Accepted Solutions
Gysbert_Wassenaar

Yes, it's called a fractile for which you can use the fractile() function


talk is cheap, supply exceeds demand

View solution in original post

7 Replies
Gysbert_Wassenaar

Yes, it's called a fractile for which you can use the fractile() function


talk is cheap, supply exceeds demand
datanibbler
Champion
Champion
Author

Hi,

I have made it to a point - but that approach won't take me any further:

=> I can use set_analysis to take the 50% of employees whose time_with_the_company is below the MEDIAN and

     then calculate another MEDIAN - so then I have a line with only 25% of personnel below.

<=> I cannot take that any further because that would then require me to nest one set_expression inside another and

        I guess that won't work.

=> So I need a different statistical figure, something like the MEDIAN with only 10% below or so - what I am trying to

      get at (my assumption) is that the "lowest" 10% or so of employees have a time_with_the_company of only

      roundabout 30 days.

Best regards,

DataNibbler

datanibbler
Champion
Champion
Author

Hi Gysbert,

many thanks!

Would you mind helping me with the syntax here - the help_file is definitely not one of the USPs of QlikView - I already hate the fact that often a whole bunch of functions is on one page, and then the explanations are in a mixture of German and English (we have the whole thing set to German of course) and not always overly helpful 😉

I guess it would be something like

>> fractile(time_in_company, 0.1) <<

Right? (I want to separate the "lowest" 10% of employees rgd. that parameter).

I can then do what I have already tried with the MEDIAN and calculate the average time_with_us among those 10% whose time_with_us is lower than that fractile, no?

datanibbler
Champion
Champion
Author

Hi Gysbert,

I just talked to my supervisor and he thinks the question should be slightly different:

=> We should not be asking "How many days are the 'lowest' 10% of emps with us?",

<=> but rather "How many % of emps are with us for a time shorter than X days?"

- which is actually just the same thing turned around.

=> I guess it would be easy to have a variable and implement a slider_object or so to manipulate the kind of fractile,

     but is it also possible to do it the other way round?

Thanks a lot!

Best regards,

DataNibbler

P.S.: Oh - I'm just thinking, I recently read a blog on the CLASS function, to categorize the dataset into buckets - couldn't I use that to identify the emps in, e.g.., the first (smallest) 30d-bucket?

I don't know whether that would make what I'm aiming at any easier ...

Gysbert_Wassenaar

Maybe this blog post gives you some ideas: www.qlikfix.com/2010/10/08/decile-analysis/

"How many % of emps are with us for a time shorter than X days?"

That sounds more like count({<TimeWithUs={'<$(vNoOfDays)'}>} distinct EmployeeID)/count(total distinct EmployeeID). You would need to have a field TimeWithUs field. But you should be able to calculate that in the script if you don't have such a field already.


talk is cheap, supply exceeds demand
datanibbler
Champion
Champion
Author

Perfect!

Thanks a lot!

Once again, the solution is actually something I should have known about, there is nothing extraordinary about it - I just didn't think of it because I thought the solution must be a more complicated one 😉

That has always been a problem of mine - I can usually think out complicated solutions, but once I start down that road, I often miss alternative easy ones 😉

Thanks!

Best regards,

DataNibbler

Not applicable

Maybe I'm just reading this the wrong way, but the way I'm looking at this, it won't give you the information you want.  The original post says each employee is assigned a tenor based on the time between hire date and either when he quit, or now, whichever is lower.  So, if the company hired 30 people last week, the tenor of those employees would be only one week.  They could be very happy and productive and be looking forward to next week's work, but in this quartile framework they would be grouped in with people who quit immediately for whatever reason.  Perhaps there is not enough employment transition that it matters here, but for a large organization, it could seriously impact the interpretation of the data.