Hi, I'm having a lot of trouble working out how to build a Sankey chart to show recruitment flow.
Essentially I've got DEI data in groups, and I want to see how the different categories progress through our recruitment process.
So I want to see, let's say, Men, Women, Other and Prefer Not To Say as the first column, summed by number of applications. That would then flow into a second column showing how many were interviewed, how many were offered, and how many did not get through to interview.
The trouble is, my data is in the format:
GENDER, APPLICATIONS, INTERVIEWED, OFFERED
MEN, 10, 4, 2
WOMEN, 7, 4, 3
OTHER, 1, 0, 0
PREFER NOT TO SAY, 4, 4, 1
Or something like that (those figures are entirely made up). So my difficulty is working out the measure. I can't see a way to set it up so that the first column, which shows the gender split, doesn't also show the total of all statuses.
Hi Richard,
It looks to me like you can solve this problem by normalizing the last three fields into a single column (let's call it STATUS). You can do this by loading the data with a CROSSTABLE LOAD.
Once your data is normalized, you will have 3 columns: GENDER, STATUS, and COUNT. Then, you can use GENDER as the first dimension, STATUS as the second dimension, and sum(COUNT) as the measure.
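The reshape that a CROSSTABLE LOAD performs (wide status columns unpivoted into STATUS/COUNT rows) can be sketched outside Qlik to see the resulting table. This is a minimal Python sketch, assuming the four made-up sample rows from the question; the `crosstable` function is hypothetical and only mimics the unpivot, it is not Qlik's script syntax.

```python
# Made-up sample rows from the question: (GENDER, APPLICATIONS, INTERVIEWED, OFFERED).
WIDE = [
    ("MEN", 10, 4, 2),
    ("WOMEN", 7, 4, 3),
    ("OTHER", 1, 0, 0),
    ("PREFER NOT TO SAY", 4, 4, 1),
]
STATUS_FIELDS = ("APPLICATIONS", "INTERVIEWED", "OFFERED")


def crosstable(rows, fields):
    """Hypothetical helper: unpivot wide rows into (GENDER, STATUS, COUNT)
    records, one record per gender per status field."""
    long_rows = []
    for gender, *counts in rows:
        # Pair each status field name with its count for this gender.
        for status, count in zip(fields, counts):
            long_rows.append((gender, status, count))
    return long_rows


NORMALIZED = crosstable(WIDE, STATUS_FIELDS)
for row in NORMALIZED[:3]:
    print(row)
```

Each input row becomes three records, so the four sample rows yield twelve GENDER/STATUS/COUNT records.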
Cheers,
Sorry, I mistakenly marked this as the solution because I hadn't thought of using CROSSTABLE. But thinking about it, that will end up giving the same result as what I have already done, which is to load each field individually AS STATUS, concatenating each time.
The issue is that the first column you describe, GENDER, will not show the count where STATUS is 'Applications'; it will show the total of Applications, Interviews and Offers (since sum(COUNT) for Men is the total across all of them, right?).
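The double counting described above can be checked with a little arithmetic on the made-up sample figures. This is a minimal Python sketch, assuming the normalized GENDER/STATUS/COUNT rows from the question's data; `node_totals` is a hypothetical helper standing in for sum(COUNT) grouped by GENDER, optionally restricted to a single STATUS.

```python
from collections import defaultdict

# Normalized (GENDER, STATUS, COUNT) rows built from the made-up sample data.
LONG = [
    ("MEN", "APPLICATIONS", 10), ("MEN", "INTERVIEWED", 4), ("MEN", "OFFERED", 2),
    ("WOMEN", "APPLICATIONS", 7), ("WOMEN", "INTERVIEWED", 4), ("WOMEN", "OFFERED", 3),
    ("OTHER", "APPLICATIONS", 1), ("OTHER", "INTERVIEWED", 0), ("OTHER", "OFFERED", 0),
    ("PREFER NOT TO SAY", "APPLICATIONS", 4), ("PREFER NOT TO SAY", "INTERVIEWED", 4),
    ("PREFER NOT TO SAY", "OFFERED", 1),
]


def node_totals(rows, status=None):
    """Hypothetical helper: sum COUNT per GENDER, optionally keeping only one STATUS."""
    totals = defaultdict(int)
    for gender, row_status, count in rows:
        if status is None or row_status == status:
            totals[gender] += count
    return dict(totals)


all_statuses = node_totals(LONG)                       # what a Gender node sums to
applications_only = node_totals(LONG, "APPLICATIONS")  # what the poster wants
print(all_statuses["MEN"], applications_only["MEN"])   # 16 10
```

Assuming interviewed and offered candidates are already included in APPLICATIONS, the Men node comes out at 16 rather than 10, because the same candidates are counted once per stage they reached.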
That's the bit I can't quite get my head around.
The intention is to show the flow from Applications (split by Gender): how many went to Interview, how many to Offer, and how many did not.
I'm looking at using funnel charts instead, because I think I can get that to work with a trellis chart of funnels, one funnel per Gender.
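One way to avoid the double counting, which was not proposed in the thread, is to convert the cumulative stage counts into mutually exclusive outcomes before loading them, so that each gender's links add up to its applications. This is a minimal Python sketch, assuming each stage is a subset of the previous one (everyone offered was interviewed, everyone interviewed applied); the outcome labels and the `sankey_links` helper are hypothetical.

```python
# Made-up sample rows from the question: (GENDER, APPLICATIONS, INTERVIEWED, OFFERED).
WIDE = [
    ("MEN", 10, 4, 2),
    ("WOMEN", 7, 4, 3),
    ("OTHER", 1, 0, 0),
    ("PREFER NOT TO SAY", 4, 4, 1),
]


def sankey_links(rows):
    """Hypothetical helper: build (source, target, value) links that split each
    gender's applications into mutually exclusive outcomes.

    Assumes OFFERED <= INTERVIEWED <= APPLICATIONS for every row."""
    links = []
    for gender, applications, interviewed, offered in rows:
        if not (0 <= offered <= interviewed <= applications):
            raise ValueError(f"stage counts are not nested for {gender}")
        outcomes = {
            "Not interviewed": applications - interviewed,
            "Interviewed, no offer": interviewed - offered,
            "Offered": offered,
        }
        for outcome, value in outcomes.items():
            links.append((gender, outcome, value))
    return links


LINKS = sankey_links(WIDE)
for link in LINKS[:3]:
    print(link)
```

Loaded as GENDER, STATUS and COUNT, these rows keep the first column at the applications figure, because the three outcome values for each gender sum to its APPLICATIONS count (6 + 2 + 2 = 10 for Men).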
Well, the essence of a Sankey chart is to show the flow of something "from" and "to". So if Gender is your first column and Status is your second column, the Sankey will show you the distribution both ways: for each Gender you will see the distribution of counts by Status, and for each Status you will see the distribution of counts by Gender.
To show the flow of applications from status to status, divided by Gender, I'd try a stacked area chart (a form of the Line Chart) with Status as the first dimension and Gender as the second dimension. You should see the counts for each Status along the X-axis, divided by Gender into the stacked segments of the area chart. I think that would do a better job than several funnel charts in a trellis, unless you want a separate chart for each Gender anyway.
Cheers,