Feb 27, 2022

Are your job applicants representative of the labour market? A statistical way to check

Use binomial testing to figure out whether your job applicant pool is diverse enough.

Here are two surprising statistics:

According to a global software developer survey in 2021 (n = 82,286), most developers are male, accounting for 91.7% of all respondents.
In India, only about 19% of women participate in the labour force. This is significantly lower than the world average of 46%.

Now imagine that you’re trying to recruit a software developer for an Indian non-profit, and you receive over a hundred applications. Most are from men, but a significant number are from women.

You’ll need to answer one key question because you know that your executive director will ask: Did enough women apply for the position? Or put another way, is the pool of candidates who applied for the position representative of the labour force in this particular market?

Given you’re aware of the two stats above, you know that this will be a tricky question to answer. So, you get googling: what is the gender split within software development in India? You find this source from HackerRank, which tells you that, because of India’s more equitable education and tech industry culture, about 23% of Indian software developers are female (as opposed to 14.8% in the US and 10.3% in the UK, for example).

Ok, case closed then, right? All you need to do is ensure that at least 23% of applications are from female candidates. Just do an advertising blitz in female software development online communities, push the job ad to some of your female friends who are in software development, and Bob’s your uncle.

So, the day comes. Applications have now closed.

You log in to check how many you received.

106 applications. Not bad.

Quick counting reveals that 15 applications are from female applicants.

which is approximately…

14% of all applications.

You calculate it again because maybe you made a mistake.

14.15%.

Hmmmmm, that’s less than the 23% you were aiming for. But isn’t it possible that 14% of applications are from women purely by random chance? Like, if you have a jar with 80 red marbles and 20 blue marbles, and you grab a random bunch of marbles, the ratio of marbles you grab isn’t always going to be split exactly 80:20 red to blue. Sometimes it’ll be 85:15, or 75:25. You could even grab all 20 blue marbles by pure chance. It’s unlikely, but it’s possible.
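To put rough numbers on the marble analogy, here is a minimal Python sketch. It assumes a handful of 10 marbles, a size chosen purely for illustration (the analogy doesn't specify one), and computes the exact chance of each possible number of blue marbles when drawing without replacement.

```python
from math import comb

RED, BLUE = 80, 20   # jar contents from the analogy
HANDFUL = 10         # assumed handful size, chosen only for illustration


def prob_blue(b: int) -> float:
    """Chance of exactly b blue marbles in a random handful (no replacement)."""
    return comb(BLUE, b) * comb(RED, HANDFUL - b) / comb(RED + BLUE, HANDFUL)


# Print the chance of grabbing 0, 1, 2, ... 5 blue marbles
for b in range(0, 6):
    print(f"{b} blue out of {HANDFUL}: {prob_blue(b):.3f}")
```

Under these assumptions, the exact 80:20 split (2 blue out of 10) is the single most likely outcome, yet it turns up in fewer than half of all handfuls.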

Can we apply the same logic to the gender split in your pool of applications? If exactly 23% of applications were from female candidates, that would’ve been about 24 applications (23% of 106 is 24.38). But surely there’s an acceptable range? Surely, it’s possible that your process and advertising blitz were sound, but you got fewer than 24 applications by pure random chance, right?

Binomial testing

Fortunately, there is a statistically rigorous way to check this. It’s called a two-tailed binomial hypothesis test, which you can easily plug into ChatGPT or Gemini to learn more about. For this task, we will skip all the technical details.

You will need to know:

The total number of applicants (n)
The number of applicants who belong to the category you’re interested in — in this case, female applicants (k)
The expected proportion of candidates from the category of interest (r)

You can find out the first two parameters (n and k) from your pool of applications; you’ll need to look up r (in our example, it’s 23%, according to the HackerRank source above).

Now, you can use a Google Sheets formula to handle the calculation. What we’re basically doing is calculating the probability that, by random chance, we get a result at least as far from what we’d expect as k is, given the expected proportion r. This probability is known as the p-value.

🔧

Quick sidenote to explain what p-values are. Imagine you're flipping a coin. Normally, you expect heads and tails to come up about equally often. In other words, your starting assumption is that neither result is more favoured. Now, you flip the coin 10 times and get all heads. That's surprising, right? How likely is it that this random chance event (all heads) happened if your starting assumption (heads and tails are equally likely) is true? A p-value is precisely that: the chance of getting a result as extreme as yours, or even more extreme, if the starting assumption is actually true. In this case, the p-value would be very small (almost impossible) because getting all heads with a fair coin is unlikely.
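The coin example can be worked out directly. This is a small Python sketch, assuming a fair coin and 10 flips as in the sidenote; the variable names are just for illustration.

```python
from math import comb

n_flips = 10
heads = 10

# Chance of exactly 10 heads in 10 flips of a fair coin: (1/2)^10
p_all_heads = comb(n_flips, heads) * 0.5 ** n_flips

# Two-tailed: all tails is just as extreme as all heads, so count both
p_two_tailed = 2 * p_all_heads

print(p_all_heads)   # 0.0009765625
print(p_two_tailed)  # 0.001953125
```

So the two-tailed p-value is about 0.2%, far below the usual 5% threshold.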

The p-value tells us how likely it is that, if we picked 106 Indian software developers at random, the number of women among them would be at least as far from the expected 24 as our 15 (i.e. 14%) is. As a rule of thumb, a p-value less than 5% is considered low. If the p-value is less than 5%, it’s unlikely that this scenario happened by random chance alone. In other words, it’s more plausible that something was awry, so we should re-advertise the position.

So, let’s crunch the numbers. How likely is a result as extreme as 15 female applicants out of a total of 106, if 23% of potential applicants out there in the world are female and assuming our application process is unbiased?

To answer this, you can plug the parameters above into the following Google Sheets formula:

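=MIN(1, 2 * MIN(BINOM.DIST(k, n, r, TRUE), 1 - BINOM.DIST(k, n, r, TRUE) + BINOM.DIST(k, n, r, FALSE)))

The first term is the chance of k or fewer, and the second is the chance of k or more (adding BINOM.DIST(k, n, r, FALSE) puts the chance of exactly k back into the upper tail). We double the smaller of the two and cap the result at 100%.

If we use this formula to work through our example, we get:

=MIN(1, 2 * MIN(BINOM.DIST(15, 106, 0.23, TRUE), 1 - BINOM.DIST(15, 106, 0.23, TRUE) + BINOM.DIST(15, 106, 0.23, FALSE)))
=0.03292571445 ≈ 3.29%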
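If you'd rather check the spreadsheet result in code, here is a minimal Python sketch of the same idea using only the standard library. The function names binom_cdf and two_tailed_p are made up for this illustration. It doubles the smaller tail, which is one common way to define a two-tailed p-value; statistics packages may use a slightly different definition and return a slightly different number.

```python
from math import comb


def binom_cdf(k: int, n: int, r: float) -> float:
    """P(X <= k) for X ~ Binomial(n, r), like BINOM.DIST(k, n, r, TRUE)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(min(k, n) + 1))


def two_tailed_p(k: int, n: int, r: float) -> float:
    """Two-tailed p-value: double the smaller tail, capped at 1."""
    lower = binom_cdf(k, n, r)          # chance of k or fewer
    upper = 1 - binom_cdf(k - 1, n, r)  # chance of k or more
    return min(1.0, 2 * min(lower, upper))


p_value = two_tailed_p(k=15, n=106, r=0.23)
print(f"p-value: {p_value:.2%}")
```

With the example's numbers (k = 15, n = 106, r = 0.23), this should agree with the 3.29% from the spreadsheet.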

Here, the p-value is 3.29%, meaning that if our process were unbiased, there would be only a 3.29% chance of seeing a result at least as extreme as 15 women out of 106 applicants. This is low. It suggests that our application process was biased against women in some way.

Continuing with the same example: how many female candidates would’ve been enough for us to reach the opposite conclusion, i.e. that the gender proportions in the pool of applicants were not out of the ordinary? You can plug different values of k into the formula above until the p-value rises above 5%. In this case, that happens once k reaches 16. Because 16 is below the expected 24, only the lower tail matters, so the formula simplifies to:

=BINOM.DIST(16, 106, 0.23, TRUE) * 2

=0.06044113672 ≈ 6.04%

So, if 16 out of 106 applications had been from female applicants, that would’ve been fine.
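Instead of trying values of k by hand, you could scan all of them. The Python sketch below does this under the same assumptions as the example (n = 106, r = 0.23, a 5% threshold, and a two-tailed p-value computed by doubling the smaller tail). The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, r: float) -> float:
    """P(X <= k) for X ~ Binomial(n, r)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(min(k, n) + 1))


def two_tailed_p(k: int, n: int, r: float) -> float:
    """Two-tailed p-value: double the smaller tail, capped at 1."""
    lower = binom_cdf(k, n, r)          # chance of k or fewer
    upper = 1 - binom_cdf(k - 1, n, r)  # chance of k or more
    return min(1.0, 2 * min(lower, upper))


n, r, threshold = 106, 0.23, 0.05

# Every count of female applicants whose p-value clears the threshold
acceptable = [k for k in range(n + 1) if two_tailed_p(k, n, r) > threshold]

print("smallest acceptable k:", min(acceptable))
print("largest acceptable k:", max(acceptable))
```

The smallest acceptable value should come out as 16, matching the calculation above. Because the test is two-tailed, there is also an upper limit: a pool with far more women than the expected 23% would be flagged as unusual too.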

When you re-advertise the position to get more female applicants, you should re-do the analysis using the new parameters. Say you readvertised and received 10 more applications, of which 5 are from female applicants. Given this, your parameters become:

n = 116
k = 20
r = 0.23

Given these parameters, you’re now fine and can reasonably assume that enough female applicants applied for the position. Try running the analysis with these new parameters and explaining the new conclusion by referring to the new p-value, as if your Executive Director were asking you.
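If you want to check your answer in code, here is one possible Python sketch. It simply adds the new applications to the original counts and reruns the same doubled-tail calculation. The helper names are hypothetical, and the 5% threshold is the rule of thumb used throughout this post.

```python
from math import comb


def binom_cdf(k: int, n: int, r: float) -> float:
    """P(X <= k) for X ~ Binomial(n, r)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(min(k, n) + 1))


def two_tailed_p(k: int, n: int, r: float) -> float:
    """Two-tailed p-value: double the smaller tail, capped at 1."""
    lower = binom_cdf(k, n, r)          # chance of k or fewer
    upper = 1 - binom_cdf(k - 1, n, r)  # chance of k or more
    return min(1.0, 2 * min(lower, upper))


# Original pool plus the applications received after re-advertising
n = 106 + 10   # total applicants
k = 15 + 5     # female applicants
r = 0.23       # expected proportion

p = two_tailed_p(k, n, r)
verdict = "not unusual" if p > 0.05 else "unusually far from expected"
print(f"n={n}, k={k}, p-value={p:.2%} -> {verdict}")
```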

Finally, remember that statistics are only one piece of the puzzle. You should always consider other aspects, such as the content of job advertisements and the inclusiveness of your workplace culture and policies (e.g. parental leave), to attract and retain a diverse workforce.