Douglas Adams once famously wrote that the “answer to the ultimate question of life, the universe and everything is 42.”
I beg to differ. For many scientists, the answer is any number less than or equal to 0.05.
Why, you ask? The answer lies in p-values. If you’ve never worked heavily with statistics, you might think this is the lead-up to a bathroom joke (“No, no, I mean, ‘P’ like the letter, not like THAT…”). What is a p-value, then? I offer you three definitions:
My Flippant Definition: The p-value is an arbitrary number that determines if a scientist gets paid.
Internet Definition: “The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of ‘extreme’ depends on how the hypothesis is being tested” (source).
This is similar to the definition I wrote in my lecture notes when the topic was first introduced to me, but it means very little unless you apply it. (I find this to be true for most math stuff.)
Before we move on to a more helpful description, here’s a quick overview on a couple of important terms I’ll be using:
Null Hypothesis: Every experiment has a “null hypothesis.” The null hypothesis states that there is no difference, no effect, or no pattern. It contrasts with the “alternative hypothesis.” An important thing to note for this post is that p-values are calculated under the assumption that the null hypothesis is true. This is because the null hypothesis is always assumed true unless significant results suggest otherwise.
Alternative Hypothesis: The alternative hypothesis describes the difference, effect, or pattern that the researcher is looking for. When a researcher begins their experiment, they are predicting that the alternative hypothesis is true.
My More Helpful Definition: The internet definition is technically correct; it just needs some explaining. Here’s an example. Let’s say I’m doing an experiment about M&M color distributions.* My null hypothesis is that there are equal proportions of each color in any given bag of M&Ms. My alternative hypothesis is that there are proportionally more red M&Ms. To test this, I randomly pick 20 M&Ms out of a bag of 100.
Let’s say that of my sample, 30% were brown, 20% were yellow, 20% were red, 10% were orange, 10% were green, and 10% were blue. I run a statistical test of significance on those numbers on a computer. The computer then gives me a p-value (among other things). In this case, my p-value is the probability that I would randomly pick a sample with 20% red M&Ms (or more), if the null hypothesis (that all the colors are equally represented) is true. If the p-value is large, we “fail to reject the null hypothesis” (the phrase statisticians prefer over “accept”) because there is a high probability that any pattern we are seeing is due to random chance. Another way to think of this is that the p-value helps tell the researcher how well the sample (i.e. a handful of 20 M&Ms) reflects an entire population (i.e. the entire bag).**
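The post doesn’t say which test the computer runs, but for this setup we can sketch the calculation ourselves. Assuming each M&M has an independent 1-in-6 chance of being red under the null hypothesis (a simplification: drawing 20 from a bag of 100 is technically without replacement), the p-value is just a one-sided binomial tail probability:

```python
from math import comb

n = 20          # sample size: 20 M&Ms
k_observed = 4  # 20% of 20 were red
p_red = 1 / 6   # null hypothesis: all 6 colors equally likely

# P(X >= 4) under Binomial(n=20, p=1/6): the chance of drawing
# 4 or more reds if the null hypothesis is true
p_value = 1 - sum(
    comb(n, k) * p_red**k * (1 - p_red) ** (n - k)
    for k in range(k_observed)
)
print(round(p_value, 3))  # roughly 0.43
```

A p-value around 0.43 is large — seeing 20% red in a handful of 20 is entirely unremarkable if the colors really are uniform, so we’d fail to reject the null hypothesis here.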
Considering this, the all-knowing science gods (I have no idea who actually made this decision— probably a committee) declared that if a p-value is less than or equal to 0.05 then results are considered significant, and the null hypothesis is rejected (whoopee!). A threshold of 0.05 means we accept a 5% chance of declaring a pattern real when it was actually just random chance (a false positive), which is considered an acceptable risk.
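The decision rule itself is almost embarrassingly simple — a minimal sketch (the function name and example values are mine, just for illustration):

```python
ALPHA = 0.05  # the committee-blessed significance threshold

def verdict(p_value, alpha=ALPHA):
    """Apply the conventional decision rule to a p-value."""
    if p_value <= alpha:
        return "significant: reject the null hypothesis"
    return "not significant: fail to reject the null hypothesis"

print(verdict(0.03))  # under the cutoff -> significant
print(verdict(0.20))  # over the cutoff -> not significant
```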
For scientists, getting p-values that are smaller than 0.05 is incredibly important. My first definition of the p-value is only a little bit flippant, because most journals will only accept papers that have “significant” results. So, p-values determine whether your research gets published, which is career-defining, and does in fact affect how much a scientist gets paid.
*To my surprise (and delight), M&M has published their color distributions. See how they’ve changed over the last 40 years here!
** If you want another example, try this short Khan Academy lesson on p-values and hypothesis testing.