Comic: xkcd #892, by Randall Munroe
For some reason, people seem to love taking shots at null-hypothesis/significance-testing statistics, despite its central place in the logic of scientific inference. This is part of a bigger pattern, I think: it’s fun to be iconoclastic, and the more foundational the icon you’re clasting (yes, I know that’s not really a word), the more fun it is. So the P-value takes more than its share of drubbing, as do decision rules associated with it. The null hypothesis may be the most foundational of all, and sure enough, it also takes abuse.
I hear two complaints about null hypotheses – and I’ve been hearing the same two since I was a grad student. That’s mumble-mumble years listening to the same strange but unkillable misconceptions, and when both popped their heads up again within a week, I gave myself permission to rant about them a little bit. So here goes.
(1) “The null hypothesis is often uninteresting” (sometimes followed by “if it were true, we wouldn’t care about it”). Well, sure, of course the null is “uninteresting”. By definition, the null expresses the lack of pattern, and usually, it’s that pattern we’re really interested in. So we frame and test a null because rejecting it would be interesting.
It’s true enough that if we can’t reject, we may be disappointed. We’re human, and often emotionally invested in our ideas. But so what? It isn’t Nature’s responsibility to make itself interesting for us; and if one of our pet hypotheses turns out not to have compelling evidence behind it, well, so it goes. Don’t worry, something else cool will be along in a moment.
(2) “The null is never actually true, so rejecting it isn’t helpful”. This complaint is usually accompanied by a declaration that any two groups of things (let’s say) always differ somehow, at least a little, so distinguishing them is simply a matter of gathering a large enough sample size to get a significant P. Here’s an example among many: Andrew Gelman, as an aside in an otherwise excellent post about multiple regression (etc.): “I don’t think there are zero effects, so I think it’s just a mistake overall to be saying that some predictors matter and some don’t.”*
The objection that “the null is never actually true” is a strange one. It’s also a bit slippery, because I think it can be wrong in any of three ways. The first two seem like open-and-shut cases, stemming from clearcut misconceptions about how null hypothesis testing works. The third is much more interesting, and contains a striking claim about the nature of the universe. I’ll save the best for last.
- First, in some hands the objection seems to betray confusion about samples vs. populations. It’s true that even when two populations are identical, sampling from them will nearly always produce a small difference; and so if you make the mistake of p-hacking by checking “significance” of many repeated samplings, you’ll always eventually reject the null. You do need the p-hacking part, because of course even when two populations are very different, you’ll sometimes get two samples that are identical. Keep going, though, and if you reason this way your belief that ‘the null is never true’ will be sustained. When the null actually is true, you may have to keep going longer; but with a bit of patience you’ll always get two samples that “differ”. But of course this doesn’t make the null false (about the populations); it just makes you guilty of not understanding sampling.
- Second, the objection sometimes arises because people pay attention only to P values and neglect effect sizes. Let’s assume for the moment that large enough sample sizes always reveal differences between groups, not because of p-hacking but because the null actually isn’t ever true (more about this soon, in #3). Those effects that turn up for large n are tiny (otherwise the line about “large enough sample size” wouldn’t be there), and there’s no problem with declaring such an effect real but unimportant. To do so, we use the degree of the pattern’s departure from the null hypothesis – and when that degree of departure is very small, we may find the null hypothesis a useful representation of nature even though we find it false. Here, the degree to which we reject the null matters – as long as we’re paying attention.
- Third, and most interestingly, sometimes the objection seems to be that there really aren’t any zero effects (that is, the null really is always false), and that these universally-real effects matter. But what would it mean if this were true? I think it would mean we were making a deep but completely unfounded claim about the nature of the universe. That claim is that all explanatory variables matter, that all possible causes exist. For each such possible cause, making this claim is a strong statement that you know the true nature of the universe – and that you know this without the need to gather evidence. Does fish body size respond to environmental phosphates? Yes. Does it respond to environmental silica? Yes. Does it respond to environmental xenon? Yes. Does it respond to the number of supernova remnants in the sector of sky defined by the Unicode number for the second letter of the fish’s Latin name? Yes**. This claim that all causes exist is a breathtaking one, and it seems to reduce science to an exercise in mensuration. But I’m not sure why its claimants stop there. If we know without the need for evidence that causes exist, why don’t we similarly know their magnitudes? How is it that we can reject (without evidence) a value of zero for an effect size, but no other value? If we claim that we know these things without evidence, do we need evidence to know that we know them? (Ouch.)
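If you’d like to see the first two points in action, here’s a minimal simulation sketch in Python (standard library only). Everything specific in it is an assumption chosen for illustration, not something from a real dataset: the function names, the sample sizes, the “tiny” true difference of 0.02 standard deviations, and the use of a simple z-test that treats the population standard deviation as known and equal to 1. It shows (a) that when the null is true, a single test rejects at roughly the advertised 5% rate, (b) that resampling until you get “significance” will nonetheless succeed eventually, and (c) that a very large sample can give a small P for an effect whose estimated size is still tiny.

```python
import math
import random
from statistics import fmean


def z_test_p(a, b, sigma=1.0):
    """Two-sided P for a difference in means (assumes sigma is known)."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def draw(rng, mean, n):
    """Draw n values from a normal population with the given mean and SD 1."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def false_positive_rate(rng, n=20, trials=4000, alpha=0.05):
    """Fraction of single tests that reject when the null is actually true."""
    hits = sum(
        z_test_p(draw(rng, 0.0, n), draw(rng, 0.0, n)) < alpha
        for _ in range(trials)
    )
    return hits / trials


def attempts_until_significant(rng, n=20, alpha=0.05, max_attempts=10_000):
    """P-hacking by resampling: redraw both samples until P < alpha.

    Both samples always come from the SAME population, so the null is true.
    Returns the number of attempts needed, or None if max_attempts runs out.
    """
    for attempt in range(1, max_attempts + 1):
        if z_test_p(draw(rng, 0.0, n), draw(rng, 0.0, n)) < alpha:
            return attempt
    return None


def tiny_effect(rng, true_diff=0.02, n=200_000):
    """Large samples from populations that differ only slightly (hypothetical values).

    Returns (estimated difference in means, P value).
    """
    a = draw(rng, true_diff, n)
    b = draw(rng, 0.0, n)
    return fmean(a) - fmean(b), z_test_p(a, b)


if __name__ == "__main__":
    rng = random.Random(2017)  # arbitrary seed, only for repeatability
    print("single-test rejection rate under a true null:", false_positive_rate(rng))
    print("attempts needed to 'find' a difference:", attempts_until_significant(rng))
    diff, p = tiny_effect(rng)
    print(f"large-n test: estimated difference = {diff:.4f}, P = {p:.2g}")
```

The thing to look at in the last line of output is the estimated difference, not just the P value: the effect can be “real” and still be small enough that the null is a useful description of nature.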
Now, I’ll forgive you if you think all of my arguments are silly caricatures. But they’re the arguments we’re inevitably pushed to, once we decide to pay serious attention to the claim that null hypotheses don’t matter. The universe is a complex place***, and unassisted humans are really good at misinterpreting it. Null hypotheses and our apparatus for testing them are something science needs. Could we please stop clasting this particular icon?
© Stephen Heard (firstname.lastname@example.org) April 3, 2017
*This is not at all to pick on Andrew Gelman, who is an excellent statistician and whose work is thought-provoking and invaluable – look further into his blog for a sampling. In fact, the objection turning up from someone whose work I respect so much, when I had this post half-written, really made me think.
**You may object that I made this last hypothesis up to be deliberately ridiculous. You’re right, I did (and it was fun). You could respond with the claim that it’s only plausible causes that always exist. But then you’re making another deep and unfounded claim about the nature of the universe: that plausible hypotheses are always true, and implausible ones always false, and that we can distinguish plausible-and-thus-true hypotheses from implausible-and-thus-false ones – without (again) the need for evidence. I may have guided you gently toward this bottomless rabbithole, but I didn’t put it there.
***Darn it, now I’m the one making deep claims about the nature of the universe without evidence.