Image: Plaque commemorating Fisher on Inverforth House. Peter O’Connor via flickr.com, CC BY-SA 2.0
Do you know Fisher’s method for combining P values? If you do, move along; I’ve got nothing for you. If you don’t, though, you may be interested in what’s surely the most useful statistical test that – despite the fame of Fisher himself – nobody knows about.
Fisher’s method is the original meta-analysis. When I was a grad student, and nobody had heard of meta-analysis (or cell phones, or the internal-combustion engine), I had a supervisory committee member who liked to make strong statements. One of his favourites was “A bunch of weak tests don’t add up to a strong test!” Many of his strong opinions were right, but that one was wrong: of course they do! Combining weak tests to make a strong one is precisely the point of meta-analysis. It was Fisher’s method, when I showed it to him, that made my skeptical committee member eat his words. But despite the intervening decades in which meta-analysis has become mainstream, Fisher’s method still seems little-known. That’s a shame, because it’s easy and powerful.
Fisher’s method is a way to take multiple, independent tests of a null hypothesis and fuse them into a single strong test. Those multiple tests can be from different experiments, of course, but more usefully they can be tests of different predictions from the null. For example, the null hypothesis that insecticide X has no effect on plant growth can be challenged with t-tests (does growth differ between treated and control plants?), G-tests of association (do treated and control plants differ in how often they reach a size threshold?), or regression (does plant growth rate change with insecticide dosage?). Sure, it’s unlikely that you’d run all these yourself – but you might easily find each in the literature, along with others you couldn’t have imagined running. Wouldn’t it be nice to be able to combine their independent results?
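What makes Fisher’s method work is that no matter what the form of the prediction or the test applied to it, if the null hypothesis holds* then P values are distributed uniformly on [0,1]. Fisher’s insight was that this means twice the sum of the negative natural logs of those P values will have a χ² distribution. More precisely, for k independent tests with P values P_1, …, P_k:

$$X^2 = -2 \sum_{i=1}^{k} \ln P_i \;\sim\; \chi^2_{2k}$$

That is, the test statistic follows a χ² distribution with 2k degrees of freedom.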
Here’s a simple worked example. Imagine four insecticide experiments:
- two t-tests, P = 0.11 and P = 0.12
- a G-test, P = 0.21
- a regression, P = 0.08
Nothing significant, right? Wrong. Fisher’s method gives a test statistic of 16.8, with 8 degrees of freedom and a combined P = 0.03. This shouldn’t shock you: while none of the individual tests have P below the (absolutist) threshold of 0.05, it’s unlikely that four experiments would get four smallish values in the absence of any real effect.
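If you’d like to check that arithmetic yourself, here is a minimal Python sketch of the calculation. The function name `fisher_combined_p` is my own invention for illustration, not something from a library. It relies on one shortcut: because the degrees of freedom are always even (2k), the upper tail of the χ² distribution has a simple closed form, so nothing beyond the standard library is needed. If you have SciPy installed, `scipy.stats.combine_pvalues` with `method="fisher"` should give you the same answer.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Returns (statistic, degrees_of_freedom, combined_p).
    Hypothetical helper for illustration only.
    """
    if not p_values:
        raise ValueError("need at least one P value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")

    k = len(p_values)
    # Test statistic: -2 times the sum of the natural logs.
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even df = 2k, the chi-square upper tail is
    #   exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    combined_p = math.exp(-half) * total

    return statistic, 2 * k, combined_p

# The four insecticide experiments from the worked example.
stat, df, p = fisher_combined_p([0.11, 0.12, 0.21, 0.08])
print(f"X2 = {stat:.1f}, df = {df}, combined P = {p:.3f}")
# prints: X2 = 16.8, df = 8, combined P = 0.032
```

A quick sanity check on the sketch: feed it a single P value and you get that same P value back, which is what you’d hope for from a method that combines only one test.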
So it’s dead easy: find a bunch of published experiments testing the hypothesis, pick out their P values, run a very simple calculation, and bam – meta-analysis on the cheap!
Now, Fisher’s method isn’t a panacea. It’s just as vulnerable to file-drawer effects as any other meta-analysis. And its focus on P values is a shortcoming as well as an advantage: Fisher’s method offers no way to combine effect sizes. Of course, it’s difficult to see how you could combine effect sizes across experiments that measure group differences, frequency differences, and slopes – awkward situations like this were exactly Fisher’s motivation. So a significant result in Fisher’s method means you’ve detected a credible pattern, but not that you’ve measured its strength. That’s fine, since we should always remember that a significant P value is the beginning, not the end, of an analysis.
So more people should know about Fisher’s method. Perhaps it’s unbearably nerdy to have a favourite statistical test; but Fisher’s method is a strong contender as mine. And when have I (blogging or otherwise) worried about being too nerdy?
© Stephen Heard (email@example.com) June 7, 2016
*And if all test assumptions are met, of course. Fisher’s method is powerful; it isn’t magic.