*Image: Plaque commemorating Fisher on Inverforth House. Peter O’Connor via flickr.com, CC BY-SA 2.0*

Do you know Fisher’s method for combining *P*-values? If you do, move along; I’ve got nothing for you. If you don’t, though, you may be interested in what’s surely the most useful statistical test that – despite the fame of Fisher himself – nobody knows about.

Fisher’s method is the original meta-analysis. When I was a grad student, and nobody had heard of meta-analysis (or cell phones, or the internal-combustion engine), I had a supervisory committee member who liked to make strong statements. One of his favourites was “A bunch of weak tests don’t add up to a strong test!” Many of his strong opinions were right, but that one was wrong: of course they do! Combining weak tests to make a strong one is precisely the point of meta-analysis. It was Fisher’s method, when I showed it to him, that made my skeptical committee member eat his words. But despite the intervening decades in which meta-analysis has become mainstream, Fisher’s method still seems little-known. That’s a shame, because it’s easy and powerful.

Fisher’s method is a way to take multiple, independent tests of a null hypothesis and fuse them into a single strong test. Those multiple tests can be from different experiments, of course, but more usefully they can be tests of different predictions from the null. For example, the null hypothesis that *insecticide X has no effect on plant growth* can generate predictions suitable for *t*-tests (growth differs between treated vs. control plants), *G*-tests of association (treated vs. control plants differ in frequency of meeting a size threshold), or regression (plant growth rate changes with insecticide dosage). Sure, it’s unlikely that you’d run all these yourself – but you might easily find each in the literature, along with others you couldn’t have imagined running. Wouldn’t it be nice to be able to combine their independent results?

What makes Fisher’s method work is that no matter what the form of the prediction or the test applied to it, if the null hypothesis holds* then *P* values are distributed uniformly on [0,1]. Fisher’s insight was that this means twice the sum of the negative logs of those *P* values will have a χ² distribution. More precisely, for *k* independent tests with *P* values *P*₁, …, *P*ₖ:

$$X^2 = -2\sum_{i=1}^{k} \ln P_i \;\sim\; \chi^2_{2k},$$

that is, a χ² distribution with 2*k* degrees of freedom.
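Why twice the negative log? Here is a short derivation, assuming that each test statistic is continuous (so that *P* is exactly uniform under the null) and that the *k* tests are independent:

$$
\begin{aligned}
\Pr(-2\ln P_i > x) &= \Pr\!\left(P_i < e^{-x/2}\right) = e^{-x/2}, \qquad x \ge 0,\\
&\Rightarrow\; -2\ln P_i \sim \text{Exponential with mean } 2,\ \text{i.e. } \chi^2_2,\\
X^2 = -2\sum_{i=1}^{k}\ln P_i &= \sum_{i=1}^{k}\left(-2\ln P_i\right) \;\sim\; \chi^2_{2k}.
\end{aligned}
$$

The last step uses the fact that a sum of independent χ² variables is itself χ², with the degrees of freedom added; that is also where the independence of the tests matters.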

Here’s a simple worked example. Imagine four insecticide experiments:

- two *t*-tests, *P* = 0.11 and *P* = 0.12
- a *G*-test, *P* = 0.21
- a regression, *P* = 0.08

Nothing significant, right? Wrong. Fisher’s method gives a test statistic of 16.8, with 8 degrees of freedom and a combined *P* = 0.03. This shouldn’t shock you: while none of the individual tests have *P* below the (absolutist) threshold of 0.05, it’s unlikely that four experiments would get four smallish values in the absence of any real effect.
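The arithmetic is short enough to do by hand, but here is a minimal Python sketch of it. The function name `fisher_combined` is just an illustrative choice, not from any library, and the code assumes independent tests with *P* values in (0, 1]. It uses the fact that a χ² variable with an even number of degrees of freedom, 2*k*, has the closed-form upper tail e^(−x/2) · Σ_{j=0}^{k−1} (x/2)^j / j!, so no statistics package is needed:

```python
import math


def fisher_combined(p_values):
    """Combine independent P values with Fisher's method.

    Returns (statistic, degrees_of_freedom, combined_p). Assumes the tests
    are independent and each P value lies in (0, 1].
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("every P value must lie in (0, 1]")

    # Test statistic: twice the sum of the negative natural logs.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k

    # Upper tail of chi-squared with even df = 2k has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0    # (x/2)^0 / 0!
    series = 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        series += term
    combined_p = math.exp(-half) * series
    return statistic, df, combined_p


if __name__ == "__main__":
    # The four insecticide experiments from the worked example above.
    stat, df, p = fisher_combined([0.11, 0.12, 0.21, 0.08])
    print(f"X2 = {stat:.1f}, df = {df}, P = {p:.3f}")
    # prints: X2 = 16.8, df = 8, P = 0.032
```

With a single test the method simply returns that test’s own *P* value, which is a handy sanity check. The direct series can lose precision for very many tests or extremely small *P* values; if SciPy is available, `scipy.stats.combine_pvalues(p_values, method="fisher")` implements the same method.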

So it’s dead easy: find a bunch of published experiments testing the hypothesis, pick out their *P* values, run a very simple calculation, and bam – meta-analysis on the cheap!

Now, Fisher’s method isn’t a panacea. It’s just as vulnerable to file-drawer effects as any other meta-analysis. And its focus on *P*-values is a shortcoming as well as an advantage. Fisher’s method offers no way to combine *effect sizes*. Of course, it’s difficult to see how you *could* combine effect sizes across experiments that measure group differences, frequency differences, and slopes – awkward situations like this were exactly Fisher’s motivation. Therefore, a significant result in Fisher’s method means you’ve detected a credible pattern, but not that you’ve measured its strength. That’s fine, since we should always remember that a significant *P* value is the beginning, not the end, of an analysis.

So more people should know about Fisher’s method. Perhaps it’s unbearably nerdy to have a favourite statistical test; but Fisher’s method is a strong contender as mine. And when have I (blogging or otherwise) worried about being too nerdy?

*© Stephen Heard (*sheard@unb.ca*) June 7, 2016*

*And if all test assumptions are met, of course. Fisher’s method is powerful; it isn’t magic.

jeffollerton: Very interesting and, agreed, it’s little known – or at least certainly a new one for me! Do you have the original citation for the test?

ScientistSeesSquirrel (post author): Jeff, it comes from Fisher’s 1925 *Statistical Methods for Research Workers*, but I have to admit I haven’t located it in that text. Fisher expanded on it in a 1948 question-and-answer in *The American Statistician*, which you can find here on JSTOR: https://www.jstor.org/stable/2681650?seq=1#page_scan_tab_contents.

jeffollerton: Thanks!

alexmikov: You’re right, it’s nerdy to have a favourite statistical test… but then again, I keep a copy of Sokal and Rohlf’s *Biometry* by my bed. This test was taught to me and used a lot at uni in 1984/85, but I was unaware of its history. Thanks.

Nick Collins: Probably should mention the condition that the p values must be independently determined. That is, you shouldn’t combine the p values for your 3 plant responses if the measurements were carried out on the same set of plants. Otherwise you need to think about some sort of Bonferroni correction.

ScientistSeesSquirrel (post author): Quite right, Nick! That’s what I meant in the first sentence of the third paragraph, by “multiple, independent tests of a null”. I think your language, “p values independently determined”, means the same thing, right?

Andrew D. Steen (@drdrewsteen): Wow! This test gives me a way to use the ca. 1-gazillion independent, mostly >0.05 p-values I hadn’t figured out how to use in the paper I’m in the middle of writing up. Thanks much.

Your post inspired me to read more on methods for combining p-values. Whitlock (2005, J. Evol. Biol.) argues that a weighted-Z test is more powerful than Fisher’s method when sample sizes are unequal. http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2005.00917.x/full

ScientistSeesSquirrel (post author): That’s a good addition, thanks! I’m sure Mike is right – he usually is. 🙂

phaneron0: I find the paper by Art Owen more comprehensive: “Karl Pearson’s meta-analysis revisited”, http://statweb.stanford.edu/~owen/reports/AOS697.pdf

By the way, when I taught intro stats at Duke (2007/2008), the students seemed to have little difficulty undertaking and appreciating Fisher’s method (or at least appreciating why looking at individual p-values can be very misleading when there are multiple similar studies).

Keith O’Rourke