I made scones this morning, and it made me think about statistics, and about thinking. No, really, I have a point: it’s that P = 0.05 and a teaspoon of baking powder are the same thing, in an important way. Am I stretching an analogy to its breaking point? Read on to find out.
My scone recipe calls for 4 cups of flour, a cup of sugar, a teaspoon of baking powder, a teaspoon of baking soda, half a teaspoon of salt, four tablespoons of butter or shortening, and then raisins and buttermilk to make a dough.* The quantities are interesting. Continue reading
Image: Skinny-leg jeans. Not my legs. Or my jeans. © Claude Truong-Ngoc CC BY-SA 3.0, via wikimedia.org
I went shopping for jeans last week, and came home frustrated. (As usual, yes, I’m eventually heading somewhere.) I have calves of considerable circumference, and the fashion in men’s jeans now seems to be for a very narrow-cut leg. I took pair after pair into the fitting room, only to discover I couldn’t even force my leg through the available hole. I know, hold the presses – I’m old and I don’t like today’s fashion; and while we’re at it, all you kids get off my lawn!
But from my (admittedly weird) utilitarian point of view, I just don’t understand skinny-leg jeans. Here’s why. If you make a pair of skinny-leg jeans, they can be used by a skinny-leg person, but not – not even a little bit – by a non-skinny-leg person. If you make a pair of wide-leg jeans, they accommodate both. There’s a fundamental asymmetry in usefulness that makes it seem obvious, to me, how jeans ought to be sewn.
The same asymmetry is why I teach students to report exact P-values, not just “P<0.05” or “P>0.05”.* Continue reading
Image: William Caxton showing his printing press to King Edward IV and Queen Elizabeth (public domain)
It’s a phrase that gets no respect: “nearly significant”. Horrified tweets, tittering, and all the rest – a remarkably large number of people are convinced that when someone finds P = 0.06 and utters the phrase “nearly significant”, it betrays that person’s complete lack of statistical knowledge. Or maybe of ethics. It’s not true, of course. It’s a perfectly reasonable philosophy to interpret P-values as continuous metrics of evidence* rather than as lines in the sand that are either crossed or not. But today I’m not concerned with the philosophical justification for the two interpretations of P-values – if you want more about that, there’s my older post, or for a broader and much more authoritative treatment, there’s Deborah Mayo’s recent book (well worth reading for this and other reasons). Instead, I’m going to offer a non-philosophical explanation for how we came to think “nearly significant” is wrongheaded. I’m going to suggest that it has a lot to do with our continued reliance on a piece of 15th-century technology: the printing press. Continue reading
This semester, I’m co-teaching a graduate/advanced-undergraduate level course in biostatistics and experimental design. This is my lecture on how to present statistical results, when writing up a study. It’s a topic I’ve written about before, and what I presented in class draws on several older blog posts here at Scientist Sees Squirrel. However, I thought it would be useful to pull this together into a single (longish) post, with my slides to illustrate it. If you’d like to use any of these slides, here’s the PowerPoint – licensed CC BY-NC 4.0.
(Portuguese translation here, for those who prefer.)
How should you present statistical results, in a scientific paper?
Comic: xkcd #892, by Randall Munroe
For some reason, people seem to love taking shots at null-hypothesis/significance-testing statistics, despite its central place in the logic of scientific inference. This is part of a bigger pattern, I think: it’s fun to be iconoclastic, and the more foundational the icon you’re clasting (yes, I know that’s not really a word), the more fun it is. So the P-value takes more than its share of drubbing, as do decision rules associated with it. The null hypothesis may be the most foundational of all, and sure enough, it also takes abuse.
I hear two complaints about null hypotheses – and I’ve been hearing the same two since I was a grad student. That’s mumble-mumble years listening to the same strange but unkillable misconceptions, and when both popped their heads up again within a week, I gave myself permission to rant about them a little bit. So here goes. Continue reading
(My writing pet peeves – Part 1)
Image: completely fake “data”, but a real 1-way ANOVA; S. Heard.
I read a lot of manuscripts – student papers, theses, journal submissions, and the like. You can’t do that without developing a list of pet peeves about writing, and yes, I’ve got a little list*.
Sitting atop my pet-peeve list these days: test statistics, P-values, and the like reported to ridiculous levels of precision – or, rather, pseudo-precision. I’ve done it in the figure above: F(1,42) = 4.716253, P = 0.0355761. I see numbers like these all the time – but, really? Continue reading
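As a quick illustration (a minimal sketch of my own, in Python – the two-decimal and two-significant-figure cutoffs are common conventions I’m assuming, not rules from the post), here’s how the pseudo-precise package output above might be trimmed before reporting:

```python
# Raw output as a statistics package might print it
# (the made-up values from the figure above)
F_raw = 4.716253
p_raw = 0.0355761

# Trim to a defensible reporting precision:
# F to two decimal places, P to two significant figures
F_reported = round(F_raw, 2)
p_reported = float(f"{p_raw:.2g}")

print(f"F(1,42) = {F_reported}, P = {p_reported}")  # F(1,42) = 4.72, P = 0.036
```

The extra digits in the raw output carry no information the data can support; nothing about the inference changes when they’re dropped.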
Graphic: Parasitoid emergence from aphids on peppers, as a function of soil fertilization. Analysis courtesy of Chandra Moffat (but data revisualized for clarity).
“Every time you say ‘trending towards significance’, a statistician somewhere trips and falls down.” This little joke came to me via Twitter last month. I won’t say who tweeted it, but they aren’t alone: similar swipes are very common. I’ve seen them from reviewers of papers, audiences of conference talks, faculty colleagues in lab meetings, and many others. The butt of the joke is usually someone who executes a statistical test, finds a P value slightly greater than 0.05, and has the temerity to say something about the trend anyway. Sometimes the related sin is declaring a P value much smaller than 0.05 “highly significant”. Either way, it’s a sin of committing statistics with nuance.
Why do people think the joke is funny? Continue reading
(graphic by Chen-Pan Liao via wikimedia.org)
The P-value (and by extension, the entire enterprise of hypothesis-testing in statistics) has been under assault lately. John Ioannidis’ famous “Why most published research findings are false” paper didn’t start the fire, but it threw quite a bit of gasoline on it. David Colquhoun’s recent “An investigation of the false discovery rate and the misinterpretation of P-values” raised the stakes by opening with a widely quoted and dramatic (but also dramatically silly) proclamation that “If you use P=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time.”* While I could go on citing examples of the pushback against P, it’s inconceivable that you’ve missed all this, and it’s well summarized by a recent commentary in Nature News. Even the webcomic xkcd has piled on. Continue reading
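To see where figures like that “30%” come from, here’s a back-of-the-envelope sketch (my illustration, not Colquhoun’s exact calculation – the 10% prior and 80% power are assumed values):

```python
# Back-of-the-envelope false-discovery calculation.
# Assumed (illustrative) values: 10% of tested hypotheses are real effects,
# tests have 80% power, and "discovery" means P < 0.05.
alpha = 0.05       # false-positive rate under a true null
power = 0.80       # probability of detecting a real effect
prior_true = 0.10  # fraction of tested hypotheses that are actually true

false_discoveries = alpha * (1 - prior_true)  # true nulls crossing P < 0.05
true_discoveries = power * prior_true         # real effects detected

fdr = false_discoveries / (false_discoveries + true_discoveries)
print(f"Fraction of 'discoveries' that are false: {fdr:.0%}")  # 36%
```

Note how much the answer depends on the assumed prior: if most hypotheses you test are plausible, the false-discovery fraction is far lower – which is one reason the blanket “30%” claim is dramatic but also dramatically silly.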