Warning: wonkish. Also long (but there’s a handy jump).
Over the course of a career, you become accustomed to reviewers raising strange objections to your work. As sample size builds, though, a few strange objections come up repeatedly – and that’s interesting. Today: the bizarre notion that one shouldn’t do significance testing with simulation data.
Image: Skinny-leg jeans. Not my legs. Or my jeans. © Claude Truong-Ngoc CC BY-SA 3.0, via wikimedia.org
I went shopping for jeans last week, and came home frustrated. (As usual, yes, I’m eventually heading somewhere.) I have calves of considerable circumference, and the fashion in men’s jeans now seems to be for a very narrow-cut leg. I took pair after pair into the fitting room, only to discover I couldn’t even force my leg through the available hole. I know, hold the presses – I’m old and I don’t like today’s fashion; and while we’re at it, all you kids get off my lawn!
But from my (admittedly weird) utilitarian point of view, I just don’t understand skinny-leg jeans. Here’s why. If you make a pair of skinny-leg jeans, they can be used by a skinny-leg person, but not – not even a little bit – by a non-skinny-leg person. If you make a pair of wide-leg jeans, they accommodate both. There’s a fundamental asymmetry in usefulness that makes it seem obvious, to me, how jeans ought to be sewn.
The same asymmetry is why I teach students to report exact P-values, not just “P<0.05” or “P>0.05”.*
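A minimal sketch of that asymmetry, in Python. The function name `threshold_report` and the three P-values are hypothetical, chosen only for illustration: a reader holding an exact P-value can apply any cutoff they like, while a reader holding only “P<0.05” cannot tell a borderline result from an overwhelming one.

```python
def threshold_report(p_value: float, alpha: float = 0.05) -> str:
    """Collapse an exact P-value into a bare threshold statement."""
    return f"P<{alpha}" if p_value < alpha else f"P>={alpha}"


# Hypothetical exact P-values, for illustration only.
exact_p_values = [0.049, 0.0001, 0.06]

for p in exact_p_values:
    # From the exact value, every reader can derive the statement they want.
    at_05 = threshold_report(p, alpha=0.05)
    at_01 = threshold_report(p, alpha=0.01)
    print(f"exact P = {p}: {at_05}, {at_01}")

# Going the other way is impossible: 0.049 and 0.0001 both collapse to
# the same "P<0.05" string, so the exact value cannot be recovered.
```

The exact value is the wide-leg jeans here: it serves the reader who wants a line in the sand and the reader who wants a continuous measure of evidence.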
Image: William Caxton showing his printing press to King Edward IV and Queen Elizabeth (public domain)
It’s a phrase that gets no respect: “nearly significant”. Horrified tweets, tittering, and all the rest – a remarkably large number of people are convinced that when someone finds P = 0.06 and utters the phrase “nearly significant”, it betrays that person’s complete lack of statistical knowledge. Or maybe of ethics. It’s not true, of course. It’s a perfectly reasonable philosophy to interpret P-values as continuous metrics of evidence* rather than as lines in the sand that are either crossed or not. But today I’m not concerned with the philosophical justification for the two interpretations of P-values – if you want more about that, there’s my older post, or for a broader and much more authoritative treatment, there’s Deborah Mayo’s recent book (well worth reading for this and other reasons). Instead, I’m going to offer a non-philosophical explanation for how we came to think “nearly significant” is wrongheaded. I’m going to suggest that it has a lot to do with our continued reliance on a piece of 15th-century technology: the printing press.
Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, by Deborah G. Mayo. Cambridge University Press, 2018.
If there’s one thing we can all agree on about statistics, it’s that there are very few things we all agree on about statistics. The “statistics wars” that Deborah Mayo would like to help us get beyond have been with us for a long time; in fact, the battlefield and the armies shift, but the wars have been raging from the very beginning. Is inference about confidence in a single result or about long-term error rates? Is the P-value essential to scientific inference or a disastrous red herring holding science back? Does model selection do something fundamentally different from null-hypothesis significance testing (NHST), and if so, what? If we use NHST, is the phrase “nearly significant” evidence of sophisticated statistical philosophy or evil wishful thinking? Is Bayesian inference irredeemably subjective or the only way to convert data into evidence? These issues and more seem to generate remarkable amounts of heat – sometimes (as with Basic and Applied Social Psychology’s banning of the P-value) enough heat to seem like scorched-earth warfare*.
This semester, I’m coteaching a graduate/advanced-undergraduate level course in biostatistics and experimental design. This is my lecture on how to present statistical results when writing up a study. It’s a topic I’ve written about before, and what I presented in class draws on several older blog posts here at Scientist Sees Squirrel. However, I thought it would be useful to pull this together into a single (longish) post, with my slides to illustrate it. If you’d like to use any of these slides, here’s the PowerPoint – licensed CC BY-NC 4.0.
(Portuguese translation here, for those who prefer.)
How should you present statistical results in a scientific paper?
Image: Unicorn fresco by Domenichino (1581-1641), in the Palazzo Farnese, Rome, via wikimedia.org
Sometimes, thinking about science, I make odd connections. Often, they seem odd when I first make them, but then I learn something important from them and wonder why I’d never made them before. A good example cropped up the other day, when I realized that a peculiar feature of the scientific naming of organisms connects, via some simple statistics, to the difficulty of cancer screening, to reproducibility, and to the burden of proof for surprising claims. Curious? Here goes.
Photo: Lazy red panda CC 0 via pxhere.com
I’ve just published a paper that had some trouble getting through peer review. Nothing terribly unusual about that, of course, and the paper is better for its birthing pains. But one reviewer comment (made independently, actually, by several different reviewers) really bugged me. It revealed some fuzzy thinking that’s all too common amongst ecologists, having to do with the value of quick-and-dirty methods. Quick-and-dirty methods deserve more respect. I’ll explain using my particular paper as an example, first, and then provide a general analysis.
Every now and again, you see a critique of a manuscript that brings you up short and makes you go “Huh”.
A student of mine defended her thesis a while ago, and one of her examiners commented on one of her chapters that “the Results section is too short”*. Huh, I said. Huh.
I’m quite used to seeing manuscripts that are too long. Occasionally, I see a manuscript that’s too short. But this complaint was more specific: that the Results section in particular was too short. I’d never heard that one, and I just couldn’t make sense of it. Or at least, not until I realized that it fits in with another phenomenon that I see and hear a lot: the suggestion that nobody should ever, ever do their statistics in Excel.
Image: This is what 1300 g (2.8 lb) of basil looks like.
Yesterday (as I write) I bought some basil at my local farmer’s market. Quite a lot of basil, actually – almost 3 pounds of it – because it was my annual pesto-making day*. My favourite vendor sells basil by the stem (at 50¢ each), and I started pulling stems from a large tub. Some stems were quite small, and some were huge, with at least a five-fold difference in size between smallest and largest (and no, I didn’t get to just pick out the huge ones). So how many stems did I need? Or to put it the other way around, given that I bought 49 stems, how many batches of pesto would I be making, and how many cups of walnuts would I need?
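One way to see why 49 stems is a much safer bet than any single stem is a quick simulation. This is only a sketch under loud assumptions: the stem masses are hypothetical, drawn uniformly from 9 g to 45 g (a five-fold range whose mean of 27 g roughly matches 1300 g spread over 49 stems), and the names `simulate_totals` and `coefficient_of_variation` are made up for the example. The real distribution of stem sizes in the tub is unknown.

```python
import random
import statistics


def simulate_totals(n_stems=49, lo_g=9.0, hi_g=45.0, n_trials=2000, seed=1):
    """Total mass (g) of n_stems stems, repeated n_trials times.

    Each stem mass is drawn uniformly from lo_g..hi_g; that distribution
    is an assumption for illustration, not measured data.
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(lo_g, hi_g) for _ in range(n_stems))
        for _ in range(n_trials)
    ]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (unitless spread)."""
    return statistics.stdev(values) / statistics.mean(values)


# Spread of a single hypothetical stem versus a 49-stem purchase.
single_rng = random.Random(2)
single_stems = [single_rng.uniform(9.0, 45.0) for _ in range(2000)]
totals = simulate_totals()

cv_single = coefficient_of_variation(single_stems)
cv_total = coefficient_of_variation(totals)

print(f"mean total mass: {statistics.mean(totals):.0f} g")
print(f"relative spread, one stem:  {cv_single:.3f}")
print(f"relative spread, 49 stems: {cv_total:.3f}")
```

Under these assumptions the relative spread of the total shrinks by about the square root of the number of stems (a factor of 7 for 49 stems), so although individual stems vary wildly, the mass of the whole purchase is fairly predictable.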
My undergrad students – like a lot of biology students – don’t like statistics.
Photo: Two giraffes by Vera Kratochvil, released to public domain, via publicdomainpictures.net. Two giraffes are definitely better than one.
Ecologists are perennially angst-ridden about sample size. A lot of our work is logistically difficult, involves observations on large spatial or temporal scales, or involves rare species or unique geographic features. And yet we know that replication is important, and we bend over backwards to achieve it.
Sometimes, I think, too far backward, and this can result in wasted effort.