*(My writing pet peeves – Part 1)*

*Image: completely fake “data”, but a real 1-way ANOVA; S. Heard.*

I read a lot of manuscripts – student papers, theses, journal submissions, and the like. You can’t do that without developing a list of pet peeves about writing, and yes, I’ve got a little list*.

Sitting atop my pet-peeve list these days: test statistics, *P*-values, and the like reported to ridiculous levels of precision – or, rather, pseudo-precision. I’ve done it in the figure above: F_{1,42} = 4.716253, *P* = 0.0355761. I see numbers like these all the time – but, really?

I’m sure this happens because stats packages report all these digits, and they’re easy to cut-and-paste. It shouldn’t happen, though, because few of those digits are significant digits**. What’s odd is that most folks who commit this particular sin know about significant digits, and would never report the mass of a butternut seed as 53.835766 g. Statistics are no different.

So how many significant digits should you report for a test statistic or a *P*-value (or anything else)? There are two considerations, which I’ll call *data-significant digits* and *reader-significant digits*. The number of data-significant digits sets the precision that you *can* report, while the number of reader-significant digits sets the precision that you *should* report. I’ll explain (but rest assured, I’ll end with a simple rule of thumb).

By *data-significant digits* I mean the usual consideration about precision in numbers. In general, a digit is data-significant if you would expect to repeat its value upon doing the measurement again. That’s easy to understand for a single seed placed on a balance: take the seed off, put it back on, and compare the results. Most of us know how precision propagates through calculations, too, or have ways of approximating this (for instance, we know there’s no point reporting a mean to precision swamped by its standard error).

Applying this logic to statistics needs a little more thought. If a test statistic or *P*-value is determined by resampling or randomization (bootstrapping, permutation tests, and other Monte Carlo techniques), then it’s easy to see that the significant digits are only those that are consistent across repetitions of the entire test. Imagine that you conduct a randomization ANOVA with 1,000 permutations of the data, and get *P* = 0.0781. You do the whole thing over again, with *another* 1,000 permutations, and get *P* = 0.0783. Two digits at most are data-significant: the 7 and the 8.

But even when randomization isn’t involved in the test itself, randomization is always involved in the sampling of individuals from the population (and perhaps in allocation of individuals to treatments), and at least conceptually, the same possibility exists of repeating the whole study and observing the precision of its statistical results***. In this context, sometimes it’s obvious how to assess precision of a statistic: we’re perfectly familiar, for instance, with a parameter like a slope coming with a confidence interval. Other times it’s much less obvious; for instance, we rarely apply precision thinking to the *P*-value. This seems awkward; but fortunately, the second consideration can save our bacon: the number of reader-significant digits.
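To make the repeat-the-randomization idea concrete, here is a minimal Python sketch. It is an illustration, not the analysis behind the figure: the data are invented, the function names (`permutation_p_value`, `monte_carlo_se`) are hypothetical, and a two-group permutation test on the difference in means stands in for the randomization ANOVA (with only two groups and fixed group sizes, that ranks permutations the same way the F statistic would). Running the test twice with different random seeds shows which digits of *P* survive a repeat.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=None):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


def monte_carlo_se(p, n_permutations):
    """Binomial standard error of a P-value estimated from random permutations."""
    return (p * (1 - p) / n_permutations) ** 0.5


# Invented data (assumption): two groups of 22 "seed masses".
data_rng = random.Random(1)
control = [data_rng.gauss(50, 5) for _ in range(22)]
treated = [data_rng.gauss(53, 5) for _ in range(22)]

# Same data, same test, two independent sets of 1,000 permutations.
p_first = permutation_p_value(control, treated, seed=101)
p_second = permutation_p_value(control, treated, seed=202)
print(p_first, p_second)

# How much should P wobble between repeats when P is near 0.078?
print(monte_carlo_se(0.078, 1000))
```

The two printed *P*-values will typically agree in their leading digit or two and differ after that. The binomial standard error for *P* near 0.078 with 1,000 permutations works out to roughly 0.0085, which suggests that even the second digit is only loosely pinned down at that number of permutations.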

A *reader-significant digit* is one that can help the reader understand the story your paper is telling. Perhaps you actually could measure seed masses to the nearest μg, making a butternut seed mass of 53.835766 g respect the rules of data-significant digits. Can you imagine a circumstance in which a reader would learn something useful from those trailing sixes? Not likely. The same is true for statistics. Take the *P*-value from my seed-mass ANOVA: *P* = 0.0355761. Is there information in the trailing 1? No; even if sample size were sufficient to make that digit data-significant, there’s no way it’s ever going to matter to a reader. In my seed-mass ANOVA, reporting “F_{1,42} = 4.7, *P* = 0.036” carries just as much useful information as “F_{1,42} = 4.716253, *P* = 0.0355761”, and does so more clearly, with less mental demand on the reader.

Note, by the way, that one could be tempted by this reader-information logic to report a value of *P* = 0.0499999999 because any rounding moves us past the line-in-the-sand *P* = 0.05. This temptation is actually one strong argument for continualist, rather than absolutist, inference from *P*-values. Every user of inferential statistics should understand (and hold a well-reasoned position on) the continualist-absolutist distinction, but it’s widely misunderstood, or at least disrespected. I’ve written about it here.

If all this makes reporting statistics seem complex, it really needn’t be. As a rule of thumb, I think that if our experiments/observations are half-way competently designed, then it will be reader-significant digits, not data-significant ones, that limit usefulness and determine what we should report. And if that’s true, then at most two digits should do for *P*-values, and two or three for most test statistics.
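If you would rather not round by hand, the rule of thumb is easy to automate. The helper below is a small sketch, and `format_sig` is a name invented for this example, not a function from any stats package. It rounds to a chosen number of significant digits and prints the result in plain decimal notation, so leading zeroes are kept and pseudo-precise trailing digits are dropped.

```python
from decimal import Decimal


def format_sig(value: float, digits: int = 2) -> str:
    """Format value with the given number of significant digits, without exponent notation."""
    # Scientific notation does the significant-digit rounding;
    # Decimal then converts it back to ordinary positional notation.
    return format(Decimal(f"{value:.{digits - 1}e}"), "f")


f_stat = format_sig(4.716253)      # test statistic from the figure
p_value = format_sig(0.0355761)    # its P-value
print(f"F(1,42) = {f_stat}, P = {p_value}")  # prints "F(1,42) = 4.7, P = 0.036"

# Leading zeroes are not significant digits, so a tiny P-value survives intact.
print(format_sig(0.00000004, digits=1))  # prints "0.00000004"
```

Two digits is the default here only because that is the rule of thumb for *P*-values; pass `digits=3` for a test statistic where a third digit seems useful to the reader.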

So, F_{1,42} = 4.7, *P* = 0.036. Done – and because the whole world will (of course) read this and do what I say, that’s my #1 pet peeve sorted. A good morning’s work!

*© Stephen Heard (*sheard@unb.ca*) April 18, 2016*

*This post is based in part on material from *The Scientist’s Guide to Writing*, my guidebook for scientific writers. You can learn more about it **here**.*

*^Well, it’s not really that little, but I didn’t want to let the *Mikado* reference go by.

**^Note that I’m not talking about leading zeroes. “*P* = 0.00000004” has one significant digit, not eight, and reporting that is just fine. In fact, I’d rather see that than “*P* < 0.05” or even “*P* < 0.0001” (both common), because those leading zeroes contain information useful in meta-analysis. Instead, I’m talking about “*P* = 0.0355761”, which pretends to have 6 significant digits, but almost certainly doesn’t. You’re doubtless comfortable with significant digits, but if you need a formal refresher, here’s one. In this post, I treat them a bit less formally and more intuitively.

***^Of course, how many digits change depends on sample size (and, for randomization tests, the number of randomizations). If you sample every individual in the population, there is no longer any uncertainty. Raise your hand if you’ve ever done that. Thought so.

Kevin Wright: Two relevant comments by others…

Terry Therneau, S-news mailing list, 8 Nov 2000:

“Given what I know about data, models, and assumptions, I find more than 2 significant digits of printout for a p-value to be indefensible. (I actually think 1 digit is about the max).”

Boos & Stefanski (2011). “P-value precision and reproducibility”, The American Statistician, 65, 213–221.

“We show that -log_{10}(p-value) standard deviations are such that for a wide range of observed significance levels, only the MAGNITUDE of -log_{10}(p-value) is reliably determined. That is, writing the p-value as x · 10^{-k}, where 1 ≤ x ≤ 9 and k = 1, 2, 3, … is the magnitude so that -log_{10}(p-value) = -log_{10}(x) + k, the standard deviation of -log_{10}(p-value) is so large relative to its value that only the magnitude k is reliably determined as a measure of evidence.”

sleather2012: Great post – I spend a lot of my life telling students the same thing.
