There’s a lot to dislike about the way we write scientific papers. They’re often tedious and impenetrable, and they get that way at least in part because we make poor decisions as we write. We overuse big fancy words when short simple ones are available (“utilize”, anyone?), we just can’t let go of our fetish for the passive voice, and we apparently love nothing more than replacing some actual English words with an acronym. And so on. We do all these things, I’m convinced, in large part because we want our papers to sound science-y. We’re used to reading a literature rich in jargon, passive voice, and acronyms, and so either consciously or subconsciously we model our writing on what’s gone before – because that’s what scientific writing sounds like. It’s a trap, of course, but one that’s fiendishly hard to escape from.
It’s easy for me to rant about this stuff (and believe me, I won’t stop), but it’s more compelling when there’s data. A couple of months ago, Adrian Barnett and Zoe Doubleday provided that in spades, with respect to our literature’s scourge of acronyms (OLSOA). They analyzed a corpus of 18 million abstracts from health- and medicine-related journals, published between 1950 and 2019, extracting acronyms and painting a rather disturbing portrait of the way we use them in our writing. I pulled the paper out of my to-read folder last week because I was revising the relevant bits of The Scientist’s Guide to Writing (in preparation for its eventually-forthcoming second edition). You should pull it out of yours, too; but today just a few choice morsels.
- Their 18 million abstracts contained 1,112,345 different acronyms. (That’s a lower bound: their algorithm misses some peculiar ones, such as acronyms including lower-case letters.) Does that seem like a lot? Yep. The English language has somewhere between 350,000 and 1,000,000 words, depending on how you count, so the medicine-related fields alone have coined more acronyms than there are words in our language. Including physics, chemistry, ecology, earth sciences, and mathematics would presumably boost the already-disturbing count much further still. [NOTE added later: since some folks seem to love to comment on things without bothering to read further, yes, the paper groups acronyms, initialisms, abbreviations, etc. together. Yes, it explains why it does so. No, none of the points the authors make, or that I make here, hinge on getting these pedantic distinctions right. Geesh.]
- Our use of acronyms has increased tenfold in 70 years, reaching just over 4 per 100 words in 2019. That’s right – a short-ish abstract of 150 words contains, on average, six. Good Lord (GL).
- We really like three-letter acronyms: they’re the most common length by far. There are 17,576 possible 3-letter acronyms, and, collectively, we’ve used at least 94% of them (that’s a lower bound, as the corpus didn’t include all the sciences).
- Following directly from the last point, many acronyms have multiple meanings, across fields or even between papers within fields. They have to; there just aren’t enough unique letter combinations to satisfy our craving for them.
- If you invent a new acronym, the chances are slim that it will catch on. Fully 30% of the million+ acronyms appear only once in the corpus: nobody else, not even the coining authors, ever used them again (in an abstract, at least).
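If you’d like to check the arithmetic behind a couple of those morsels, here is a minimal Python sketch. It assumes acronyms are built from the 26 upper-case letters only, and it rounds “just over 4 per 100 words” down to exactly 4, so the results are approximations rather than figures from the paper. The variable names are made up for illustration.

```python
import string

# Possible three-letter acronyms, assuming only the 26 upper-case letters.
possible_tlas = len(string.ascii_uppercase) ** 3

# The post's figure: at least 94% of them appear in the corpus (a lower bound).
used_fraction = 0.94
used_tlas = round(possible_tlas * used_fraction)
unused_tlas = possible_tlas - used_tlas  # at most this many are still unclaimed

# About 4 acronyms per 100 words (2019), applied to a 150-word abstract.
acronyms_per_100_words = 4
abstract_words = 150
expected_acronyms = acronyms_per_100_words * abstract_words / 100

print(possible_tlas)      # 17576
print(used_tlas)          # 16521
print(unused_tlas)        # 1055
print(expected_acronyms)  # 6.0
```

So on these assumptions, only about a thousand three-letter combinations are still unclaimed, which is why so many acronyms end up with multiple meanings.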
Kind of astonishing, isn’t it? You knew, as I did, that we have an acronym problem (AP). I bet you didn’t know it was quite that bad.
One thing that Barnett and Doubleday didn’t study is the re-use of acronyms, once defined or coined, within the body of the defining paper.* I have a particular bee in my bonnet about that. In manuscripts I read and review, it’s routine to see authors defining an acronym and then using it only once or twice – sometimes, defining an acronym and then never using it again.** Look: unless the acronym you’re using is more familiar to readers than the words it replaces (DNA, SCUBA), then the first time you use it you’re asking for some mental learning-and-decoding effort from the reader. The second time you use it, you’re asking for a little less, and so on. If you use it often enough, it may eventually save the reader effort and start to pay back those early costs. I don’t know what the breakeven number of uses is, where the reader is overall better off for their work in learning the acronym (it surely varies among acronyms and readers). I’m very confident, though, that it isn’t one, or two, or three! I’d love to see the frequency distribution for re-uses of acronyms per paper, and I hypothesize that acronyms used once or just a few times in a paper would make up a stupidly large fraction.
There: free research idea (FRI) for you. You’re welcome.
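For anyone tempted to take up that idea, here is a rough Python sketch of how the counting might start. It is not Barnett and Doubleday’s method. It assumes an acronym is “defined” when two or more capital letters appear alone in parentheses, so, like their algorithm, it will miss acronyms containing lower-case letters, and it counts a repeat definition as a use. The function names `acronym_reuse_counts` and `reuse_distribution` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a definition looks like "(ABC)", two or more capitals in parentheses.
DEFINITION = re.compile(r"\(([A-Z]{2,})\)")

def acronym_reuse_counts(text):
    """Map each parenthetically defined acronym to its uses after the definition."""
    counts = {}
    for match in DEFINITION.finditer(text):
        acronym = match.group(1)
        if acronym in counts:
            continue  # only the first definition starts the count
        rest = text[match.end():]
        pattern = r"\b" + re.escape(acronym) + r"\b"
        counts[acronym] = len(re.findall(pattern, rest))
    return counts

def reuse_distribution(texts):
    """Tally how many acronyms are re-used 0, 1, 2, ... times across papers."""
    dist = Counter()
    for text in texts:
        dist.update(acronym_reuse_counts(text).values())
    return dist

sample = (
    "We have an acronym problem (AP). The free research idea (FRI) is "
    "simple: count each FRI again. One FRI, two FRI."
)
print(acronym_reuse_counts(sample))  # {'AP': 0, 'FRI': 3}
```

Running `reuse_distribution` over the full text of many papers would give the frequency distribution in question; the hypothesis is that the bars at zero, one, and two re-uses would be uncomfortably tall.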
© Stephen Heard October 6, 2020
Image: A few samples from Barnett and Doubleday’s database. Image © Barnett and Doubleday, CC BY 4.0; original text presumably © the authors and/or journals.
*^They explain that this is because they were working with publicly available data (titles and abstracts), and analyzing paper texts would have restricted them to recent and open-access journals only. I’m not actually sure that’s true (we have libraries), but it’s unfair to criticize an author for not doing all imaginable studies at once. So, let’s just say it’s an obvious avenue for future work.
**^As I did with OLSOA, GL, and AP in this very post. I slay me.