Some startling data about the scourge of acronyms

There’s a lot to dislike about the way we write scientific papers. They’re often tedious and impenetrable, and they get that way at least in part because we make poor decisions as we write. We overuse big fancy words when short simple ones are available (“utilize”, anyone?), we just can’t let go of our fetish for the passive voice, and we apparently love nothing more than replacing some actual English words with an acronym. And so on. We do all these things, I’m convinced, in large part because we want our papers to sound science-y. We’re used to reading a literature rich in jargon, passive voice, and acronyms, and so either consciously or subconsciously we model our writing on what’s gone before – because that’s what scientific writing sounds like. It’s a trap, of course, but one that’s fiendishly hard to escape from.

It’s easy for me to rant about this stuff (and believe me, I won’t stop), but it’s more compelling when there’s data. A couple of months ago, Adrian Barnett and Zoe Doubleday provided that in spades, with respect to our literature’s scourge of acronyms (OLSOA). They analyzed a corpus of 18 million abstracts from health- and medicine-related journals, published between 1950 and 2019, extracting acronyms and painting a rather disturbing portrait of the way we use them in our writing. I pulled the paper out of my to-read folder last week because I was revising the relevant bits of The Scientist’s Guide to Writing (in preparation for its eventually-forthcoming second edition). You should pull it out of yours, too; but today just a few choice morsels.

  • Their 18 million abstracts contained 1,112,345 different acronyms. (That’s a lower bound: their algorithm misses some peculiar ones, such as acronyms including lower-case letters.) Does that seem like a lot? Yep. The English language has somewhere between 350,000 and 1,000,000 words, depending on how you count, so the medicine-related fields alone have coined more acronyms than there are words in our language. Including physics, chemistry, ecology, earth sciences, and mathematics would presumably boost the already-disturbing count much further still. [NOTE added later: since some folks seem to love to comment on things without bothering to read further, yes, the paper groups acronyms, initialisms, abbreviations, etc. together. Yes, it explains why it does so. No, none of the points the authors make, or that I make here, hinge on getting these pedantic distinctions right. Geesh.]
  • Our use of acronyms has increased tenfold in 70 years, reaching just over 4 per 100 words in 2019. That’s right – a short-ish abstract of 150 words contains, on average, six Good Lord (GL).
  • We really like three-letter acronyms: they’re the most common length by far. There are 17,576 possible 3-letter acronyms, and, collectively, we’ve used at least 94% of them (that’s a lower bound, as the corpus didn’t include all the sciences).
  • Following directly from the last point, many acronyms have multiple meanings, across fields or even between papers within fields. They have to; there just aren’t enough unique letter combinations to satisfy our craving for them.
  • If you invent a new acronym, the chances are slim that it will catch on. Fully 30% of the million+ acronyms appear only once in the corpus: nobody, not even the coining authors, ever used them again (in an abstract, at least).
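For anyone who wants to check the arithmetic behind those bullets, here is a minimal sketch in Python. The numbers come straight from the list above; the variable names are made up for illustration, and the 94% and 30% figures are rounded values from the post, so the derived counts are approximate.

```python
import string

# Every possible three-letter acronym: 26 choices for each of 3 positions.
three_letter_space = len(string.ascii_uppercase) ** 3

# "At least 94%" of them have been used (a lower bound, per the post).
used_at_least = int(three_letter_space * 0.94)

# "Just over 4 per 100 words" applied to a short-ish 150-word abstract.
rate_per_100_words = 4
abstract_words = 150
expected_in_abstract = rate_per_100_words * abstract_words / 100

# Roughly 30% of the 1,112,345 distinct acronyms appear only once.
approx_singletons = round(0.30 * 1_112_345)

print(three_letter_space, used_at_least, expected_in_abstract, approx_singletons)
```

So the 17,576 figure is simply 26 cubed, and the "six per abstract" figure is the 2019 rate scaled to 150 words.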

Kind of astonishing, isn’t it? You knew, as I did, that we have an acronym problem (AP). I bet you didn’t know it was quite that bad.

One thing that Barnett and Doubleday didn’t study is the re-use of acronyms, once defined or coined, within the body of the defining paper.* I have a particular bee in my bonnet about that. In manuscripts I read and review, it’s routine to see authors defining an acronym and then using it only once or twice – sometimes, defining an acronym and then never using it again.** Look: unless the acronym you’re using is more familiar to readers than the words it replaces (DNA, SCUBA), then the first time you use it you’re asking for some mental learning-and-decoding effort from the reader. The second time you use it, you’re asking for a little less, and so on.  If you use it often enough, it may eventually save the reader effort and start to pay back those early costs. I don’t know what the breakeven number of uses is, where the reader is overall better off for their work in learning the acronym (it surely varies among acronyms and readers). I’m very confident, though, that it isn’t one, or two, or three!  I’d love to see the frequency distribution for re-uses of acronyms per paper, and I hypothesize that acronyms used once or just a few times in a paper would make up a stupidly large fraction.

There: free research idea (FRI) for you. You’re welcome.
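If you want a head start on that idea, here is a rough sketch in Python of how re-uses might be counted. To be clear, this is not Barnett and Doubleday's method. It assumes (crudely) that an acronym is "defined" the first time it appears in parentheses as two or more capital letters, and the function names `reuse_counts` and `reuse_distribution` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a definition looks like "(ABC)", i.e. 2+ capital letters in parentheses.
DEFINITION = re.compile(r"\(([A-Z]{2,})\)")

def reuse_counts(text: str) -> dict[str, int]:
    """For each parenthetically defined acronym, count how many times
    it appears again after its first definition."""
    counts: dict[str, int] = {}
    for match in DEFINITION.finditer(text):
        acronym = match.group(1)
        if acronym in counts:
            continue  # only the first parenthetical counts as the definition
        rest = text[match.end():]
        counts[acronym] = len(re.findall(rf"\b{acronym}\b", rest))
    return counts

def reuse_distribution(texts) -> Counter:
    """Tally how many acronyms were re-used 0, 1, 2, ... times across texts."""
    dist = Counter()
    for t in texts:
        dist.update(reuse_counts(t).values())
    return dist

sample = (
    "Our literature's scourge of acronyms (OLSOA) is real. "
    "We have an acronym problem (AP). OLSOA is bad. OLSOA again."
)
print(reuse_counts(sample))
print(reuse_distribution([sample]))
```

A real study would need a much more careful definition detector (lower-case letters, digits, acronyms introduced without parentheses), but the shape of the analysis would be the same: one re-use count per defined acronym, then a histogram of those counts.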

© Stephen Heard  October 6, 2020

Image: A few samples from Barnett and Doubleday’s database. Image © Barnett and Doubleday, CC BY 4.0; original text presumably © the authors and/or journals.

*^They explain that this is because they were working with publicly available data (titles and abstracts), and analyzing paper texts would have restricted them to recent and open-access journals only. I’m not actually sure that’s true (we have libraries), but it’s unfair to criticize an author for not doing all imaginable studies at once. So, let’s just say it’s an obvious avenue for future work.

**^As I did with OLSOA, GL, and AP in this very post. I slay me.

8 thoughts on “Some startling data about the scourge of acronyms”

  1. yf

    Thanks for this interesting post. It is truly amazing that we have already run out of available acronyms and recycled many of them. The linked article does not seem to discuss WHY we are so eager to coin new acronyms every time we write. I am guilty of this, as are many of us, but not really (or at least not consciously) to sound more “sciency”. The most frequent reason, to me, is simply to meet journals’ requirements in terms of article length! It most likely reveals that I am a bad writer but often, when it turns out I will be unable to make a manuscript shorter than, say, 5000 words, I search for multiple instances of a specific term and decide whether I can make it an acronym or not. Lazy behaviour for sure, but I suspect I am not the only one.

    1. ScientistSeesSquirrel Post author

      I will admit to having done this too, earlier in my career. I’ve decided, though, that I can almost always find another way to cut a few words if I have to – a way that’s more reader-friendly than adding a bunch of acronyms. I don’t say that to shame you – I agree with you that it’s a common ‘trick’ and I understand why folks do it!


  2. Markus Eichhorn

    Alas it’s not only in papers. I’ve just received an internal e-mail which uses six acronyms in the opening two sentences, most of which are unfamiliar to me. This renders the whole communication almost pointless unless I take the time to parse these jumbles of letters (and some numbers). Is this just a wider problem of the way we write nowadays that is leaking into the academic literature?


  3. Sherry Marts

    Another source of huge numbers of acronyms: the US Federal Government. They have started to re-use them, too. I spent a brief (and miserable) time working for a consultancy on a government contract. The first time they told me I needed to submit a report to the GPO, my thought was “Why would the Government Printing Office need this?” because the GPO was where you went to obtain reports, not submit them. “No, no, no, send it to the Government Procurement Office!” was the response.

  4. Ken Hughes

    This past week I’ve seen two separate instances of acronyms involving Greek letters. (Admittedly not in peer-reviewed papers, but still scientific writing.) The fact that it wasn’t clear how to pronounce the acronym doesn’t bode well …



