Image: Unicorn fresco by Domenichino (1581-1641), in the Palazzo Farnese, Rome, via wikimedia.org
Sometimes, thinking about science, I make odd connections. Often, they seem odd when I first make them, but then I learn something important from them and wonder why I’d never made them before. A good example cropped up the other day, when I realized that a peculiar feature of the scientific naming of organisms connects, via some simple statistics, to the difficulty of cancer screening, to reproducibility, and to the burden of proof for surprising claims. Curious? Here goes.
I’ve been working on a new book, about eponymous Latin (scientific) names for organisms. “Eponymous” Latin names are those based on the names of people, as in the shrub Berberis darwinii. That species was named by Sir William Jackson Hooker, in honour of Charles Darwin, and there’s nothing unusual or surprising about that. But: could Hooker instead have named it Berberis hookeri, in honour of himself? I call the act of naming a species after yourself “ego naming”, and I had great fun writing a chapter about it.
Ego naming is rare. In fact, a lot of people believe that it isn’t allowed – but under both the Botanical and Zoological Codes of Nomenclature, it’s perfectly legal. Even more people believe that ego naming is in poor taste, and for that reason, accusations of ego naming get thrown around with a little bit of that delicious disapproval we all feel when we get to point out someone else’s minor social infraction. I’d made a list of almost two dozen such accusations, all of the form “Commerson’s dolphin, Cephalorhynchus commersonii, was named by Philibert Commerson after himself”. Except that Commerson, it turns out, didn’t actually commit the ego-naming sin of which he was accused; and as I dug into more and more case histories, over and over again the accused escaped conviction. Most supposed cases of ego naming turn out to be based on misunderstandings or errors, or (interestingly) turn out to be inadvertent rather than deliberate on the part of the namer*. Vanishingly few turn out to be real: almost never does someone actually name a species for themselves. The false accusations far, far outweigh the true ones.
For a while, I thought this was an interesting pattern, that it might say something profound about the way we enjoy spreading salacious tidbits about supposed wrongdoing**. But then I realized the ego-naming pattern was just another form of a more general rule: When events are rare, false alarms will be much more common than true cases.
This has been written about a lot with respect to cancer (and other disease) screening. Imagine that we screen 1,000,000 people for a rare form of cancer. Imagine, further, that the true incidence is 5 cases per 100,000 people (0.005%), and that our screening test is a good one, with high power (90%) and a low false-positive rate (1%). We’ll get a bunch of positive results, of course – but over 99.5% of them will be false alarms (10,000 of them, to be precise, compared to just 45 true cases***). Screening for rare diseases is hard.
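For anyone who would rather run the numbers than take them on faith, here is a minimal Python sketch of the screening example. The function name `screening_counts` and its parameter names are made up for illustration; the inputs are just the figures from the paragraph above (1,000,000 people, 5 cases per 100,000, 90% power, 1% false-positive rate).

```python
def screening_counts(population, prevalence, power, false_positive_rate):
    """Expected (true_positives, false_positives) from screening a population.

    All names here are illustrative assumptions, not part of any library.
    """
    cases = population * prevalence            # people who really have the disease
    non_cases = population - cases             # everyone else
    true_positives = cases * power             # real cases the test catches
    false_positives = non_cases * false_positive_rate  # healthy people flagged anyway
    return true_positives, false_positives


# Figures taken from the example in the text.
tp, fp = screening_counts(
    population=1_000_000,
    prevalence=5 / 100_000,
    power=0.90,
    false_positive_rate=0.01,
)
share_false = fp / (tp + fp)

print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"share of positives that are false alarms: {share_false:.2%}")
```

Run as written, this should report roughly 45 true positives against roughly 10,000 false ones, which is where the "over 99.5%" figure comes from.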
The problem of rare-disease screening has important implications for health care delivery, medical ethics, and so on, but others have written about them with far more authority than I could. Instead, I’m interested in generalizing the phenomenon. I’ve seen the cancer-screening calculation many times; but it took a long while for me to realize my false-accusations-of-ego-naming pattern was exactly the same thing. Any time (1) there’s a rare event, (2) we look hard for cases of it, and (3) we look using a procedure with a non-zero error rate, false alarms are likely to be more common than real detections. (We can figure out just how much more common quite easily, via the arithmetic in that last footnote.)
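To see how the three ingredients interact, here is a small, hedged Python sketch that computes the expected number of false alarms per real detection. The function name and the parameter names (`rarity`, `power`, `error_rate`) are hypothetical labels for conditions (1) and (3); the 90% and 1% values are borrowed from the cancer example, and the list of rarities is chosen only for illustration.

```python
def false_alarms_per_true_detection(rarity, power, error_rate):
    """Expected false alarms for each real detection.

    rarity     -- fraction of examined cases that are real (assumed 0 < rarity < 1)
    power      -- chance a real case is detected
    error_rate -- chance a non-case is wrongly flagged
    The population size cancels out, so it is not needed.
    """
    if not 0 < rarity < 1:
        raise ValueError("rarity must be strictly between 0 and 1")
    return ((1 - rarity) * error_rate) / (rarity * power)


# Same detection procedure, progressively rarer events.
for rarity in (0.1, 0.01, 0.001, 0.00005):
    ratio = false_alarms_per_true_detection(rarity, power=0.90, error_rate=0.01)
    print(f"rarity {rarity:g}: about {ratio:,.1f} false alarms per real detection")
```

Under these assumed rates, false alarms only overtake real detections once the event is rarer than about 1 in 100; at the cancer example's rarity (0.00005) there are more than 200 false alarms for every real case. The procedure stays the same throughout. Only the rarity changes.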
So now I understand that false accusations of ego naming (as with Commerson’s dolphin) outnumber real cases simply because (1) real cases are very rare, (2) a lot of species have been described and a lot of people read that literature, and (3) reading and transcription errors (and other weird circumstances that lead to accusations) are inevitable even though infrequent.
But of course this kind of thing happens all the time, and we should keep this in mind when we see something that surprises us. The more surprising a real occurrence might be (another way to think about its rarity), the more dominant the false positives. We should keep this in mind for astonishing claims on social media, for distressing breaking news stories, for accusations of malevolent behaviour among university administrators, for reports of particles moving faster than light, and for sightings of UFOs and sea monsters. (Yes, I’m aware that true cases are rarer for some of these than others.) It’s also, obviously, relevant to the issue of reproducibility in science. Our literature isn’t a big pile of facts; it’s full of false positives just like cancer screenings and my list of ego-naming accusations. That isn’t a horrible thing; it’s just the way inference works. While I’m not convinced that replicating every experiment we do is the right response to this realization, ignoring it isn’t either.
Extraordinary claims require extraordinary evidence, it’s sometimes said, and that’s true (and rather Bayesian). And in a sense, a supposed detection of anything rare is an “extraordinary claim”. Rare things are hard to find, but much easier to think you’ve found. That’s true for rare cancers, malevolent administrators, faster-than-light particles, and – returning to the realization that sparked this post – people who name species after themselves.
Although even rare things happen. One day soon you’ll be able to read about Colonel Robert Tytler and his incredible performing ego. But you’ll have to wait for my new book. [Update: wait no more – it’s out!]
© Stephen Heard May 8, 2018
*^I’m not going to explain all the ways this happens. They’re pretty interesting (perhaps you’re surprised by that), but you’ll have to wait for my book. But in the meantime: here’s a challenge for you, along the lines of “Name the 23 ways in baseball that a batter can advance to first base”. Use the Replies to suggest scenarios in which we could have a name of the form Genus authorityi Authority (as in Berberis hookeri Hooker), without an act of deliberate ego naming. (No prizes, though, except my admiration.)
***^In case you’ve never seen the arithmetic worked before: there will be 0.005% × 1,000,000 = 50 real cases, and since our power is 90%, we’ll get a positive test result for 0.9 × 50 = 45 of them. Meanwhile, 1,000,000 – 50 = 999,950 people don’t have the cancer, but the 1% false positive rate gives 1% × 999,950 = 10,000 positive tests (I’m rounding off half a person here). That gives 10,045 positive tests, of which 0.45% (45) are real, and 99.55% (10,000) are false.