From ego naming to cancer screening: type I error and rare events

Image: Unicorn fresco by Domenichino (1581-1641), in the Palazzo Farnese, Rome, via wikimedia.org

Sometimes, thinking about science, I make odd connections.  Often, they seem odd when I first make them, but then I learn something important from them and wonder why I’d never made them before.  A good example cropped up the other day, when I realized that a peculiar feature of the scientific naming of organisms connects, via some simple statistics, to the difficulty of cancer screening, to reproducibility, and to the burden of proof for surprising claims.  Curious?  Here goes.

I’ve been working on a new book, about eponymous Latin (scientific) names for organisms.  “Eponymous” Latin names are those based on the names of people, as in the shrub Berberis darwinii.  That species was named by Sir William Jackson Hooker, in honour of Charles Darwin, and there’s nothing unusual or surprising about that.  But: could Hooker instead have named it Berberis hookeri, in honour of himself?  I call the act of naming a species after yourself “ego naming”, and I had great fun writing a chapter about it.

Ego naming is rare.  In fact, a lot of people believe that it isn’t allowed – but under both the Botanical and Zoological Codes of Nomenclature, it’s perfectly legal.  Even more people believe that ego naming is in poor taste, and for that reason, accusations of ego naming get thrown around with a little bit of that delicious disapproval we all feel when we get to point out someone else’s minor social infraction.  I’d made a list of almost two dozen such accusations, all of the form “Commerson’s dolphin, Cephalorhynchus commersonii, was named by Philibert Commerson after himself”.  Except that Commerson, it turns out, didn’t actually commit the ego-naming sin of which he was accused; and as I dug into more and more case histories, over and over again the accused escaped conviction.  Most supposed cases of ego naming turn out to be based on misunderstandings or errors, or (interestingly) turn out to be inadvertent rather than deliberate on the part of the namer*.  Vanishingly few turn out to be real: almost never does someone actually name a species for themselves.  The false accusations far, far outweigh the true ones.

For a while, I thought this was an interesting pattern, that it might say something profound about the way we enjoy spreading salacious tidbits about supposed wrongdoing**.  But then I realized the ego-naming pattern was just another form of a more general rule:  When events are rare, false alarms will be much more common than true cases.

This has been written about a lot with respect to cancer (and other disease) screening.  Imagine that we screen 1,000,000 people for a rare form of cancer.  Imagine, further, that the true incidence is 5 cases per 100,000 people (0.005%), and that our screening test is a good one, with high power (90%) and a low false-positive rate (1%).  We’ll get a bunch of positive results, of course – but over 99.5% of them will be false alarms (10,000 of them, to be precise, compared to just 45 true cases***).  Screening for rare diseases is hard.
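For readers who want to check the arithmetic, here is a minimal Python sketch of the expected counts. It assumes every one of the 1,000,000 people is tested exactly once and that the power and false-positive rate apply independently to each person; the function name `screening_counts` is just an illustrative choice.

```python
def screening_counts(population, incidence, power, false_positive_rate):
    """Expected outcomes of screening `population` people once each.

    incidence: fraction of people who truly have the condition
    power: probability that a true case tests positive
    false_positive_rate: probability that a non-case tests positive
    """
    true_cases = population * incidence
    non_cases = population - true_cases
    true_positives = power * true_cases
    false_positives = false_positive_rate * non_cases
    # Fraction of all positive results that are false alarms
    share_false = false_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_false


tp, fp, share_false = screening_counts(
    population=1_000_000,
    incidence=5 / 100_000,  # 5 cases per 100,000 people (0.005%)
    power=0.90,
    false_positive_rate=0.01,
)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.1f}")
print(f"share of positives that are false alarms: {share_false:.2%}")
```

Run as is, it reports 45 expected true positives against 9,999.5 expected false positives (about 10,000 once the half person is rounded off), so roughly 99.55% of the positive results are false alarms.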

The problem of rare-disease screening has important implications for health care delivery, medical ethics, and so on, but others have written about them with far more authority than I could.  Instead, I’m interested in generalizing the phenomenon.  I’ve seen the cancer-screening calculation many times; but it took a long while for me to realize my false-accusations-of-ego-naming pattern was exactly the same thing.  Any time (1) there’s a rare event, (2) we look hard for cases of it, and (3) we look using a procedure with a non-zero error rate, false alarms are likely to be more common than real detections.  (We can figure out just how much more common quite easily, via the arithmetic in that last footnote.)
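The footnote’s arithmetic generalizes to a one-line formula. Writing p for the true rate of the event among the things we check, 1 − β for the power, and α for the false-positive (type I error) rate, the expected number of false alarms per real detection, and the condition under which false alarms win, are:

```latex
\frac{\text{expected false alarms}}{\text{expected real detections}}
  = \frac{\alpha\,(1-p)}{(1-\beta)\,p}
\qquad\text{and}\qquad
\frac{\alpha\,(1-p)}{(1-\beta)\,p} > 1
  \iff p < \frac{\alpha}{1-\beta+\alpha}
```

Plugging in the screening example (p = 0.00005, 1 − β = 0.90, α = 0.01) gives about 222 false alarms per real detection, matching 10,000 to 45. With that same test, false alarms outnumber real detections for any event rarer than about 1.1% (0.01 / 0.91).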

So now I understand that false accusations of ego naming (as with Commerson’s dolphin) outnumber real cases simply because (1) real cases are very rare, (2) a lot of species have been described and a lot of people read that literature, and (3) reading and transcription errors (and other weird circumstances that lead to accusations) are inevitable even though infrequent.

But of course this kind of thing happens all the time, and we should keep it in mind whenever something surprises us.  The more surprising a real occurrence would be (another way to think about its rarity), the more the false positives dominate.  That goes for astonishing claims on social media, for distressing breaking news stories, for accusations of malevolent behaviour among university administrators, for reports of particles moving faster than light, and for sightings of UFOs and sea monsters.  (Yes, I’m aware that true cases are rarer for some of these than others.)  It’s also, obviously, relevant to the issue of reproducibility in science.  Our literature isn’t a big pile of facts; it’s full of false positives, just like cancer screenings and my list of ego-naming accusations.  That isn’t a horrible thing; it’s just the way inference works.  While I’m not convinced that replicating every experiment we do is the right response to this realization, ignoring it isn’t either.
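To put rough numbers on “the more surprising, the more dominant the false positives”, here is a minimal Python sketch that holds the test fixed (the 90% power and 1% false-positive rate from the screening example) and varies only how rare the real event is. The list of prevalences is hypothetical, chosen only to span common to very rare; it is not data about any of the examples above.

```python
def false_alarm_share(prevalence, power=0.90, false_positive_rate=0.01):
    """Expected fraction of positive results that are false alarms."""
    false_positives = false_positive_rate * (1 - prevalence)
    true_positives = power * prevalence
    return false_positives / (false_positives + true_positives)


# Hypothetical prevalences, from common to very rare; the test stays the same.
prevalences = [0.10, 0.01, 0.001, 0.0001, 0.00005]
for p in prevalences:
    print(f"prevalence {p:>8.3%}: "
          f"{false_alarm_share(p):6.1%} of positives are false alarms")
```

Under these assumptions the false-alarm share climbs from roughly 9% at 10% prevalence to over 99% at 0.01%, even though the test itself never gets any worse.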

Extraordinary claims require extraordinary evidence, it’s sometimes said, and that’s true (and rather Bayesian).  And in a sense, a supposed detection of anything rare is an “extraordinary claim”.  Rare things are hard to find, but much easier to think you’ve found.  That’s true for rare cancers, malevolent administrators, faster-than-light particles, and – returning to the realization that sparked this post – people who name species after themselves.
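The “rather Bayesian” part can be made concrete with Bayes’ theorem in odds form, a standard identity rather than anything specific to this post: posterior odds equal prior odds times the likelihood ratio of the evidence. An extraordinary claim is one with tiny prior odds; extraordinary evidence is evidence with a huge likelihood ratio. The second line reuses the screening example’s numbers.

```latex
\underbrace{\frac{P(\text{real}\mid +)}{P(\text{not real}\mid +)}}_{\text{posterior odds}}
  = \underbrace{\frac{P(\text{real})}{P(\text{not real})}}_{\text{prior odds}}
  \times
  \underbrace{\frac{P(+\mid\text{real})}{P(+\mid\text{not real})}}_{\text{likelihood ratio}}

\frac{50}{999{,}950} \times \frac{0.90}{0.01}
  \approx \left(5.0\times10^{-5}\right) \times 90
  \approx 0.0045,
\qquad\text{so } P(\text{real}\mid +) \approx 0.45\%
```

A positive test lifts the odds ninety-fold, yet a real case is still under half a percent likely. To bring prior odds that small up to even, the evidence would need a likelihood ratio of roughly 20,000.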

Although even rare things happen.  One day soon you’ll be able to read about Colonel Robert Tytler and his incredible performing ego.  But you’ll have to wait for my new book.

© Stephen Heard  May 8, 2018


*^I’m not going to explain all the ways this happens.  They’re pretty interesting (perhaps you’re surprised by that), but you’ll have to wait for my book.  But in the meantime: here’s a challenge for you, along the lines of “Name the 23 ways in baseball that a batter can advance to first base”.  Use the Replies to suggest scenarios in which we could have a name of the form Genus authorityi Authority (as in Berberis hookeri Hooker), without an act of deliberate ego naming.  (No prizes, though, except my admiration.)

**^Actually, I still think it probably does.  And we definitely do.  But that’s not all it says.

***^In case you’ve never seen the arithmetic worked before:  there will be 0.005% × 1,000,000 = 50 real cases, and since our power is 90%, we’ll get a positive test result for 0.9 × 50 = 45 of them.  Meanwhile, 1,000,000 – 50 = 999,950 people don’t have the cancer, but the 1% false-positive rate gives 1% × 999,950 ≈ 10,000 positive tests (I’m rounding off half a person here).  That gives 10,045 positive tests, of which 0.45% (45) are real, and 99.55% (10,000) are false.


16 thoughts on “From ego naming to cancer screening: type I error and rare events”

  1. amlees

    Person naming species fills in form and either fills it in wrong, or it is misread, so that instead of being named after someone else, it is named after the form-filler?

    1. ScientistSeesSquirrel Post author

      Good guess! There’s no “form” involved – it’s simply a matter of publishing the name – but along your lines, sometimes the paper may be jointly authored by Smith and Jones, but the species may be named jonesii and the description specifies that only Smith is the namer. That’s pretty easy to miss, and someone could report the existence of Genus jonesii Smith and Jones, whereas it should really be Genus jonesii Smith. So that’s one!

  2. Catherine Scott

    I’d guess folks naming species after a parent or spouse or other relative with the same last name?

  3. Ian Medeiros

    Medeiros publishes the name Genus smithiae Medeiros without noticing that it is a later homonym of Genus smithiae Jones. At first no one catches the error and there is no replacement name in the same genus, but a little while later Smith realizes that Genus smithiae Medeiros (non Genus smithiae Jones) actually belongs in Neogenus. She publishes the name Neogenus smithiae, but because Genus smithiae Medeiros is an illegitimate name, the correct author citation for Smith’s new name is Neogenus smithiae Smith and not Neogenus smithiae (Medeiros) Smith.

  4. Jeremy Fox

    Great post. It would not have occurred to me that ego naming being rarer in practice than it’s reputed to be is analogous to the more familiar problem with diagnostic screens for rare events.

    1. Jeff Houlahan

      Jeremy and Steve, I’m going to quibble a little bit here – I don’t think this is exactly analogous to the cancer example. The cancer example is a classic because we start with the premise that the disease is rare – the prior is known. To make the connection between self-naming and the cancer example you would have to assume that the prior probability of self-naming was low and that wouldn’t have been an intuitive assumption for me. Once you investigate and realise the prior probability of self-naming is low the connection becomes more obvious. Knowing the prior is the key and most of us (I think) wouldn’t have known the prior.

      1. Jeremy Fox

        Afraid I don’t follow Jeff. Whether or not you realize that instances of X are rare doesn’t change the fact that, when X is rare, most purported detections of X will be false positives. They’re still mostly false positives even if you don’t *realize* they’re false positives (say, because you have a mistaken prior that X is common). But I feel like I must be missing your point…

  5. Jeff Houlahan

    Sorry, I didn’t make my point clearly at all. The analogy absolutely makes sense but I was making a point about who would notice that the analogy made sense and when – only because, Jeremy, you were saying that it never occurred to you that these were analogous and Steve, you were saying it only occurred to you recently. But I don’t think the analogy would be obvious until you knew the prior for ego-naming – as long as you believed ego-naming was common then there was no reason to see these as analogous. Here’s the thought experiment I would use to illustrate this: is getting ticketed for speeding analogous to the ego-naming or disease-diagnosis stories? It either is or isn’t, but I don’t think we can answer that question without knowing the prior probability that drivers are speeding and the measurement error on the radar gun. And having written these words I’m not sure I’ve ever made a more trivial or esoteric point…and that’s saying something.

    1. ScientistSeesSquirrel Post author

      Ah, I understand your point now. That really was something I was trying to get at, perhaps unclearly – when we don’t know the actual probability, we should be wary because it’s hard to know what fraction of detections will be false positives. However, when detections are uncommon, that fraction will often be “most”. To use your thought experiment: if you get six speeding tickets a week, false positives are unlikely to account for a significant fraction of them (and also, please tell me where you live so I can steer clear…) If you get one speeding ticket every 10 years, it’s plausible that it’s a false positive. (I’m assuming in both cases that the false positive rate for radar gun use is not zero, but not outrageous). So even without knowing the true incidence, we can get some qualitative/intuitive feel from the detection rate about whether we should be cautious about false positives. With ego naming, I should have made the connection sooner on those grounds – a couple of dozen accusations out of tens of thousands of eponymous names!

  6. Pingback: Friday links: the taxonomic sin that nobody ever actually commits, confidence vs. expertise, and more | Dynamic Ecology
