Warning: gets a bit wonkish near the end.
Have you ever noticed that the mayor of a small town is fairly often a bonehead? There’s a simple reason we’d expect that to be true – and that simple reason has implications for academic searches, the traits we analyze in ecology and systematics, and lots of other things, too (please add to my list in the Replies). The simple reason is this: it’s really hard to estimate extremes. It’s also really hard to understand why so many people act as if they’re unaware of this.
Let’s start with those mayors. Imagine that some fraction of a town’s population runs for the office of mayor, and that citizens vote for the most qualified (smartest? most experienced?) candidate. Under this model, the mayor is the best the town has to offer* (or at least the best among those willing to run). The bigger the town, the better that best is likely to be (and the smaller the town, the worse). If the Village of West Podunk has a bonehead mayor while the City of Megalopolis does pretty well, we shouldn’t be surprised: expected maxima increase with sample size. That’s the first important property of extremes. There’s a second property that matters here too: the variance around estimates of extremes is very large. That’s why West, Centre, and North Podunks can have atrocious mayors while East Podunk escapes that fate.
If these two properties of extremes aren’t intuitive to you, it may help to see some (simulated) data. I simulated (R code here) random samples of a trait (“mayoral quality”, for now) from a normal distribution with mean 100 and standard deviation 50, with sample sizes ranging from 10 to 10,000. I drew 1000 samples of each size. For each sample, I calculated the sample mean and the sample maximum – if you like, the town’s average citizen and the town’s best-qualified mayor. Here we go:
Start with the bottom envelope, which shows the means and a measure of their among-sample variability (±2 standard deviations). Our estimates are centred on a single underlying “true” value (100). The among-sample variability is pretty small and, for reasonable sample sizes, rapidly gets very small. Means, we see, are easy to estimate. Now look at the upper envelope, which shows the maximum and its among-sample variability. Not so pretty, is it? The expected maximum keeps climbing with sample size rather than settling on a single value, and its variability starts off huge and shrinks very, very slowly**. The mayor of a small town (sampled from the left end of the top envelope) is very often a bonehead; and for all town sizes, there’s lots of variation in the degree of boneheadedness.
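If you’d like to poke at this yourself, here is a minimal Python sketch of the simulation described above. It is not my R code; the function name `simulate`, the seed, and the choice of 100 replicates per sample size (the post uses 1000, which is slow in pure Python) are assumptions made for illustration. It draws normal samples with mean 100 and standard deviation 50, then summarizes the among-sample behaviour of the mean and the maximum.

```python
import random
import statistics


def simulate(sizes, n_reps, mu=100.0, sigma=50.0, seed=1):
    """For each sample size, draw n_reps normal samples and summarize
    the among-sample average and SD of the sample mean and sample maximum."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    results = {}
    for n in sizes:
        means, maxima = [], []
        for _ in range(n_reps):
            sample = [rng.gauss(mu, sigma) for _ in range(n)]
            means.append(statistics.fmean(sample))  # the town's average citizen
            maxima.append(max(sample))              # the town's best-qualified mayor
        results[n] = {
            "mean_avg": statistics.fmean(means),
            "mean_sd": statistics.stdev(means),
            "max_avg": statistics.fmean(maxima),
            "max_sd": statistics.stdev(maxima),
        }
    return results


# Illustrative run: 100 replicates per size rather than the post's 1000.
results = simulate(sizes=(10, 100, 1000, 10000), n_reps=100)
for n, r in results.items():
    print(f"n={n:>6}  mean: {r['mean_avg']:7.1f} (±2 SD {2 * r['mean_sd']:5.1f})"
          f"   max: {r['max_avg']:7.1f} (±2 SD {2 * r['max_sd']:5.1f})")
```

The two columns correspond to the lower and upper envelopes: the mean should hover near 100 with a rapidly narrowing band, while the maximum should keep rising with a band that narrows only grudgingly.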
But wait. This is a science blog (for some definitions of the term). Let’s move beyond the mayors. This is also why academic searches should generally be external. When an administrator tells you a search (for a new Dean, for example) will be internal, they may say “We are confident that we have excellent candidates right here at Hometown University”. Consider two possibilities. They may not understand the math above; or they may understand it very well but hope that you don’t (because external searches are more difficult and more expensive). If you’re looking for the very best candidate, you want to sample from the largest pool (the right end of the top envelope, not the left). Sure, sometimes the best candidate is already internal; but (except perhaps at the University of Lake Wobegon) usually not.
Even more science-y: what about the traits of organisms that we choose to analyze? We analyze trait data all the time, in ecology (especially in the fashionable sub-sub-discipline of functional trait ecology), in evolution, in biosystematics, in physiology. That’s entirely to be expected, of course – it’s hard to see what we’d have left if we didn’t analyze trait data. But: over the last few years, I’ve reviewed half a dozen manuscripts in which someone analyzed the maximum value of something – for example, the width of a plant’s widest leaf, the length of the longest fish, or the temperature on the warmest day. We should expect these “traits” to behave very poorly (in a statistical sense).
Let’s substitute fish for mayors***. What do we get if we estimate “maximum body length” for a population or a species of fish? We get two problematic things (that map onto the two properties of small-town mayors).
- First, our estimate will be strongly sensitive to our sample size. Actually, it’s worse than that – the actual maximum that we’re trying to estimate will itself be strongly sensitive to population size. You can think of maximum size as an outcome of some kind of behind-the-scenes sampling in which the fish population is sampling from a larger universe of possible developmental outcomes. So there are two opportunities for the maximum to scale with sample size (as in the graph): one when actual fish are sampled (by nature) from theoretical fish, and another when the measured fish are sampled (by us) from the actual fish. In an important sense, “maximum size” as a trait doesn’t really exist at all (except of course in the special case where growth is strongly asymptotic).
- Second, even if there were a maximum value we could usefully estimate, our estimate would be poor. We can estimate the mean size of a fish in a population with good precision for a reasonable sample size (lower envelope on the graph), but we can’t do the same thing for maximum size (upper envelope). (Again, except in certain special cases.)
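The two-stage sampling in the first point can be sketched in a few lines of Python. This is a toy under loud assumptions: body length is taken to be normal with mean 100 and standard deviation 50 (borrowed from the mayor simulation purely for continuity, not from any real fish), and the function name `two_stage_max` and its parameters are hypothetical. “Nature” draws a population from the distribution, and then “we” measure a random subset of it.

```python
import random
import statistics


def two_stage_max(pop_size, sample_size, n_reps=100, mu=100.0, sigma=50.0, seed=1):
    """Return the average population maximum and the average measured-sample
    maximum over n_reps replicate populations."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    pop_maxima, sample_maxima = [], []
    for _ in range(n_reps):
        # Stage 1: actual fish are sampled (by nature) from theoretical fish.
        population = [rng.gauss(mu, sigma) for _ in range(pop_size)]
        # Stage 2: measured fish are sampled (by us), without replacement.
        measured = rng.sample(population, sample_size)
        pop_maxima.append(max(population))
        sample_maxima.append(max(measured))
    return statistics.fmean(pop_maxima), statistics.fmean(sample_maxima)


# Illustrative population and sample sizes; none of these come from real data.
for pop_size in (100, 10000):
    for sample_size in (10, 100):
        pop_max, sample_max = two_stage_max(pop_size, sample_size)
        print(f"population {pop_size:>6}, sample {sample_size:>4}: "
              f"true max ≈ {pop_max:6.1f}, measured max ≈ {sample_max:6.1f}")
```

The thing to watch is that the “true” maximum moves with population size and the measured maximum moves with sample size, so neither number pins down a fixed property of the species.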
Given these issues, why do people measure and analyze maxima (and minima)? I’m not sure. One possibility is that humans are intrinsically fascinated by extremes. Another is that in the field, humans are pretty good at finding the largest one of something, but not so good at picking out an average one. A third is that in some circumstances, the extreme might actually matter – perhaps the largest fish dominates behaviourally, or the tallest plant wins competition for light. Here’s the thing, though: maybe you can argue that in your study, the extreme really does matter; but then you have to acknowledge the poor statistical properties of your “trait” estimates, and propose a way that you can mitigate them. The manuscripts I’ve seen have, unanimously, not done that.
So, mayors, searches, and fish (etc.) – united by an understanding that extremes are funky to deal with and hard to estimate. Did I write this entire post just to have something to cite the next time I review a manuscript using extreme-value data? I’ll never tell.
© Stephen Heard (firstname.lastname@example.org) January 3, 2017
*^I am of course aware that elections don’t really work this way. There is ample empirical evidence that citizens don’t always vote for the most qualified person in the field – for example, this (which doesn’t link where you probably think it links) and this (which does). The math sets the ceiling; voter behaviour, unfortunately, sets the floor.
**^The picture I’m painting is for normally distributed data. Other distributions make estimation harder, both for means and for extremes. Feel free to grab my R code and play around – or you can visit your library and borrow the definitive book on the statistics of extremes.
***^As subjects for this post, I mean. Although I suppose there are times when the literal substitution might not be a bad idea.