*Warning: gets a bit wonkish near the end.*

Have you ever noticed that the mayor of a small town is fairly often a bonehead? There’s a simple reason we’d expect that to be true – and that simple reason has implications for academic searches, the traits we analyze in ecology and systematics, and lots of other things, too (please add to my list in the Replies). The simple reason is this: it’s really hard to estimate extremes. It’s also really hard to understand why so many people act as if they’re unaware of this.

Let’s start with those mayors. Imagine that some fraction of a town’s population runs for the office of mayor, and that citizens vote for the most qualified (smartest? most experienced?) candidate. Under this model, the mayor is the best the town has to offer* (or at least the best among those willing to run). The bigger the town, the better that is (and the smaller, the worse). If the Village of West Podunk has a bonehead mayor while the City of Megalopolis does pretty well, we shouldn’t be surprised: expected maxima scale with sample size. That’s the first important property of extremes. There’s a second property that matters here too: the variance around estimates of extremes is very large. That’s why West, Centre, and North Podunks can have atrocious mayors while East Podunk escapes that fate.

If these two properties of extremes aren’t intuitive to you, it may help to see some (simulated) data. I simulated (R code here) random samples of a trait (“mayoral quality”, for now) from a normal distribution with mean 100 and standard deviation 50, with sample sizes ranging from 10 to 10,000. I drew 1000 samples of each size. For each sample, I calculated the sample mean and the sample maximum – if you like, the town’s average citizen and the town’s best-qualified mayor. Here we go:

Start with the bottom envelope, which shows the means and a measure of their among-sample variability (±2 standard deviations). Our estimates are centred on a single underlying “true” value (100). The among-sample variability is pretty small and, for reasonable sample sizes, rapidly gets *very* small. Means, we see, are easy to estimate. Now look at the upper envelope, which shows the maximum and *its* among-sample variability. Not so pretty, is it? The maximum changes, and its variability starts off huge and shrinks very, very slowly**. The mayor of a small town (sampled from the left end of the top envelope) is very often a bonehead; and for all town sizes, there’s lots of variation in the degree of boneheadedness.
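For readers who would rather poke at this than take the graph on faith, here is a minimal Python sketch of the simulation described above. It is not the R code behind the figure; the function name `simulate`, the seed, and the reduced number of samples in the demo loop are assumptions made for illustration.

```python
import random
import statistics


def simulate(sample_size, n_samples=1000, mean=100.0, sd=50.0, seed=1):
    """Draw n_samples normal samples of a given size; summarize means and maxima."""
    rng = random.Random(seed)
    means, maxima = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(mean, sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))   # the town's average citizen
        maxima.append(max(sample))               # the town's best-qualified mayor
    return {
        "mean_of_means": statistics.fmean(means),
        "sd_of_means": statistics.stdev(means),
        "mean_of_maxima": statistics.fmean(maxima),
        "sd_of_maxima": statistics.stdev(maxima),
    }


# Small demo: fewer samples than the post's 1000, to keep it quick.
for size in (10, 100, 1000):
    result = simulate(size, n_samples=200)
    print(size, {key: round(value, 1) for key, value in result.items()})
```

The means should sit near 100 with a spread that collapses quickly, while the maxima should climb with sample size and stay noisy throughout.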

But wait. This is a science blog (for some definitions of the term). Let’s move beyond the mayors. This is also why academic searches should generally be external. When an administrator tells you a search (for a new Dean, for example) will be internal, they may say “We are confident that we have excellent candidates right here at Hometown University”. Consider two possibilities. They may not understand the math above; or they may understand it very well, but are hoping that *you* don’t (because external searches are more difficult and more expensive). If you’re looking for the very best candidate, you want to sample from the largest pool (the right end of the top envelope, not the left). Sure, sometimes the best candidate is already internal; but (except perhaps at the University of Lake Wobegon), usually not.
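A back-of-envelope sketch of the search argument, under a loudly stated assumption: internal and external candidates are independent draws from the same quality distribution (the same normal as above). In that case the chance that the overall best candidate is internal is simply the internal share of the combined pool. The pool sizes (20 internal, 200 external) and the function name `internal_win_rate` are hypothetical.

```python
import random


def internal_win_rate(n_internal, n_external, trials=2000, seed=2):
    """Fraction of simulated searches in which the best candidate is internal."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best_internal = max(rng.gauss(100, 50) for _ in range(n_internal))
        best_external = max(rng.gauss(100, 50) for _ in range(n_external))
        if best_internal > best_external:
            wins += 1
    return wins / trials


# Assumed pool sizes, for illustration only.
rate = internal_win_rate(20, 200)
print(f"simulated: {rate:.3f}  expected under the assumption: {20 / 220:.3f}")
```

If internal candidates really are better on average (Lake Wobegon again), the assumption fails and so does the simple ratio.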

Even more science-y: what about the traits of organisms that we choose to analyze? We analyze trait data all the time, in ecology (especially in the fashionable sub-sub-discipline of functional trait ecology), in evolution, in biosystematics, in physiology. That’s entirely to be expected, of course – it’s hard to see what we’d have left if we *didn’t* analyze trait data. But: over the last few years, I’ve reviewed half a dozen manuscripts in which someone analyzed the *maximum* value of something – for example, the width of a plant’s widest leaf, the length of the longest fish, or the temperature on the warmest day. **We should expect these “traits” to behave very poorly** (in a statistical sense).

Let’s substitute fish for mayors***. What do we get if we estimate “maximum body length” for a population or a species of fish? We get two problematic things (that map onto the two properties of small-town mayors).

- First, our estimate will be strongly sensitive to our sample size. Actually, it’s worse than that – the actual maximum that we’re trying to estimate will itself be strongly sensitive to population size. You can think of maximum size as an outcome of some kind of behind-the-scenes sampling in which the fish population is sampling from a larger universe of possible developmental outcomes. So there are two opportunities for the maximum to scale with sample size (as in the graph): one when actual fish are sampled (by nature) from theoretical fish, and another when the measured fish are sampled (by us) from the actual fish. In an important sense, “maximum size” as a trait doesn’t really exist at all (except of course in the special case where growth is strongly asymptotic).
- Second, even if there were a maximum value we could usefully estimate, our estimate would be poor. We can estimate the mean size of a fish in a population with good precision for a reasonable sample size (lower envelope on the graph), but we can’t do the same thing for maximum size (upper envelope). (Again, except in certain special cases.)
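The two sampling stages in the list above can be sketched in a few lines of Python. Everything numerical here is an assumption for illustration: fish lengths are normal with mean 30 and standard deviation 5, nature "samples" a population of `pop_size` fish, and we then measure `n_measured` of them. The function name `two_stage_max` is hypothetical.

```python
import random
import statistics


def two_stage_max(pop_size, n_measured, reps=200, seed=3):
    """Average true population maximum and average observed (measured) maximum."""
    rng = random.Random(seed)
    population_maxima, observed_maxima = [], []
    for _ in range(reps):
        # Stage 1: nature samples actual fish from "theoretical" fish.
        population = [rng.gauss(30.0, 5.0) for _ in range(pop_size)]
        # Stage 2: we sample measured fish from the actual fish.
        measured = rng.sample(population, n_measured)
        population_maxima.append(max(population))
        observed_maxima.append(max(measured))
    return statistics.fmean(population_maxima), statistics.fmean(observed_maxima)


for size in (200, 2000):
    true_max, seen_max = two_stage_max(size, n_measured=50)
    print(f"population {size}: true max ~{true_max:.1f}, observed max ~{seen_max:.1f}")
```

The true maximum should rise with population size, and the observed maximum should fall short of it whenever we measure only a subset.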

Given these issues, why *do* people measure and analyze maxima (and minima)? I’m not sure. One possibility is that humans are intrinsically fascinated by extremes. Another is that in the field, humans are pretty good at finding the largest one of something, but not so good at picking out an average one. A third is that in some circumstances, the extreme might actually matter – perhaps the largest fish dominates behaviourally, or the tallest plant wins competition for light. Here’s the thing, though: maybe you can argue that in *your* study, the extreme really *does* matter; but then you have to acknowledge the poor statistical properties of your “trait” estimates, and propose a way that you can mitigate them. The manuscripts I’ve seen have, unanimously, not done that.

So, mayors, searches, and fish (etc.) – united by an understanding that extremes are funky to deal with and hard to estimate. Did I write this entire post just to have something to cite the *next* time I review a manuscript using extreme-value data? I’ll never tell.

*© Stephen Heard (sheard@unb.ca) January 3, 2017*

*^I am of course aware that elections don’t really work this way. There is ample empirical evidence that citizens don’t always vote for the most qualified person in the field – for example, this (which doesn’t link where you probably think it links) and this (which does). The math sets the ceiling; voter behaviour, unfortunately, sets the floor.

**^The picture I’m painting is for normally-distributed data. Other distributions make estimation harder, both for means and for extremes. Feel free to grab my R code and play around – or you can visit your library and borrow the definitive book on the statistics of extremes.

***^As subjects for this post, I mean. Although I suppose there are times when the literal substitution might not be a bad idea.

Brian McGill: Nice introduction! One of my pet topics as well. In general, people don’t seem to realize that different moments have different sampling behavior. It takes a larger sample to get a handle on the variance of something than on the mean, and a still larger sample to get a handle on skewness (something that more and more people are interested in). Yet very few people think that through. And as you point out, extremes are even worse. I also always use the maximum as an example of something you CANNOT bootstrap to estimate its properties.

On the other hand, one nice thing about extreme events is that regardless of the underlying distribution (normal, gamma, Poisson, etc.), extrema statistics (like the maximum) converge to one of three well-defined distributions whose properties can be dealt with analytically quite well.

There are two nice papers in Ecology on extreme events statistics if people want a more technical follow on:

Statistics of extremes: Modeling ecological disturbances RW Katz, GS Brush, MB Parlange – Ecology, 2005 http://onlinelibrary.wiley.com/doi/10.1890/04-0606/full

and the one that introduced me to the topic:

The largest, smallest, highest, lowest, longest, and shortest: extremes in ecology SD Gaines, MW Denny – Ecology, 1993 http://onlinelibrary.wiley.com/doi/10.2307/1939926/full
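The point about bootstrapping the maximum can be seen in a quick Python sketch (an illustration added here, not taken from either paper). A bootstrap resample can never exceed the original sample's maximum, and it reproduces that maximum exactly with probability 1 − (1 − 1/n)^n, roughly 0.63 for moderate n, so the bootstrap distribution piles up on a single value. The normal distribution, the sample size, and the function name `bootstrap_max_atom` are assumptions.

```python
import random


def bootstrap_max_atom(n, n_boot=2000, seed=4):
    """Fraction of bootstrap resamples whose maximum equals the sample maximum."""
    rng = random.Random(seed)
    sample = [rng.gauss(100, 50) for _ in range(n)]
    sample_max = max(sample)
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        if max(resample) == sample_max:
            hits += 1
    return hits / n_boot


n = 100
print(f"simulated: {bootstrap_max_atom(n):.3f}  theory: {1 - (1 - 1 / n) ** n:.3f}")
```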

ScientistSeesSquirrel (post author): Thanks, Brian, especially for those links – it probably should have occurred to me that there would be ecology papers I could cite, not just the classic stats book.

It’s funny, when I finished writing that post, I said to myself “huh – I just wrote a Brian McGill post (except shorter and not as good)”. So it’s great fun to see you here as the first commenter!

Gregor: I was also just about to point towards Denny. His recent book has more than 500 pages on why extreme events are so important for understanding mechanisms in ecology and evolution. See http://press.princeton.edu/titles/10641.html

Thanks for the nice post, Stephen!

(BTW: I did a book review on that for BaAE last year which the interested reader will find here http://www.sciencedirect.com/science/article/pii/S1439179116300986)

angela moles: Great post. I feel like arguing though… I use maxima rather than means for plant height – not because I have any confidence that I actually have the actual maximum height recorded for each species at each site (it’s amazing how often trees max out at exactly 10, 20 or 30 m in height), but because the mean height of a tree species that can reach ~30 m is probably something ridiculous like 5 cm (on account of there being so many more seedlings in the population than mature plants) [surely something similar happens for most taxa]. So, both mean and maximum stink a bit for measuring organism size, but I reckon maximum is less bad.

ScientistSeesSquirrel (post author): Arguments always welcome! I take your point about the mean not being particularly useful given the skewed height distribution for your trees. But it sounds to me like you might want median height of your trees, rather than maximum; medians are much better behaved. Or even some kind of percentile height (even a 75th percentile is much, much easier to estimate than a maximum). Unless, of course, you argue that it’s the maximum height that matters (e.g., my suggestion that the tallest individual might be monopolizing light). Cheers!
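A rough Python sketch of that comparison, assuming a strongly right-skewed height distribution (lognormal with log-scale mean 0 and standard deviation 1, chosen purely for illustration, not fitted to any tree data) and 30 measured plants per sample. It reports the among-sample coefficient of variation of the median, the 75th percentile, and the maximum; the function name `relative_spread` is hypothetical.

```python
import random
import statistics


def relative_spread(n=30, reps=1000, seed=5):
    """Among-sample coefficient of variation for median, 75th percentile, and max."""
    rng = random.Random(seed)
    medians, p75s, maxima = [], [], []
    for _ in range(reps):
        heights = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(heights))
        p75s.append(statistics.quantiles(heights, n=4)[2])  # third quartile
        maxima.append(max(heights))

    def cv(values):
        return statistics.stdev(values) / statistics.fmean(values)

    return {"median": cv(medians), "p75": cv(p75s), "max": cv(maxima)}


print({name: round(value, 2) for name, value in relative_spread().items()})
```

Under these assumptions the maximum should come out noticeably noisier, relative to its own size, than either the median or the 75th percentile.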

will petry: How much do you think the “fascination with extremes” is the siren song of easier or faster sampling? That is, measuring extremes can dramatically reduce the number of measurements that are needed, freeing up resources to measure other things or in other places. This false efficiency can arise intrinsically; for example, a phenology study requires far fewer field days if only data on ‘firsts’ are collected rather than constructing a phenological curve from all individuals. Alternatively, the false efficiency can result from hierarchical measurements with crude and fine tools. For example, most individuals can be ruled out as contenders for biggest individual with a quick visual scan, leaving only a handful to be measured carefully with the tape, scale, or calipers. I wonder if the focus on extremes is less prevalent in analyses of traits for which our senses are ill-equipped to quickly find the extreme in a crowd of individuals. How often do you see a plant species or population characterized by the highest or lowest leaf C:N ratio?

I’m always on the lookout for accessible examples that are conduits to improved statistical intuition, so thank you for another great post.

ScientistSeesSquirrel (post author): Yes, I expect you are right – this is partly what I was trying to get at with the suggestion that humans like extremes because we’re good at spotting them. But your C:N ratio example is a great one – here we escape the siren song because there’s no short cut on that one. Nice idea. Glad you enjoyed the post.

Pavel Dodonov: I ran some analyses on some data (the manuscript was rejected a couple of weeks ago for not being anything new) and was looking at both mean and maximum values. Your post made me think that the maximum values are probably not reliable (sample size of 30), so I’ll drop them in the next version. So thank you for this post! 🙂
