Category Archives: data analysis

Why most studied populations should decline

Figure: Time series for two populations, each fluctuating in size. At time zero, I start a long-term study, and can choose either of the two populations (open circles). At some other time, I recensus (closed circles).  Red arrows show net population change.

On any given day it’s hard not to notice another headline about a population in decline.  Amphibians are in decline, songbirds are in decline, bumblebees are in decline, fish stocks are in decline.  Nature is under relentless human pressure, both direct and indirect, and before I proceed to make my point today, I need to be very clear that this pressure is real and severe, and I don’t doubt for a moment that it’s driving down population sizes of many, many species.

But there’s a very simple but pervasive statistical problem with the data behind population declines.

Ternary plots and the Grand Unified Theory of Potato Chips

Images: Soil ternary plot, Mike Norton via wikimedia.org, CC BY-SA 3.0.  Chip ternary plots, S. Heard.

I’ve always been mystified by ternary plots – you know, those cool-looking triangular ones.  I shouldn’t be; they aren’t really that complicated. But while Cartesian plots (in two dimensions or three) speak to me easily and clearly, ternary plots remain stubbornly silent.

I’ve survived this cognitive failing for nearly 30 years by deploying a strategy based entirely on avoidance.  Ternary plots just aren’t used that much in my field, except with a couple of specific kinds of data that are conveniently treated as mixes of three components – soil composition (sand, silt, and clay; above) being perhaps the most common. But my avoidance strategy came crashing down around me last semester, when I taught part of second-year Ecology as a sabbatical fill-in.  There it was, right there in the 4th week’s lecture outline: soils.  Field capacity, available water capacity, wilting point, soil horizons, and – oh, the humanity – that conventional ternary plot of sand, silt, and clay.  I had to teach it – and I didn’t understand it.

Something had to give, of course, and I knew it had to be me.
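For readers who share the mystification, here is a minimal sketch (not taken from the full post) of the arithmetic behind a ternary plot: a three-part mix such as sand, silt, and clay is rescaled to sum to 1 and then placed inside an equilateral triangle. The function name `ternary_to_xy` and the corner layout (sand bottom-left, silt bottom-right, clay on top) are assumptions chosen for illustration; other plots may orient the corners differently.

```python
import math


def ternary_to_xy(sand, silt, clay):
    """Map a three-part composition to (x, y) inside an equilateral triangle.

    Assumed (illustrative) corner layout:
      sand -> (0, 0), silt -> (1, 0), clay -> (0.5, sqrt(3)/2)
    Inputs may be percentages or fractions; they are rescaled to sum to 1.
    """
    total = sand + silt + clay
    if total <= 0:
        raise ValueError("components must sum to a positive number")
    silt_frac = silt / total
    clay_frac = clay / total
    # Weighted average of the three corner positions (sand's corner is the origin,
    # so it contributes nothing to either coordinate).
    x = silt_frac + 0.5 * clay_frac
    y = (math.sqrt(3) / 2) * clay_frac
    return x, y


# A soil that is 40% sand, 40% silt, 20% clay
x, y = ternary_to_xy(40, 40, 20)
print(round(x, 3), round(y, 3))
```

Because the three fractions must sum to 1, only two of them are free to vary, which is why three components fit on a flat triangle at all.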

Good uses for fake data (part 2)

In Good uses for fake data (part 1), I expounded on the virtues of fake – or “toy” – datasets for understanding statistical analyses. But that’s not the only good use for fake data. Fake data (this time, maybe a better term would be “model data”) can also be extremely useful in planning and writing up research. Once again, let me assure you that I’m – of course – not advocating data fakery for publication! Instead, fake data can help you think through how you’re going to present and interpret results of an experiment or an analysis (or perhaps even whether you can interpret them), before you actually spend effort getting data in hand.

I’d put this in the context of “early writing”, which is a strategy that interweaves the writing of science with the doing of science – as opposed to doing the science first and writing it up when it’s “done”, which always seemed to me so obvious I never thought to question it. Early writing makes writing easier, and can help you spot problems with your work’s design before it’s too late.

Good uses for fake data (part 1)

Graphic: A fake regression. You knew those were fake data, right? I may spend my entire career without getting a real regression that tight.

If you clicked on this post out of horror, let me assure you, first off, that it isn’t quite what you fear. I don’t – of course – endorse faking data for publication. That happens, and I agree it’s a Very Bad Thing, but it isn’t what’s on my mind today.

What I do endorse, and in fact encourage, is faking data for understanding. Fake data (maybe “toy data” would be a better term) can help us understand real data, and in my experience this is a tool that’s underused.
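As a minimal sketch of what faking data for understanding might look like (an illustration, not the example from the full post): generate toy data from a straight line whose slope and intercept you chose yourself, add noise, and check whether an ordinary least-squares fit hands those values back. The helper names `make_fake_data` and `fit_line`, and all the parameter values, are hypothetical.

```python
import random


def make_fake_data(n, slope, intercept, noise_sd, seed=0):
    """Generate toy (x, y) data from a known straight line plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the toy dataset is reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The "truth" is known because we invented it: slope 2.0, intercept 1.0.
xs, ys = make_fake_data(n=200, slope=2.0, intercept=1.0, noise_sd=0.5)
slope_hat, intercept_hat = fit_line(xs, ys)
print(f"estimated slope {slope_hat:.2f}, estimated intercept {intercept_hat:.2f}")
```

Turning `noise_sd` up, or `n` down, and watching the estimates wander away from the values you planted is the kind of intuition that real data, with its unknown truth, can’t give you.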