*Photo: lazy red panda, CC0, via pxhere.com*

I’ve just published a paper that had some trouble getting through peer review. Nothing terribly unusual about that, of course, and the paper is better for its birthing pains. But one reviewer comment (made independently, actually, by several different reviewers) really bugged me. It revealed some fuzzy thinking that’s all too common amongst ecologists, having to do with the value of quick-and-dirty methods. Quick-and-dirty methods deserve more respect. I’ll explain using my particular paper as an example, first, and then provide a general analysis.

For our paper*, we needed to estimate plant fitness (or more precisely, how fitness was reduced in plants under insect attack). But we were working with goldenrods, which are perennials that reproduce both asexually (through underground rhizomes) and sexually (through outcrossed seed). So what’s the best measure of fitness? That’s a question on which reviewers had (and shared) some strong opinions.

Here’s how *we* estimated plant fitness: we did something **very** quick-and-dirty. We simply clipped plants at ground level and measured the total dry mass of their aboveground tissue**. Since only aboveground tissue produces the photosynthates (sugars) needed for construction of all other tissues, aboveground biomass should correlate (perhaps imprecisely) with pretty much any other fitness measure one might choose. (Or so we argued; and there are data for goldenrods to back us up.) But some reviewers were horrified, and although none of them actually used the word “lazy”, it was pretty obviously there between the lines.

Instead, the reviewers argued, we should have dug up and counted rhizomes, and harvested and counted seeds. There’s no argument that those would be useful measures of plant fitness (via asexual and sexual reproduction, respectively) – even, I’ll grant you, much more precise measures than aboveground biomass.

The thing is, counting rhizomes and seeds is hard. I know; I’ve done both. Digging up rhizomes isn’t awful in a greenhouse experiment, but in a mature old-field turf (where we worked) it’s extraordinarily laborious at best. Counting seeds is horribly time-consuming, because goldenrods make thousands of tiny wind-dispersed seeds distributed among hundreds of bract-enclosed flower heads. We could have counted rhizomes and seeds anyway, of course – but here’s why we didn’t. We had a limited budget for our work (both in time and money). Everyone does. In the time it would have taken to count rhizomes and seeds for one plant, we could measure aboveground biomass for 100 (at least). That’s the thing about quick-and-dirty: it’s quick. And that means there’s an important question: are we better off with one precise measurement, or 100 imprecise ones?

Here’s where the fuzzy thinking comes in: our reviewers didn’t talk about the choice between **one** precise measurement and **100** imprecise ones. Instead, they just complained that our measure was imprecise, with the implicit but false premise that we could have substituted a precise measure one-for-one. But it’s important to frame the 1-vs-100 part of the question explicitly, and having done that, we can answer it. So now, a more general analysis.

Imagine that you want to estimate the value of T, some biological quantity with an unknown but presumably fixed and finite value. (You can think of T as standing for “True”.) We can’t measure T directly. However, we can take multiple measurements of either H, which is hard to measure but contains quite a bit of information about T, or E, which is easy to measure but is much vaguer about T. What should we do? Reviewer-pleasing H, or quick-and-dirty E?

I’ve set up simulations to explore this question. I’ll show just one numerical example, but here’s the R code, so you can play around to your heart’s content. Here we go:

- The true value we’re after is T = 100.
- Hard-to-measure H is a random normal variate with mean 100 and variance 4 (that is, each measurement of H is a relatively precise estimate of T). Easy-to-measure E is another random normal variate, this time with mean 100 and variance 64 (each measurement of E is a relatively imprecise estimate of T).
- Each replicate of an experiment to estimate H costs $100, while each replicate of an experiment to estimate E costs $5. (Note that nothing changes if our currency is time instead.)
- We have a total budget of $1000, so we can measure H 10 times or E 200 times. In either case, we take the mean across measurements, which is our best estimate of T.
- I simulate each strategy 10,000 times (that is, 10,000 estimates of T each based on 10 measurements of H; and 10,000 estimates of T each based on 200 measurements of E).
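The original script isn’t reproduced here, but a minimal R sketch of the simulation as described above (variable names and the seed are mine, not the paper’s) could look like this:

```r
# Simulate the two budget-constrained strategies described above.
# Illustrative reconstruction, not the post's original code.
set.seed(42)

T.true <- 100              # the true value we want to estimate
budget <- 1000             # total budget ($)
cost.H <- 100; sd.H <- 2   # hard method: $100/replicate, variance 4
cost.E <- 5;   sd.E <- 8   # easy method: $5/replicate, variance 64

n.H <- budget / cost.H     # 10 precise measurements
n.E <- budget / cost.E     # 200 imprecise measurements

n.sim <- 10000
# Each simulated "experiment" spends the whole budget on one method,
# then takes the mean of its measurements as the estimate of T.
est.H <- replicate(n.sim, mean(rnorm(n.H, mean = T.true, sd = sd.H)))
est.E <- replicate(n.sim, mean(rnorm(n.E, mean = T.true, sd = sd.E)))

# Compare the spread of the two estimators of T:
sd(est.H)   # theory: 2/sqrt(10),  about 0.63
sd(est.E)   # theory: 8/sqrt(200), about 0.57, so quick-and-dirty wins
```

No matrix is needed: `replicate()` handles the looping, as one commenter below also notes.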

Here’s how that comes out:

The top plots show the distributions of values of H and E. Both are centred on the true value T, of course; but H values are bunched tightly around T, while quick-and-dirty E estimates are broadly spread. Nothing surprising there.

The bottom plots show the distribution of T estimates if we spend our budget on a few very precise measurements (via H, mean of 10 measurements) or a lot of quick-and-dirty ones (via E, mean of 200 measurements). In my example, **the quick-and-dirty method is better**. (See, reviewers? See? Sigh.) In this case, the quick-and-dirty measurements had 16 times the variance of the precise ones – but they were 20 times cheaper, and the replication that cheapness allowed gave them the edge in estimation.

Did I set up the numbers to make the comparison come out the way I wanted it to? Well, sure; but it wasn’t that hard. It all depends on the ratio between the cost difference and the variance difference between the two measures H and E***. What may not be intuitive is that for perfectly realistic (in ecology) cost ratios, the properties of statistical estimation are such that quick-and-dirty measures will outperform precise ones. In my goldenrods, it’s at least 100 times more costly to count rhizomes and seeds than to measure aboveground biomass (with 100 being a conservative estimate). As a result, I could crank up E’s variance to 400 (in my example) and still come out at least even****. In other words, if quick is quick enough, dirty can be *really* dirty.
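That cost-versus-variance ratio can be made concrete. Under a fixed budget B, a method with per-replicate cost c allows n = B/c replicates, so the variance of the resulting mean is variance × c / B: the method with the smaller variance × cost product wins. A quick check in R (my own sketch, using the numbers from the example and the goldenrod cost ratio claimed above):

```r
# Break-even check: with budget B, n = B/cost replicates are possible,
# so Var(mean) = variance * cost / B. Smaller variance * cost wins.
budget <- 1000
var.H <- 4;  cost.H <- 100   # precise but expensive
var.E <- 64; cost.E <- 5     # imprecise but cheap

var.of.mean <- function(v, cost, B) v * cost / B
var.of.mean(var.H, cost.H, budget)  # 0.40
var.of.mean(var.E, cost.E, budget)  # 0.32, so E still wins

# How dirty can E get before losing its edge? Break-even is at
# var.E = var.H * cost.H / cost.E:
var.H * cost.H / cost.E             # 80, with the example's 20x ratio
var.H * cost.H / 1                  # 400, with a goldenrod-style 100x ratio
```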

So, if you’re ever tempted to cast aspersions on a quick-and-dirty technique, please do what my reviewers didn’t: think twice. Sometimes, an apparently lazy scientist is actually just happy to let mathematics do the work for free (via E and the magic of statistics). Sometimes, a lazy scientist is just an efficient one.

*© Stephen Heard January 29, 2018*

*Our paper, which was the MSc thesis of my excellent student Yana Shibel, has to do with the evolution of herbivore impact in novel plant-insect interactions. We tested the hypothesis that after a new plant-herbivore interaction is established, selection will favour reduced impact of herbivore on host plant. If you’re curious about that, you can find the paper here; and if the paywall is a problem for you, just email me and I’ll send a copy.

**This is not an uncommon technique, and often works fairly well, according to a providentially-recently-published review.

***You can solve all this analytically, which I leave (as we love to say) as an exercise for the reader. It comes out quite simply. Given normality assumptions, if you double the variance, you need to double the number of estimates to preserve estimation performance.

****Or actually, way ahead, given the sanity-destroying nature of excavating rhizomes and counting seeds.

Ahgf778: The code looks a little convoluted. Don’t forget that R is vectorized. I think you could just use

Hdrawmeans <- replicate(10000, mean(sample(H, NH, replace = TRUE)))

instead of creating that gigantic matrix.

Also, you don't need paste() when you are specifying main = …


ScientistSeesSquirrel (post author): Thanks – although I suspect you may be missing the point a little! This is to some extent why I hesitate to post R code…


Pavel Dodonov: Makes sense!

A similar, but sort of inverse, logic may apply to, e.g., landscape ecology studies, where one can sample more sites (expensive) or increase sampling effort, and thus precision, by sampling more intensively in each site (cheaper) – increased precision means lower variance and therefore fewer replicates needed. I once ran some simulations of it: https://anotherecoblog.wordpress.com/2017/01/27/quando-menos-e-mais-e-quando-menos-e-menos-numero-de-amostras-vs-esforco-amostral/


Thiago Silva (@thi_sanna): Which is why I try to convince students on my stats course (and will do so more this coming year): DO A SIMULATION BEFORE GOING TO THE FIELD! It’s free, relatively easy to do (no need for analytical math, and it can be done using very broad estimates of the parameters), and, other than writing the data-generating code, the rest of the work is something that you will have to do anyway (analyze the data). And there is so much to be gained. Thinking about the data-generating process makes you think about the actual underlying mechanisms, doing the simulation will let you plan and optimize sampling effort/cost, AND oftentimes, when trying to analyze the simulated data, you realize you need to change some aspect of your analytic approach. I’ll be able to add a fourth reason now: it may help you argue with your reviewers 🙂


jwalter: Good points! You’ve likely thought about this, but I wanted to point out that if “quick and dirty” is also biased, not just imprecise, then that could be a whole ‘nother problem, though how big a problem it is depends on what you want to do with the data. I have no opinion, though, on whether that concern applies to measuring goldenrod fitness, but I suspect some of your reviewers might (informed or otherwise) 🙂


ScientistSeesSquirrel (post author): Yes, quite right! Bias, as opposed to imprecision, is another issue. In a goldenrod context, interestingly, it’s probably the more precise estimates that are biased – seed counts are biased strongly high, as they don’t consider the fate of seeds. There’s lots of seed predation, seeds that don’t establish, etc. But your more general point is spot on, of course!


Robert Buchkowski: I guess my other concern would be the situation where H has the correct mean, but E is a biased estimator of the true mean. Then your measure of E would converge on the wrong number, while H gets it right.

On the other hand, I think the opposite situation is particularly interesting, where E is actually the better estimator if you get lots of replication and H is biased. I wouldn’t be surprised if this was a common occurrence. It would be great in an ecological study where we could confidently collect a huge number of samples and avoid all the problems with small data sets.

I wonder whether E is the better measure in the goldenrod scenario, especially because you have to combine rhizome dry mass and seed number/mass into a measure of total fitness. I wouldn’t be surprised if that measure was as biased as the aboveground biomass measure. It could also be impacted by the amount/type of herbivory present, by changing the plant’s allocation strategy.

In short, I agree that quick and dirty can sometimes be better, especially if it isn’t going to be more biased than the time-consuming method.


Markus Eichhorn: I often tell my students that if you can’t be accurate, be consistent. Lots of imprecise measurements can still add up to a strong estimate if they’re properly replicated. You wouldn’t have struggled to get this point through to me if I were the reviewer 🙂


ScientistSeesSquirrel (post author): I will be suggesting you as a reviewer every single time from now on 🙂


Peter Apps: Markus, you are confusing accuracy (how close a measurement result is to the true value) with precision (how close the measurement results are to one another). Consistent measurements should yield precise results, but they might all be consistently inaccurate.


Markus Eichhorn: Yes, you’re quite right, accuracy and precision have different meanings in sampling. Put this down to imprecise wording on my part.


Macrobe: A quick comment on including seed in a fitness score: we collected random primary seed samples and did a seed analysis of the composite sample for seed weight and germination percentage. This required two weeks for three replicates by two part-time work-study undergraduate students (three-acre study plot of a sexually-reproducing species). This was in addition to aerial biomass weight. It was an additional, relatively economical metric.

Rhizomes… a different ‘animal’ altogether. Been there. The ‘quick and dirty’ method is definitely less time/cost-consuming. It would be interesting, however, to conduct a comparison with a sub-sample that included seed and rhizome metrics.


Brian McGill: All great points. A few quick thoughts:

a) This is not exact (because here sigma is the population variance, not measurement error, so your simulation is better), but I always go to SE = sigma/sqrt(n). So, as your simulation nicely shows, it all depends on the tradeoff between sigma and n, but the sqrt in the denominator means you have to lose a lot of precision to not want to pick the high-n choice. In general, even though ecologists can spout off about GLMMs and Bayesian methods, I don’t think that many really understand sampling theory. In fact, I would nominate SE = sigma/sqrt(n) as one of the most underappreciated equations in ecology.

b) As several noted, it all comes down to a question of bias then. But like you, I can find no reason to think biomass is more biased than seeds. If anything, I’d guess the other way.

c) Like Macrobe just pointed out, # of seeds (or rhizomes) is a pretty distant proxy for fitness too, without getting into seed (or rhizome) quality. In a density-dependent world, fewer, higher-quality seeds could easily have higher fitness (i.e., seed # and fitness could be negatively correlated).

In general, I run into this all the time. Most of the world (e.g. business) recognizes tradeoffs between resource investment and outcome. Scientists like to pretend that we can afford a “spare no expense” pursuit of the “best” outcome. It happens in stats too. But aside from the fact that “best” is multidimensional, a matter of opinion, and probably illusory, we very much live in a resource-limited (time and money) world. We definitely need to right-size our efforts. But oh, it’s so fun to be righteous and point out to somebody else that they could have done better. It’s not good for science.


ScientistSeesSquirrel (post author): Thanks, Brian! Your point (a) is a nice pithy summary. It’s the square root that does it! I think people miss that, intuitively. Even those who “get” tradeoffs may miss just how powerful replication can be.


Macrobe: To clarify, my prior comment addressed seed viability, measured by seed weight and germination percentage, not seed number (which we purposely did not include).

Regardless, depending on the study question(s) (and as Brian pointed out and our study revealed) seeds may be relatively unimportant in the short-term survival of some populations. Lower seed number, weight and/or germination may be an adaptive trait in a specific population of a specific species.

In other words, it depends. 🙂


jeffollerton: Interesting post, and I’d agree with all of this. One thing to note, though: in my experience, virtually all measures of plant size/growth/reproduction/fitness are positively correlated. Plant size correlates with number of flowers, which correlates with number of seeds. Aboveground and belowground biomass are correlated, etc. etc. So far so good: bigger is fitter.

However, plant size also correlates with some things that could have more subtle influences on fitness, e.g. larger plants start to flower earlier in the season and could be at an advantage or disadvantage depending on weather, pollinators, flower herbivores, availability of mates, etc. So if you’re looking for subtle differences in fitness, quick-and-dirty might not be the best approach.
