Image: This is what 1300 g (2.9 lb) of basil looks like.
Yesterday (as I write) I bought some basil at my local farmer’s market. Quite a lot of basil, actually – almost 3 pounds of it – because it was my annual pesto-making day*. My favourite vendor sells basil by the stem (at 50¢ each), and I started pulling stems from a large tub. Some stems were quite small, and some were huge, with at least a five-fold difference in size between smallest and largest (and no, I didn’t get to just pick out the huge ones). So how many stems did I need? Or to put it the other way around, given that I bought 49 stems, how many batches of pesto would I be making, and how many cups of walnuts would I need?
My undergrad students – like a lot of biology students – don’t like statistics. One big reason is that they don’t see how it matters in everyday life. It’s presented to them as math, not as thinking help (this is one of the ways, but not the only one, that statistics is often taught badly). But my pesto problem was, in fact, just a very basic statistics problem. Or equivalently, when I made my pesto, I was doing statistics.
A single stem of basil might weigh** anywhere from a few grams to well over 50 g (leftmost bar; boxes are 25th/75th percentiles, whiskers are 5th/95th)***. So I do indeed have a problem: I can’t pluck one stem and know how it fits with my recipe. But larger bundles of stems behave better. I can be fairly sure of the average weight per stem in a 6-stem bundle (which happens to be about what I need for one batch of pesto), and I can be almost certain of the average weight in a 48-stem bundle (which is approximately what I bought)****. My pesto problem is solved by my pesto enthusiasm: if I make enough, I can be quite sure how many walnuts I need to buy.
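For readers who like to see this happen, here is a minimal simulation sketch of the bundle effect. The mean weight per stem comes from the numbers in this post (1300 g over 49 stems); the 12 g standard deviation, the normal distribution, the 1 g floor, and the function names are all assumptions made for illustration, not measurements.

```python
import random
import statistics

MEAN_STEM_G = 1300 / 49   # total weight / number of stems, from the post
SD_STEM_G = 12.0          # ASSUMPTION: illustrative spread, not a measured value


def bundle_means(n_stems, n_bundles=5000, seed=1):
    """Simulate the average weight per stem for many bundles of n_stems."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_bundles):
        # ASSUMPTION: normally distributed stem weights, floored at 1 g
        stems = [max(1.0, rng.gauss(MEAN_STEM_G, SD_STEM_G))
                 for _ in range(n_stems)]
        means.append(sum(stems) / n_stems)
    return means


def spread_by_bundle_size(sizes=(1, 6, 48)):
    """Return {bundle size: standard deviation of the per-stem average}."""
    return {n: statistics.stdev(bundle_means(n)) for n in sizes}


if __name__ == "__main__":
    for n, sd in spread_by_bundle_size().items():
        print(f"{n:>2}-stem bundle: SD of average weight per stem = {sd:.1f} g")
```

The spread of the per-stem average should shrink roughly in proportion to the square root of the bundle size, which is why the 48-stem bundle is so much more predictable than a single stem.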
The same phenomenon – of increasing certainty with increasing sample size – is involved over and over again in life. Take baseball, for instance: we don’t put much weight on spring-training batting averages, and even a mediocre team can go on a run in a short playoff series. You’ll have your own favourite example.
Now, you may be thinking that all this is trivial and obvious. If so, you’re right; but that’s exactly my point. Most of my undergraduate students would be intuitively comfortable with the pesto solution, but would be uncomfortable with the simple statistical tools I used to work through it. How do we get our students to the point where they realize that statistics just formalizes the kind of intuitive thinking we already do about quantitative patterns in our world? How do we get them to see that mathematicians worked out statistical procedures to help us think, not to bedevil us? Better statistics courses are an obvious way to help, but probably aren’t enough. Maybe we should have all our undergraduates make pesto.
© Stephen Heard September 12, 2017
*^Pesto freezes exceptionally well. If I plan things right, we enjoy the last of one year’s batch just as the basil is arriving at the next year’s market. Here’s the recipe I use. Combine and blend in a food processor: 2 c firmly packed basil leaves (stripped from ~160 g of basil stems), 1 c walnuts, 4 large cloves garlic (sliced in a few pieces), ¾ c olive oil, and ¾ c parmesan cheese. I prefer walnuts over pine nuts (more flavour, and cheaper too), and – heresy, I know – I can’t tell the difference between pesto made with “proper” Parmesan vs. what comes from the shaker.
**^Yes, I know that grams are a unit of mass, not weight. I’m usually the pedantic one, but on this point – really?
***^A confession: I didn’t actually weigh each stem. I would have (I am, in fact, that data-nerdy), but I didn’t think of writing this post until I was nearly done. Instead, I knew that my 49 stems made 8 batches of pesto, with 7, 5, 5, 6, 6, 7, 8, and 5 stems/batch, and that all my basil together weighed 1300 g. I reconstructed the rest from that. Doing so probably underestimated the pesto problem a bit, because it imposed a normal distribution on the single-stem data. In fact, because among plants large stems tend to grow larger and suppress smaller ones, I’d expect a skewed distribution with a few very large stems, which doesn’t happen in the simulated data I graphed.
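The post does not spell out how the reconstruction was done, so the following is only a sketch of one way it could go, not the method actually used. It assumes every batch used the same weight of basil (so a batch that needed more stems had lighter stems on average), and the function name `reconstruct_stem_stats` is hypothetical.

```python
import math

STEMS_PER_BATCH = [7, 5, 5, 6, 6, 7, 8, 5]   # from the post
TOTAL_G = 1300                               # from the post


def reconstruct_stem_stats(stems_per_batch, total_g):
    """Estimate single-stem mean and SD from the stem count of each batch.

    ASSUMPTION: every batch used the same weight of basil, so a batch that
    needed more stems must have had lighter stems on average.
    """
    k = len(stems_per_batch)
    batch_g = total_g / k                        # assumed equal batch weight
    mean_stem = total_g / sum(stems_per_batch)   # overall mean weight per stem
    # The average of n stems has variance sigma^2 / n, so each batch
    # contributes n * (batch mean - overall mean)^2 toward sigma^2.
    ss = sum(n * (batch_g / n - mean_stem) ** 2 for n in stems_per_batch)
    sd_stem = math.sqrt(ss / (k - 1))
    return mean_stem, sd_stem


if __name__ == "__main__":
    mean_stem, sd_stem = reconstruct_stem_stats(STEMS_PER_BATCH, TOTAL_G)
    print(f"mean = {mean_stem:.1f} g per stem, SD = {sd_stem:.1f} g")
```

Under that equal-batch-weight assumption, the post's numbers give a mean near 26.5 g per stem and a standard deviation of roughly 12 g. With only eight batches, this is a rough estimate at best.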
****^As a bonus, and a bit more technically, the central limit theorem guarantees that I can be very sure of how uncertainty is distributed around my 48-stem estimate – even if the distribution of single-stem weights behaves very oddly, for instance with the skew predicted in the previous footnote.
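A minimal sketch of that claim, assuming (purely for illustration) that single-stem weights follow a right-skewed lognormal distribution with the same mean as in this post. The shape parameter of 0.6 and the function names are assumptions, not estimates from data.

```python
import math
import random


def skewness(values):
    """Population skewness: about 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)


def skew_of_bundle_means(n_stems, n_bundles=4000, seed=2):
    """Skewness of the per-stem average across simulated bundles of n_stems."""
    rng = random.Random(seed)
    sigma = 0.6                                  # ASSUMPTION: lognormal shape
    mu = math.log(1300 / 49) - sigma ** 2 / 2    # keeps the mean at 1300/49 g
    means = [
        sum(rng.lognormvariate(mu, sigma) for _ in range(n_stems)) / n_stems
        for _ in range(n_bundles)
    ]
    return skewness(means)


if __name__ == "__main__":
    for n in (1, 6, 48):
        print(f"{n:>2}-stem bundles: skewness of the average = "
              f"{skew_of_bundle_means(n):.2f}")
```

Single stems should come out strongly right-skewed, while the 48-stem averages should sit much closer to the symmetric bell shape the central limit theorem predicts.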