Statistics in Excel, and when a Results section is “too short”

Every now and again, you see a critique of a manuscript that brings you up short and makes you go “Huh”.

A student of mine defended her thesis a while ago, and one of her examiners commented on one of her chapters that “the Results section is too short”*.  Huh, I said.  Huh.

I’m quite used to seeing manuscripts that are too long.  Occasionally, I see a manuscript that’s too short.  But this complaint was more specific: that the Results section in particular was too short.  I’d never heard that one, and I just couldn’t make sense of it.  Or at least, not until I realized that it fits in with another phenomenon that I see and hear a lot: the suggestion that nobody should ever, ever do their statistics in Excel.

What’s the connection?  It’s this: I propose that when we set out to design an experiment (or plan an observational dataset), our goal should always be to run our statistics in Excel, and to produce a suspiciously short Results section.  This proposition may seem odd, so let me explain.  I’ll start with Excel.

Excel does a perfectly fine job of running simple statistics – t-tests, one-way or two-way fixed-factor ANOVAs, regressions, things like that.  Its statistical capabilities are limited, though.  If you need to run a Bayesian GLMM on a dataset with zero inflation and severely unbalanced sample sizes (just an example), you’re going to need to do it in R, or another software package designed to run sophisticated statistics.  But there are two ways you can wind up needing to run a Bayesian GLMM with zero inflation and severely unbalanced sample sizes.  One is that the ecological situation you’re dealing with is simply too complex for any other model to work.  If so, fair enough.  But the other way is that you spent insufficient time thinking through experimental design before you ran the experiment or made your observations.  I know this is true, because I’ve been guilty of it myself.  At least twice, I’ve set up an experiment and then realized only later that I’d accidentally run a split-plot.  (#Headdesk.)
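
To make the contrast concrete, here’s a minimal sketch in R of the gap between the two situations.  The data file and column names are hypothetical, and glmmTMB stands in here for the full Bayesian machinery:

    # hypothetical data with columns growth, seedlings, treatment, and plot
    d <- read.csv("experiment.csv")

    # the well-designed experiment: one line, and every reader can follow it
    t.test(growth ~ treatment, data = d)

    # the rescued design: a zero-inflated mixed model
    # (glmmTMB shown here; a fully Bayesian fit would need brms or similar)
    library(glmmTMB)
    m <- glmmTMB(seedlings ~ treatment + (1 | plot),
                 ziformula = ~ 1, family = nbinom2(), data = d)
    summary(m)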

When you can (and again, you can’t always), it’s worth investing some time up front to come up with an experimental design that answers your question with the minimum of statistical complexity**.  Similarly, it’s worth putting some thought into what you’re going to measure – and later, of the things you did measure, which actually help tell the story your paper needs to tell.  Reporting more variables, and more complicated ones, in more detail will make your Results section longer – but not necessarily better.  Remember, everything that goes into your manuscript represents a request for reader attention and mental energy.  Those are limited resources***.

I wonder if my thesis-reading colleague’s reaction betrayed the same kind of thinking that leads scientists to keep writing convoluted, jargon-ridden sentences and using the passive voice.  I wonder if it’s a deep-seated suspicion that if it’s simple and straightforward and easy to understand, it can’t be real science.  This is nonsense, of course.  Some science is complicated and hard to understand; some is not.  And our goal, in experimental design, analysis, and writing, should be this:  to make our new insights into the world as easy as possible for others to understand. There’s a time when only I know the new thing I’ve learned about the world.  It’s a thrill, and one of the best things about being a scientist; but it is and should be an ephemeral thrill, because our job doesn’t end with discovery.

So: a short Results section, and statistics in Excel.  These aren’t signs of scientific weakness; they’re signs of exceptionally good experimental design.  Let’s aspire to them.

© Stephen Heard  November 6, 2017


*The comment went on to suggest (although not quite in these words) that if the Results section was so short, then the work probably wasn’t interesting enough to be a paper; or alternatively, that my student hadn’t really understood her own work.  Why some people write what they write on student theses, I’ll never understand.

**Related, but not quite the same thing, is the tendency to use more complex statistical methods than the data require – something Brian McGill calls statistical machismo.

***Someone who reads this post carelessly will object, probably on Twitter, saying that science is intrinsically hard, and that if a reader isn’t willing to work at understanding a paper, they don’t belong in science, and that writers shouldn’t have to cater to their laziness.  They are right in part: science is hard.  People like them make it harder.  Be one of the ones who makes it easier instead.


31 thoughts on “Statistics in Excel, and when a Results section is ‘too short’”

  1. amlees

    I’d like to say, carelessly, that science is intrinsically simple, although I am already prepared to admit that some of the concepts can be quite difficult to master, and that some scientific models are not exactly straightforward.

  2. John Pastor

    I agree completely. When I was at Oak Ridge in the early 1980s, Bob Gardner, no slouch when it comes to statistical sophistication, told me that if I could analyze my data with a t-test, I should do it, because we understand how a t-test works. Keep it simple. There are reviewers who want the latest statistical test or an ordination with a lemon twist, but those often don’t tell you much more than a t-test or a Bray-Curtis ordination if your experiment was well-designed from the beginning.

    On another note, I disagree slightly about the use of passive voice. The passive voice is a tool in the grammatical toolbox like the active voice. Usually, the sentence is stronger if it is written in the active voice. But there are times when the passive voice is logically more clear in the context of the paragraph that surrounds it. I always say to my students, don’t fall into the passive voice because you are hesitating about what you want to say, but actively choose it if it improves the logical flow. If you are at ESA next summer, let’s talk about it over coffee.

    1. ScientistSeesSquirrel Post author

      We probably don’t disagree about the passive that much. You are quite right: there are some constructions in which it works better, and using it a bit can help vary the texture of your writing. But few scientific writers need any encouragement to use the passive; that will take care of itself. What we need is encouragement to use the active!

  3. Russ

    I don’t use Excel for stats because it doesn’t report the full range of statistical output I’m interested in. Also, in the past, there was a series of papers showing Excel made errors with some stats tests. Regarding both of these points, it is good to remember that Excel was not written to do stats; that’s an add-on. I don’t think they take it all that seriously.

  4. davidnfisher

    Completely agree that a simple analysis is better than a complicated one. But is Excel really a good tool?
    If you do the simple analysis in R, you can then make the R code public to allow others to replicate what you did. The code for a t-test is unlikely to be useful to anyone, but perhaps the full pipeline of subsetting, standardising and checking for outliers can help tackle the “researcher degrees of freedom” issue?
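
    Something like this minimal sketch of such a pipeline (the file and column names are hypothetical):

    # hypothetical field data with columns site, mass, and treatment
    d <- read.csv("field_data.csv")
    d <- subset(d, site != "pilot")        # drop a pilot site
    d$mass_z <- as.numeric(scale(d$mass))  # standardise the response
    d <- d[abs(d$mass_z) < 3, ]            # screen extreme outliers
    t.test(mass_z ~ treatment, data = d)   # the simple analysis itself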

    Also, R is free and Excel is not, but that may not matter for those at institutions.

  5. Brian McGill

    As you know, I completely agree about the stats.

    But I also completely agree about the short Results section. I always tell my students the ideal Results section is a single paragraph. Really, if you’ve lined up your question clearly (good Introduction) and done good experimental design, it should only take one or two statistical analyses to test it – and how much space could that take? You of course need to interpret it, but that belongs in the Discussion.

    And I agree that lurking under the surface here is careful thought about question and experimental design before starting the work. A lot of times statistical machismo is about saving a bad dataset.

  6. Bobo

    Why use Excel when you can do it in R (and test the assumptions) in just four lines of code? Excel is not simpler (and certainly not better), just because it’s a GUI. Are you going to write your own functions to test the assumptions, or just omit the assumption testing altogether?

    # hypothetical file with columns "category" and "continuous"
    d <- read.csv("data.csv")
    # normal Q-Q plots to eyeball the normality assumption in each group
    qqnorm(d[d$category == "A", "continuous"])
    qqnorm(d[d$category == "B", "continuous"])
    # Welch two-sample t-test of the continuous variable between groups
    t.test(continuous ~ category, data = d)

    That's far less cumbersome and less prone to error than having to drag to select cells, etc. I don't see any scenario in which Excel is simpler or better than this.

    Plus, Excel is worse because it is proprietary software, so the exact methods are not repeatable by people without access to this software. (And yes, these people do exist in poorer nations. But it's part of a more general principle that analyses should be repeatable by anyone anywhere without any barriers. And this means open software.)

    1. Peter Apps

      And if you can’t do it in R, because learning R means using up valuable time that can be better spent doing other things?

      1. Peter Apps

        Your colleague may be right about the relative sexiness; although we never have trouble recruiting for “glamorous” work in the African bush, the reality sometimes comes as a bit of a shock. I nearly always find that it is actually easier and quicker to take a few good, accurate and precise measurements than to shovel up a pile of poor data in the hope that a model will sieve something out of it.

  7. Elina Mäntylä

    I recently reviewed a manuscript in which the Results section was really short: maybe 4 sentences + one table. But that didn’t bother me. The experiment was “small” and straightforward, and the authors were interested in only one thing. Some of the other reviewers would have liked to see more (or more complex) analyses. I didn’t see the point in asking for more when the result was clear with one simple analysis.

    1. ScientistSeesSquirrel Post author

      There’s no doubt Excel has lots of issues for complex stats. I’ve never had a problem with correlations or regressions or t-tests – but of course when the data get gnarly I switch. (Actually I usually do only a few quick-and-dirties in Excel, when I’m exploring data.) Although of course this really isn’t the point, which is about avoiding “data get gnarly” in the first place… I could have written about “statistics done by hand” and made the same point!

    1. ScientistSeesSquirrel Post author

      Actually, what I took from that PDF was mostly that in order to force an error that matters, you have to give Excel really silly data… And now I’m the one drifting off topic into irrelevant details about Excel in particular 🙂

    2. Paola Lombardo

      Same here. I have run a few random checks with simple stats both in Excel and using XLSTAT: they tend to agree to the first 2-4 decimal places of std dev/err, R2, t-tests, ANOVAs, and related p-values. I often run initial exploratory stats in Excel, then switch to the more “professional” XLSTAT if the analysis needs to get more complex or if the outcome is too close to the p = 0.05 (or other) threshold.

      Very nice blog and debate session. Thanks!

  8. Pingback: Defending (to a degree) statistical ‘machismo’ – Community Ecology and Phylogenetics

  9. Andrew MacDonald (@polesasunder)

    Emerging from twitter to comment here 😉
    It seems to me that the major emphasis has to be on communication. However you work, you have to be able to clearly communicate it to others — whatever the apparent “complexity” or “simplicity” of your analyses. It is quite possible to create an unreadable R script, just as it is to create an impenetrable Excel workbook. Indeed in both it is quite possible to make serious mistakes. The tools have very little to do with how well you work, or how well you communicate.

    If, in some brighter future, scientists routinely perform analysis in a way that reviewers and readers can and do actually read — perhaps then, we could abandon arbitrary standards of “complexity” for simple communication.

  10. Andrew MacDonald (@polesasunder)

    Separate point, but I have never really understood why statistical machismo is bad. What is *macho* and somehow Not Good about trying to “Save a bad dataset”? Just because data is bad doesn’t mean it isn’t the best data we’ve got!
    Of course I am also deeply emotionally invested in Bayesian GLMMs with zero inflation, so it is possible I’m just being defensive.

    1. Gary Grossman

      Well, I think the distinction is using 96-octane statistics when 80-octane statistics give you an answer with as much “power” and take much less time to describe. It’s the old KISS rule, although I don’t know if millennials know that acronym. Not that you’re a millennial. But I applaud your self-awareness regarding statistical fetishes, JK. g2

    2. ScientistSeesSquirrel Post author

      Andrew – agree with you 100% about trying to save a bad dataset. Drives me crazy when reviewers tell me the data aren’t perfect. I’m usually aware!! So yes, by all means, when the data require it, use complex stats. My point is only that sometimes, you can avoid the data requiring it; and when you can, you should.

  11. Ryan

    I’m worried that the ideal of complexity in science is still pervasive. I think it has to do with cultural expectations of what science is and ought to be. Many 4th-year science undergrad students in my seminar were complaining about a paper for being too short and simple. It was short and simple, but it was scientifically sound (not too simple) and prescriptively useful for conservation management. Whereas in some other past seminars, complicated, poorly communicated technical papers with stats that I struggled with went without major critique. Certainly some articles are bound to be complex, but we must focus on making our work as simple as possible (and no simpler).

  12. Pingback: Recommended reads #116 | Small Pond Science

  13. Pingback: Starting experiments with a “nut fig” | Small Pond Science

  14. Pingback: The J-shaped curve of blog-post popularity | Scientist Sees Squirrel

  15. Pingback: Making people angry | Scientist Sees Squirrel
