(Image: Robert Boyle’s (1660) vacuum pump, from New Experiments Physico-Mechanical, Touching The Spring of the Air, and its Effects; Made, for the most part, in a New Pneumatical Engine)
Unless you’ve been living under quite a large rock, you’ve heard or read a lot lately about the “reproducibility crisis” in science (here’s a good summary). That our work should be reproducible is certainly a Good Thing in principle, but there are complications where the rubber hits the road. Today, some thoughts on reproducibility, and on what, if anything, it means for the writing of a paper’s Methods section. And I think some historical perspective is both interesting and useful – because the reproducibility “crisis” is 400 years old.
There’s an odd disconnect in the way we think about our Methods sections. Most books on scientific writing (for instance, Katz or Day and Gastel) say the Methods should give enough detail for readers to repeat your work and reproduce your results. And when I poll seminar audiences, 75-80% agree that reproducibility is the primary function of the Methods section. However, studies of the way scientists actually write (e.g., Swales 1990, Gross et al. 2002) find that few published papers come close to this level of detail. We tell each other one thing, but we do something quite different. And it turns out this question of what the Methods section is for, and therefore what ought to go in it, has been causing angst among scientists for as long as we’ve been writing science. It’s part of a larger and still unsettled question about how scientific knowledge gains authority.
Scientists working at the birth of modern scientific communication, in the 17th century, belonged to the intellectual tradition of the European Renaissance. Renaissance thinkers rejected the older Medieval (or Scholastic) emphasis on learning from earlier texts, believing instead that learning should come from empirical observation. The problem was that as science progressed, this put scientists in an increasingly awkward position: it became more and more obvious that further progress could only come from one scientist building on results reported by others. So how could those reports earn authority?
The famous physicist Robert Boyle grappled with this question in the middle of the 1600s, and his answer had three elements (Shapin 1984). First, Boyle gave exhaustive detail of equipment, material, and procedures, so that readers could (at least in principle) reproduce his experiments. Second, he argued for “communal witnessing”: if results were to have authority, experiments should be witnessed – so Boyle conducted many of his key experiments in public, and published the names and qualifications of witnessing scientists along with his results. Third, Boyle described in exhaustive detail not just his methods, but his experiments’ circumstances and settings, his false starts and failures, and much else. For example, to accompany his reports of experiments using his famous vacuum pump, he provided an illustration (above) of the pump. Not, importantly, of a vacuum pump, but of the vacuum pump he used, complete with irregularities, dents, and dings. The point of all this description was to make readers feel as if they had been there – to recruit readers as “virtual witnesses” – and this is why 17th- and 18th-century scientific texts often have a charmingly narrative feel. My favourite example is Pierre-Louis de Maupertuis’ (1737) account of an Arctic expedition to measure Earth’s shape. He spends many pages relating the excitement and hardships of his travels: among other things, the midnight sun in Finland, the assaults of biting flies, techniques for defence against kicking reindeer, and cold that left only his brandy unfrozen to drink.
The thing is, none of Boyle’s three answers to the authority problem really worked. Boyle himself conceded that his experiments were rarely repeated. And of course, if every study were repeated just once, the gross rate of scientific accomplishment would presumably be halved. Communal witnessing was cumbersome even when science was a hobby of a few gentlemen of leisure, and became hopelessly inefficient as our enterprise grew. Finally, virtual witnessing was a rhetorical device, not a logical one.
Around the middle of the 19th century, the professionalization of science led to a new kind of authority. Work began to be considered reliable not because it was replicated, witnessed, or detailed, but instead because it was done by someone belonging to a community of established and credentialed scientists. The historian Steven Turner suggests that science had come to “a deeply-rooted ideology of honesty and accuracy that helped ensure…trust” (pers. comm.). In the 20th century, this professionalism became supplemented by peer review, and the function of the Methods began to include convincing experts that an author was using appropriate methods that made the results plausible. Both reviewers and readers made, and make, these plausibility judgements, nearly always without actually attempting to replicate work.
Where do we stand today? In modern science both replicability and witnessing survive, but I think their role lies largely in testing extraordinary claims like cold fusion or the supposed hyperdilution memory of water. That professionalism is the major grounds for authority explains why scientific fraud is always shocking and why it’s often slow to be discovered (Diederik Stapel, for instance, falsified data for at least 55 psychology papers before being caught). It also explains why no matter how much we tell each other our science should be reproducible, we rarely reproduce it.
If scientific results aren’t routinely verified by repetition, how are they verified? Many never are (you can think of this as our collective shrug about their importance). But when verification comes, it comes because a study’s results prove consistent with those of other, different studies, and because other scientists are able to build further understanding on top of them. That fraud remains relatively rare, and seldom distorts our understanding for long, suggests that the rarity of repetition isn’t actually a major handicap to the progress of science.
All this suggests that claiming that a paper’s Methods section is about reproducibility is a misunderstanding of both the history and the process of science. A Methods section is really about establishing the credibility of your approach, and thus giving readers a reason to believe your findings. (In addition, the Methods tell readers what they need to know about the procedures if they are to understand the Results.) Since the vast majority of your readers will never try to reproduce your work, filling the Methods section with the detail they would need to do so is unnecessary – and worse, abuses your reader’s limited time and patience.
Does this mean we shouldn’t encourage reproducibility? Of course not; calls for greater reproducibility have led to some very good things, such as the increased archiving of raw data and posting of source code. And by all means report methods in great detail – but place that detail in an online supplement where it can be conveniently ignored. After all, most readers are better served by ignoring it.
And don’t be stressed if, in fact, your work is never precisely reproduced. It’s still science, and that’s been true for 400 years.
© Stephen Heard (firstname.lastname@example.org) Feb 27 2015
This post is based on material from The Scientist’s Guide to Writing, my guidebook for scientific writers. You can learn more about it here.