I had more than my usual dose of conferences last summer (as you might have noticed). After four major conferences in three months, something finally sank into my tired, tired brain: conferences tell a very different story than journals. In particular, conference talks are loaded with negative results – far more so than our journals*. So is this a problem? An opportunity? Both?
My first thought was that I was getting a glimpse of the open file drawer. Everyone is familiar with the “file drawer problem” – it’s easier, and more fun, to publish a compelling positive result than a negative one. So all those experiments yielding nonsignificant ANOVAs, all those observations with shotgun-blast non-correlations, end up in the file drawer, and our understanding of the world is slanted by the resulting filter. Aha, I thought – comparing conference talks with published papers could measure the file drawer; and publishing conference-presented results could prop the file drawer open so the light can shine in there. Even better, people who register for conferences before completing data analysis could be thought of as having preregistered their studies. Brilliant!
Well, not so fast. I thought a little more, and of course it isn’t that easy.
First, the negative-result mismatch may not be a problem. There are several reasons for the pattern I’ve noticed, and they don’t all involve the file drawer. Imagine that I present some negative results at a conference, but don’t publish them – or I later publish an analysis with a positive result instead. If this is because I gave that easy negative-results talk but then dropped the data and forgot about it, then yes, that’s the file drawer. (Or if I kept trying new analyses until finally I got P<0.05, that’s P-hacking.) But conferences are good places for pilot data and proof-of-concept talks, too. Sometimes I might present negative results because only some of my data are in; later, with a larger sample size and the power that results, a real pattern may emerge. Sometimes I might present negative results because I’ve only done a quick-and-dirty analysis; with more time, I can use more sophisticated statistics to pick apart complexity and reveal the real pattern. In either case the negative result is best thought of as incomplete or provisional, a placeholder rather than the definitive answer that must either be published or consigned to the file drawer**.
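(A quick aside for the statistically inclined: the sample-size point is easy to see with a toy simulation. Here’s a minimal sketch in Python – the 0.4-standard-deviation effect, the sample sizes, and the simple t-test are all invented for illustration, not drawn from any real study – showing that the very same real effect a pilot-sized sample misses most of the time gets detected almost every time once the full dataset is in.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect = 0.4  # a real but modest effect: 0.4 standard deviations (hypothetical)

def rejection_rate(n_per_group, reps=2000):
    """Fraction of simulated two-group experiments that reach P < 0.05."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, size=n_per_group)
        b = rng.normal(effect, 1.0, size=n_per_group)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    return hits / reps

# A pilot-sized sample misses this effect most of the time (power around 0.2)...
print("n = 15 per group: ", rejection_rate(15))
# ...while the full dataset detects it almost every time (power above 0.9).
print("n = 150 per group:", rejection_rate(150))
```

Nothing fancy; just a reminder that a nonsignificant pilot can coexist perfectly well with a real effect.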
Second, using conferences to measure the file drawer may not be an opportunity (or at least, not an easy one). Partly that’s a consequence of my argument in the last paragraph: if many negative-result talks aren’t really examples of the file drawer, drawing useful comparisons of negative-result frequencies from conferences and journals will be very difficult. But it’s also because there’s likely to be an observer effect: drawing attention to those negative results would probably change our willingness to present them. Doing much with conference negatives would, I think, require that they be published somehow. In some fields (such as engineering and computer science), conference abstracts are routinely published (and I would speculate that, as a result, tentative and negative results are less commonly presented there). In my own field, some conferences put abstracts online with the programme; other conferences list only titles. What would happen if we required abstracts with results, and made them searchable? Again, speculation; but I’d be surprised if we didn’t see a change in the kind of presentations on offer. Those with negative results might be less likely to submit them; and those with no results at all (yet) would not be able to. Either shift would compromise the measurement of the file drawer problem – the very measurement that mandatory deposit of abstracts was supposed to enable in the first place***.
So: darn it. I thought I had a brilliant idea, but as with a lot of the ideas I think are brilliant, it didn’t hold up well to scrutiny. Of course, it wouldn’t be impossible to study the frequency of negative results at conferences and in the literature; but it would be very difficult. It would require in-person attendance rather than working with online abstracts, it would need lots of careful judgment calls (to separate pilot studies and quick-and-dirty first cuts from true negatives), and there are probably more complications I haven’t thought of. Ph.D. thesis in science studies, anyone?
So at next year’s crop of conferences, how should I think about those negative-results talks? Well, some of them will be true negatives. But many will be just works in progress, and their presentation represents an opportunity for my colleagues and me to put our heads together – in search of refinements to methods, more powerful analyses, and all the rest. Nature is complicated. Our first answer isn’t always (or even usually) the final word; and we’re cleverer together than we are individually. This makes science fun, and it should make negative-results talks at conferences fun too.
© Stephen Heard (email@example.com) December 28, 2016
*I don’t know why I’d never paid attention to this before. It’s hardly a subtle pattern, and I’m sure all kinds of people have been smarter than me and remarked on it before. But then, it does say “seldom original” right there in the Scientist Sees Squirrel masthead.
**Actually, I think analyses for conferences are almost always provisional, for three reasons. First, we’re all excited to talk about our freshest data. Second, we all overestimate how quickly we can process and analyze our data. Third, and most importantly, it’s rare for the first version of an analysis to be the definitive one. The very process of presenting the data in a talk, or writing it up for submission, is likely to spark realizations about better ways to work with the data; or audience members or peer reviewers may suggest more powerful approaches. This is completely normal.
***Of course, any individual presenter can upload their abstract, full poster, or talk slides to an archive such as FigShare or F1000Research (here’s the Ecological Society of America’s “channel” at F1000Research, as an example). But I strongly suspect that this voluntary “publication” provokes an even stronger shift in the frequency of negative and incomplete results than mandatory “publication” would.