“Publish or perish”, we say, except that it probably isn’t enough just to be published: we want to be, and maybe need to be, highly cited. Tenure committees, granting agencies, and the like devour citation data, journals compete for citations to boost their impact factors, and we worry about detecting authors who self-cite to manipulate their citation stats. Now, all this may sound like a lead-in to a post decrying overemphasis on citation counting, but it isn’t. Actually, I think citation counting is worthwhile – so long as it isn’t fetishized*. After all, a paper with lots of citations probably made some people think, and with luck had some influence on the progress of science (a nice post on this from Pat Thomson is here).
Our emphasis on citation means that we are (I think) all very aware of the citation performance of our own papers. It’s easy to track via Web of Science or Google Scholar, and that’s how I made the figure above: citations vs. years post-publication for 65 of my own papers, taken from my Google Scholar profile. There’s a lot I could do with these data, but for some reason I’ve been thinking about which of my papers is the most overcited. (I hope it’s clear from the title that I want you to mention your own most overcited paper in the Comments.)
What could I mean by an “overcited” paper? Maybe two things. I’ll define a paper as “statistically overcited” if its citation rate is unusually high relative to other papers I’ve published. On the other hand, a paper is “expectationally overcited” if its citation rate is higher than I would have expected based on its content.
Statistical overcitation seems, at first blush, easy to measure. I regressed citation count against years post-publication, and looked for large positive residuals. Two papers stood out: those marked A and B on the figure, with citation rates 10 and 6 times, respectively, those predicted for papers of their ages. But this may only show that my measure of statistical overcitation is too simple. Paper A is Vamosi, Heard, Vamosi, and Webb (2009), Emerging patterns in the analysis of phylogenetic community structure. It’s a review, and reviews tend to be cited heavily; and ours came out as interest in that research area was exploding. Paper B is Mooers and Heard (1997), Inferring evolutionary history from phylogenetic tree shape, and it owes its citation success to the same factors as Paper A. Perhaps I should be searching for statistical overcitation with an ANCOVA model that includes paper type (review vs. research report), journal impact factor, subdiscipline, and so on as additional factors and covariates. I don’t have the sample size to do that**, although if I did, I suspect that at least Paper A would still stand out.
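For readers who want to try this on their own papers, the residual check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis behind the figure: it assumes an ordinary least-squares straight-line fit of total citations on years since publication (the post doesn’t say exactly which model was used), and it scores each paper by its observed-to-predicted ratio, which is one way to read the “10 and 6 times” figures. The `Paper` record, the `min_ratio` threshold of 3, and the toy numbers below are all hypothetical and are not real citation data.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    label: str       # hypothetical identifier for a paper
    years: float     # years since publication
    citations: int   # total citations to date


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need papers of at least two different ages")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def overcited(papers, min_ratio=3.0):
    """Return (label, observed/predicted) for papers cited far above the fitted line.

    min_ratio is an arbitrary, hypothetical cut-off; papers whose predicted
    count is not positive are skipped because the ratio is meaningless there.
    """
    a, b = fit_line([p.years for p in papers], [p.citations for p in papers])
    flagged = []
    for p in papers:
        predicted = a + b * p.years
        if predicted > 0 and p.citations / predicted >= min_ratio:
            flagged.append((p.label, p.citations / predicted))
    # Most overcited first
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Toy, made-up data: five "ordinary" papers plus one outlier.
toy_papers = [
    Paper("P1", 2, 4), Paper("P2", 4, 8), Paper("P3", 6, 12),
    Paper("P4", 8, 16), Paper("P5", 10, 20), Paper("X", 6, 100),
]
print(overcited(toy_papers))
```

One caveat on this design: a very heavily cited paper pulls the fitted line toward itself, so a robust fit, or refitting with each paper left out in turn, would be a sensible refinement. Moving toward the ANCOVA idea would mean adding paper type and other predictors to the model, which quickly demands more papers than one person’s publication list provides.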
Of course, you can argue that reviews as a class tend to be overcited because they often become standard must-cites for papers in their research areas – that is, they’re cited as much or more for cultural reasons than for their content. While I think this is true to some extent for both papers, it’s a subject for another day.
What about expectational overcitation? That’s much more difficult to operationalize, but I think it’s also much more interesting. Here I nominate Paper C: Heard (1998), Capture rates of invertebrate prey by the pitcher plant, Sarracenia purpurea. This paper actually has a negative residual on my plot, but nevertheless it’s been cited far more (39 times) than I ever expected. It reports prey capture by leaves of the carnivorous purple pitcher plant, in a Newfoundland bog, broken down taxonomically and as a function of leaf size and age over the two full growing seasons that most leaves survive. It was a ton of work, but “only” a piece of descriptive natural history, and I published it in a low-impact journal (IF = 0.62; but still a perfectly good journal!). In my mind, I had gathered the dataset only as background for my “real” work with pitcher-plant insects, and I wrote it up largely because, under pre-tenure pressure to publish, I didn’t want to waste a dataset. In other words, my citation expectations were very low: a dozen citations would have surprised me, and 39 would have been inconceivable.
Of course, identifying the paper as expectationally overcited raises the possibility that my expectations were simply too low. To some extent, mine were, for an important reason. In my attitude to my own paper, I was definitely undervaluing natural history (some thoughts on the importance of natural history from Chris Buddle here, and a nice BioScience paper here). Plus, the pitcher-plant system has enduring appeal to ecologists and evolutionary biologists, having been used as a model system for everything from population genetics to toxicology to community ecology. Natural history matters, and despite my occasional protestations to the contrary, I am a natural historian (among other things).
Even with that attitude correction, though, I still consider my pitcher-plant prey paper my most expectationally overcited. It’s very narrowly focused, and while competent enough, is far from exciting – even to me. I’m surprised (but pleased) that it’s found an audience.
So that’s mine. What’s your most overcited paper, and why do you think so? Please tell us about it in the Comments.
© Stephen Heard (email@example.com) Feb 16 2015
*And so long as we realize that citation practices vary across subfields, that occasionally a paper is highly cited because it’s bad, and that some kinds of publications are rarely cited but are still extremely important (I’m thinking in particular of taxonomic keys).
**Hey you – yes, you, the one muttering “maybe if he spent less time writing self-referential blog posts he’d have a bigger sample size” – I heard that!
This post was inspired, rather indirectly, by a Twitter conversation with Terry McGlynn (@hormiga) of the excellent Small Pond Science, about blog posts that were surprisingly influential.