What’s your most overcited paper?

“Publish or perish”, we say, except that it probably isn’t enough just to be published: we want to be, and maybe need to be, highly cited.  Tenure committees, granting agencies, and the like devour citation data, journals compete for citations to boost their impact factors, and we worry about detecting authors who self-cite to manipulate their citation stats.  Now, all this may sound like a lead-in to a post decrying overemphasis on citation counting, but it isn’t.  Actually, I think citation counting is worthwhile – so long as it isn’t fetishized*.  After all, a paper with lots of citations probably made some people think, and with luck had some influence on the progress of science (a nice post on this from Pat Thomson is here).

Our emphasis on citation means that we are (I think) all very aware of the citation performance of our own papers.  It’s easy to track via Web of Science or Google Scholar, and that’s how I made the figure above: citations vs. years post-publication for 65 of my own papers, taken from my Google Scholar profile.  There’s a lot I could do with these data, but for some reason I’ve been thinking about which of my papers is the most overcited.  (I hope it’s clear from the title that I want you to mention your own most overcited paper in the Comments.)

What could I mean by an “overcited” paper?  Maybe two things.  I’ll define a paper as “statistically overcited” if its citation rate is unusually high relative to other papers I’ve published.  On the other hand, a paper is “expectationally overcited” if its citation rate is higher than I would have expected based on its content.

Statistical overcitation seems, at first blush, easy to measure.  I regressed citation count against years post-publication, and looked for large positive residuals.  Two papers stood out: those marked A and B on the figure, with 10 and 6 times (respectively) the predicted citation rates for papers of their ages.  But this may only show that my measure of statistical overcitation is too simple.  Paper A is Vamosi, Heard, Vamosi, and Webb (2009), Emerging patterns in the analysis of phylogenetic community structure.  It’s a review, and reviews tend to be cited heavily; and ours came out as interest in that research area was exploding. Paper B is Mooers and Heard (1997), Inferring evolutionary history from phylogenetic tree shape, and it owes its citation success to the same factors as Paper A.  Perhaps I should be searching for statistical overcitation with an ANCOVA model in which I include paper type (review vs. research report), journal impact factor, subdiscipline, and so on as blocking factors.  I don’t have the sample size to do that**, although if I did, I suspect that at least Paper A would still stand out.
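The residual-based screen described above is easy to sketch in code. This is a minimal illustration with made-up placeholder numbers (not the post's actual citation data): fit an ordinary least-squares line of citations against years post-publication, then flag papers cited several times more than the line predicts.

```python
# Hedged sketch: flag "statistically overcited" papers by regressing
# citation count on years post-publication and inspecting residuals.
# All data below are invented placeholders for illustration only.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def overcited(papers, ratio=3.0):
    """Return ids of papers cited >= `ratio` times their predicted count."""
    years = [p["years"] for p in papers]
    cites = [p["cites"] for p in papers]
    slope, intercept = fit_line(years, cites)
    flagged = []
    for p in papers:
        predicted = slope * p["years"] + intercept
        if predicted > 0 and p["cites"] / predicted >= ratio:
            flagged.append(p["id"])
    return flagged

# Nine ordinary papers accruing ~10 citations/year, plus one outlier "A".
papers = [{"id": f"P{y}", "years": y, "cites": 10 * y} for y in range(1, 11)]
papers[4] = {"id": "A", "years": 5, "cites": 500}  # heavily cited review

print(overcited(papers))  # -> ['A']
```

A fuller version of the ANCOVA idea (adding paper type, journal, and subdiscipline as blocking factors) would need a proper statistics library and a much larger sample, as the post notes.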

Of course, you can argue that reviews as a class tend to be overcited because they often become standard must-cites for papers in their research areas – that is, they’re cited as much or more for cultural reasons than for their content.  While I think this is true to some extent for both papers, it’s a subject for another day.

What about expectational overcitation?  That’s much more difficult to operationalize, but I think it’s also much more interesting.  Here I nominate Paper C: Heard (1998), Capture rates of invertebrate prey by the pitcher plant, Sarracenia purpurea.  This paper actually has a negative residual on my plot, but nevertheless it’s been cited far more (39 times) than I ever expected.  It reports prey capture by leaves of the carnivorous purple pitcher plant, in a Newfoundland bog, broken down taxonomically and as a function of leaf size and age over the two full growing seasons that most leaves survive.  It was a ton of work, but “only” a piece of descriptive natural history, and I published it in a low-impact journal (IF = 0.62; but still a perfectly good journal!).  In my mind, I had gathered the dataset only as background for my “real” work with pitcher-plant insects, and I wrote it up largely so as not to waste a dataset under pre-tenure pressure to publish.  In other words, my citation expectations were very low: a dozen citations would have surprised me, and 39 would have been inconceivable.

Of course, identifying the paper as expectationally overcited raises the possibility that my expectations were simply too low.  To some extent, mine were, for an important reason.  In my attitude to my own paper, I was definitely undervaluing natural history (some thoughts on the importance of natural history from Chris Buddle here, and a nice Bioscience paper here).  Plus, the pitcher-plant system has enduring appeal to ecologists and evolutionary biologists, having been used as a model system for everything from population genetics to toxicology to community ecology.  Natural history matters, and despite my occasional protestation otherwise, I am a natural historian (among other things).

Even with that attitude correction, though, I still consider my pitcher-plant prey paper my most expectationally overcited.  It’s very narrowly focused, and while competent enough, is far from exciting – even to me.  I’m surprised (but pleased) that it’s found an audience.

So that’s mine.  What’s your most overcited paper, and why do you think so?  Please tell us about it in the Comments.

© Stephen Heard (sheard@unb.ca) Feb 16 2015

*And so long as we realize that citation practices vary across subfields, that occasionally a paper is highly cited because it’s bad, and that some kinds of publications are rarely cited but are still extremely important (I’m thinking in particular of taxonomic keys).

**Hey you – yes, you, the one muttering “maybe if he spent less time writing self-referential blog posts he’d have a bigger sample size” – I heard that!

This post was inspired, rather indirectly, by a Twitter conversation with Terry McGlynn (@hormiga) of the excellent Small Pond Science, about blog posts that were surprisingly influential.


20 thoughts on “What’s your most overcited paper?”

  1. jeffollerton

    Interesting idea for a post and when I get a minute I’ll crunch my own data. However I should say that using the citation stats from Google Scholar wouldn’t be my first choice as I don’t think it’s reliable. Google S. counts any document that it finds on the web including lots of non peer-reviewed items. When calculating citation rates I prefer to use Web of Science: more conservative but also more accurate.


  2. ScientistSeesSquirrel Post author

    Jeff – you are right that Google Scholar is less conservative. In my case, at least, the analysis would be exactly the same – except that WoS completely misses several of my publications, including one (in Historical Biology) with >100 citations!


  3. jeffollerton

    Have just done a quick analysis of my stats from WoS but can’t see a way of pasting the figure into comments (is that even possible?). But I have two big outliers in my graph: Waser et al. (1996) Ecology cited 856 times; and Ollerton et al. (2011) Oikos cited 192 times.

    The former was a review/synthesis that resulted in a critical re-think of how we consider plant-pollinator interactions. The latter was (believe it or not) the first rigorous calculation for the global number and proportion of flowering plants that are animal pollinated, and has become the go-to reference for citing the importance of biotic pollination in terrestrial ecosystems.


  4. Dylan Schwilk

    This gave me an excuse to look at this. My positive and negative outliers were informative. One of the most negative outliers was interesting to me, although not surprising. It was a small paper proposing a solution to a local mystery in plant distribution and was published in the California Botanical Society journal, Madroño. Only six citations since 2006! But I still had fun and think we have the correct answer to a little puzzle.


  5. terry wheeler

    I found, to my utter lack of surprise, that most of my papers below the line were taxonomic, and most above the line were ecological, phylogenetic, or opinion pieces. But those undercited taxonomic papers will, in my opinion, be the most long-lasting and potentially most used, despite the perennial lack of citations taxonomic work receives. The first paper I published, from my undergrad project on ectoparasites of birds, is one of my 3 most cited. At the other end of the time line, 3 big multi-authored pieces from 2014 – on the importance of natural history, DNA barcoding in ecology, and specimen collecting – are shaping up to be way above the line based on early returns.


  6. sleather2012

    My most recent over-cited paper is Archetti, M., Döring, T.F., Hagen, S.B., Hughes, N.M., Leather, S.R., Lee, D.W., Lev-Yadun, S., Manetas, Y., Ougham, H.J., Schaberg, P.G., & Thomas, H. (2009) Unravelling the evolution of autumn colours: an interdisciplinary approach. Trends in Ecology & Evolution, 24, 166-173.
    with 53 cites to date since 2009 – I will, when I have the time, analyse the full data set, as it will be useful for a blog post I have planned on papers that no-one cites, not even the authors!


  7. Jeremy Fox

    Afraid I don’t have any very interesting anecdata to report, but here they are anyway. My most over-cited paper by far is a group-authored review paper on interaction strength, Berlow et al. 1998, cited over 200 times on WoS IIRC. Fox 2005 is my second-most over-cited, because it proposes a modified version of a then- (and now-) standard analytical method in biodiversity-ecosystem function research.

    Overall, I’m not highly-cited. Which doesn’t bother me. My papers are like fine wine:


    Just kidding. Mostly. 🙂


      1. Jeremy Fox


        As my little joke at the end of my previous comment hints, I don’t ordinarily have any expectations as to how often my papers will be cited. Or if I do, the expectation is “not that often”, and is invariably borne out. Which again, is fine. If you’re anything more than kind of idly curious how often your papers will be cited, you may start trying to do science with an eye towards garnering more citations. Which is a recipe for bandwagon jumping and worse. It’s the same reason I don’t decide what blog posts to write by trying to guess what sort of posts might be popular.


  8. Jeremy Fox

    Re: influential (or at least popular) blog posts, over time Meg, Brian, and I have developed some ability to predict which of our posts will be most widely read, but it’s very rough and we’re often wrong.

    And there’s only a very rough correlation between “popularity of post” and “how good or important we personally think it is”. There are various reasons why the correlation isn’t tight. For instance, advice posts often are especially popular, but from our perspective they’re not that interesting to read or challenging to write.


  9. Pingback: What’s your most undercited paper? | Scientist Sees Squirrel

  10. Pingback: Are “side projects” self-indulgent? | Scientist Sees Squirrel

  11. Pingback: My most influential paper was a complete accident | Scientist Sees Squirrel

  12. crowther

    My most expectationally overcited paper: “Buffer optimization of thermal melt assays of Plasmodium proteins for detection of small-molecule ligands.” Published in 2009; 37 citations according to Google Scholar. The work was very solid … and very boring. Who but the nerdiest biochemist would get excited about buffer optimization?


  13. Pingback: We praise originality, but we don’t seem to value it | Scientist Sees Squirrel

  14. Pingback: A better way to rank the scientific literature – Brushing Up Science

  15. Pingback: 2020 was weird for blogging, too | Scientist Sees Squirrel
