Image: Citation impact vs. originality, for 55 of my own publications. See text for explanation.
Warning: a bit cynical.
Last week I filled out a grad-school recommendation form for a terrific undergraduate student. Among other things, it asked me to rate her “originality”. That got me thinking.
We often tell each other that we admire scientists who are original thinkers. Originality is often an explicit criterion in manuscript assessment, in tenure assessment, even at science fairs. The related idea of “novelty” is a major criterion in many (if not most) grant applications. Herman Melville might almost have been speaking for scientists when he said “It is better to fail in originality than to succeed in imitation*”.
So we praise originality. But do we value it? I’m skeptical. Science is a machine for discovering new things, and that seems intrinsically linked to originality – but as practiced, science often seems deeply conservative. This isn’t a new insight; it’s a big part of Thomas Kuhn’s model of scientific progress. But it occurred to me that I had at my fingertips an interesting way to assess how we actually value originality: the citation impact of my own scientific papers. So I asked myself this question: are my most original papers also my most highly cited? (I’m assuming here that citations are a plausible measure of how a field values a paper. I’ll return to that assumption later.)
Quantifying citations is pretty easy. I started with citation data from my Google Scholar profile, with a few minor edits**. I regressed citation counts against years since publication, since older papers will inevitably have accumulated more citations. The residuals from this regression are a measure of citation impact: a heavily cited paper will have a positive residual, and a poorly cited paper will have a negative one. (This, by the way, is the same methodology I used to identify my most overcited and most undercited papers, but I’ve updated the data.) I treated “empirical” papers (including observational, experimental, and theoretical work) separately from review papers throughout, as their citation impacts are quite different.
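For anyone who wants to try this on their own publication list, here is a minimal sketch of the residual calculation described above. It assumes a plain straight-line least-squares fit on raw citation counts (the post doesn't say whether counts were transformed), and the function name and the numbers are made up for illustration; they are not my data.

```python
import numpy as np

def citation_residuals(years_since_pub, citations):
    """Fit citations = intercept + slope * years by least squares
    and return each paper's residual (observed minus predicted)."""
    x = np.asarray(years_since_pub, dtype=float)
    y = np.asarray(citations, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 (straight-line) fit
    return y - (intercept + slope * x)

# Hypothetical example values, NOT real citation data.
years = [2, 5, 8, 12, 20]
cites = [3, 30, 25, 90, 110]
resid = citation_residuals(years, cites)

# Positive residual: cited more than expected for its age; negative: less.
for yr, r in zip(years, resid):
    print(f"{yr:>2} years old: residual {r:+.1f}")
```

Because the fit includes an intercept, the residuals sum to zero, so "overcited" and "undercited" are always relative to the rest of the same publication list.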
Quantifying originality is not quite as easy. In an effort to limit the amount of procrastination involved in writing this post, I ranked my papers on an originality scale from 1 to 11, subjectively but as fairly as I could***. A “1” is an utterly pedestrian paper, asking a familiar question and interpreting the data in familiar ways; perhaps it involves work in a different place or system from other studies, but it doesn’t break new intellectual ground. An “11”, in contrast, asks a question nobody has asked before, or interprets data in a qualitatively different way from the previous literature. (I’m most proud of my “11” papers, but I have a lot more “1” papers.)
Next I regressed citation impact on originality score. If we value originality, citation impact should increase with originality. It doesn’t: there is absolutely no hint here that we value originality! If anything, it’s the reverse: although neither slope is significant, both are negative. (Combining the two datasets makes the relationship nearly significant, P = 0.052, although I’m a bit worried about apples and oranges.) My more original papers are cited less. The data are noisy, which isn’t a surprise – I haven’t tried to take out differences in journal profile, narrow vs. broad questions, theory vs. experiment, and so on.
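The second regression can be sketched the same way. One caveat: to keep the example dependent on NumPy alone, the significance test here is a permutation test on the slope, which is a swapped-in technique and not necessarily the test behind the P-values quoted above. The function name, the scores, and the impact values are all hypothetical.

```python
import numpy as np

def slope_with_permutation_p(originality, impact, n_perm=2000, seed=0):
    """Least-squares slope of impact on originality, plus a two-sided
    permutation P-value (shuffle impact, refit, compare |slope|)."""
    x = np.asarray(originality, dtype=float)
    y = np.asarray(impact, dtype=float)
    observed = np.polyfit(x, y, 1)[0]
    rng = np.random.default_rng(seed)  # fixed seed for repeatability
    perm = np.array([np.polyfit(x, rng.permutation(y), 1)[0]
                     for _ in range(n_perm)])
    # Add 1 to numerator and denominator so the P-value is never exactly 0.
    p_value = (np.sum(np.abs(perm) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical originality scores (1 to 11) and citation residuals,
# invented for illustration only.
scores = [1, 1, 2, 3, 5, 7, 9, 11]
impact = [10, 4, 6, -2, 1, -5, -3, -11]
slope, p = slope_with_permutation_p(scores, impact)
print(f"slope = {slope:.2f}, permutation P = {p:.3f}")
```

A negative slope means more original papers sit further below the citations-versus-age line. With a subjective, ordinal originality score, a rank-based or permutation approach is arguably a gentler assumption than ordinary regression inference, but either would do for a blog-post-grade look at the data.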
I’m not surprised by this result, because it reinforces my preconceptions: that it’s easy to publish unsurprising, standard-approach work, and hard to publish anything outside the box****. I realize this sounds like sour grapes (more about that soon), but I think it’s more than that. When you do something really original, it doesn’t fit with prior art in the field, and there isn’t a tidy way for people to cite it – even if they want to, and they may well not want to. Every field has a set of accepted ways to ask accepted questions. Papers that fit the mould slide through peer review and fit nicely into the citation lists of all the other papers asking the same accepted questions. Take stream ecology, for example – there must be a bajillion papers putting leaves into litterbags and comparing decomposition rates among leaf species, water temperatures, flow rates, and so on. (One of that bajillion is mine.) Or in plant ecology, there are a bajillion papers exposing seeds or seedlings to some kind of stress and measuring germination and growth (a couple of this bajillion are mine, too). You’ll have your own favourite examples.
Now, precisely because the result reinforces my preconceptions, I’m cautious about it. Four reasons (and you may suggest more in the Replies):
- I scored originality in my own papers, and because I remember at least vaguely which papers of mine get cited, I couldn’t blind myself. It’s possible that I’m just annoyed that any of my papers could have low citation rates, and that I’d rather think “Oh, that’s because my approach is ahead of its time” than “Well, I guess I wrote a lemon”.
- Perhaps more original papers do have more influence, but it takes longer for that influence to be felt. Maybe original papers tend to be “Sleeping Beauties”, recognized and valued only long after publication. Mind you, as I get older, it gets harder to see this as plausible for my papers!
- Perhaps more original papers are valued more, but that value is manifested in other ways than citations: inspiration of subsequent research, honours and awards, and so on. Although I haven’t been raking in those kinds of value markers either.
- Perhaps citations are just a poor measure of value because they often serve other functions – they may be “throwaways” that check a citation box but don’t really constitute recognition or involve building on the cited study. Actually, I’m sure some citations are throwaways, although I think it’s hard to argue that citation counts don’t mean anything.
Or maybe Thomas Carlyle was right: “Originality is a thing we constantly clamour for, and constantly quarrel with”.
© Stephen Heard (firstname.lastname@example.org) May 10, 2017
*At least, the internet says he said it.
**I omitted 5 publications that are way outside my own field, because their citation data wouldn’t be remotely comparable. These were three papers on sequence diversity in T-cell responses to viral epitopes, and two on the design of serine protease inhibitors. I also omitted my 2016 and 2017 papers, which are too young to have a meaningful citation record.