Impact factors* are getting lots of use, and (perhaps as a direct result) it’s fashionable to argue that this use should be abhorred. Some days it seems like the impact factor can join the P value, the lecture, the paywalled journal, and bellbottom jeans in the lineup of innovations widely claimed to be obsolete and, perhaps, to have been bad ideas in the first place. And yet, just last week I was talking with a collaborator about where to send a manuscript, and when she mentioned a journal I didn’t know, my first question was “What’s its impact factor?” So: am I guilty of perpetuating the horror that is the impact factor, or was my question a reasonable one?
There are three anti-impact-factor positions one can take, and many people either ignore this or conflate them. First, one can argue that impact factor is a poor measure of paper quality. Second, one can argue that impact factor is a poor measure of journal quality. Third, one can argue that we needn’t measure journal quality at all. All three** arguments are in wide circulation.
(1) Is impact factor a good measure of paper quality? Of course not, and thinking it might be is surely the most egregious misuse of the impact factor. It happens, though. Maybe you’ve sat on a hiring or tenure committee and heard somebody argue that paper X should be heavily weighted because it appeared in Nature, or Science, or some other journal with a high impact factor. This is ridiculous, of course; some legendarily crappy science appears in Science and Nature, and so does a lot of science that’s flashy but unimportant. (Retraction rates even correlate positively with impact factors). Conversely, many influential papers appear in journals with lower impact factors. But complaining that journal impact factor is a poor measure of paper quality is like complaining that a letter-opener does a poor job of crushing garlic. People who make this complaint should grab a better tool (I’m intrigued by the newly proposed relative citation ratio) and stop making themselves look silly***.
(2) Is impact factor a good measure of journal quality? There are many ways in which impact factor is an imperfect measure. Its time horizon is short, which favours fields that move quickly, and papers that are written for immediate splash rather than lasting value. It doesn’t distinguish positive citations from negative ones. It’s a mean rather than a median, and so heavily influenced by a few outlying papers. On top of these intrinsic problems, publishers game impact factors (for instance, by publishing more reviews, or by encouraging within-journal citation). Of course, this gaming shouldn’t surprise us, because gaming is an inevitable feature of quality-signalling systems. Book publishers game best-seller lists, TV networks game Nielsen ratings, and in behavioural ecology (and bars) both males and females game the signals used for mate choice. It’s worth working to make the signals harder to game – but in the meanwhile, we generally don’t abandon imperfect signalling systems, because they still carry useful information. Impact factor seems no different to me. So let’s remember that it isn’t much use decrying its imperfections unless we can suggest an alternative metric that does better (this suggestion is intriguing, although I’m skeptical).
(3) Should we measure journal quality at all? Perhaps the bravest argument is that we should simply stop trying to deploy metrics of journal quality. According to this argument, impact factors are designed to serve the interests of publishers, while the unit that’s important to everyone else is not the journal but the paper. There is some truth to each half of this: publishers aren’t directly interested in the quality of individual papers (only in their aggregate quality), while in assessments such as tenure and granting, we should obviously evaluate quality of individual contributions, not the venues in which they appeared. And yet there are two reasons I find impact factor helpful – one for reading, and one for writing. On the reading side: there’s a lot to read out there, and knowing that certain journals have high editorial standards and tend to print papers that will be interesting and important (as judged by others’ citation of them) is a useful shortcut to prioritizing my reading time. It’s useful to scan tables of contents for the top-impact journals, while covering lower-impact outlets by topic-focused search (for instance, via Google Scholar alerts). On the writing side: when I’m considering a journal to publish in, impact factor is a help. Sometimes I want to aim high, and sometimes I know I’ve written something that ought to be available but that isn’t going to galvanize my field. Knowing a journal’s impact factor (along with its scope, of course, and compared only within fields) is a pretty good first cut in thinking about whether it’s a good home for a particular manuscript. (There’s a related point here about whether we need journals with reputations at all, but that will have to be a future post).
So I’m not ready to abandon the journal impact factor, or efforts to measure journal quality more generally. Like every tool, it can be abused. Like most tools, used as designed, it can do a useful job for us.
© Stephen Heard (email@example.com) November 12, 2015
*In case you haven’t been paying attention: for a given year, a journal’s impact factor is the mean number of citations to papers published in that journal in the previous two years. This is presumed to say something about the average impact of papers in that journal, and in turn, about the journal’s “importance” or “quality”.
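If you like to see the arithmetic spelled out, here’s a minimal sketch of that definition; all citation counts are invented for illustration, and I’m glossing over the detail that the official denominator counts only “citable items” (articles and reviews):

```python
# Hypothetical example: a journal's 2015 impact factor.
# Citations received in 2015 by each paper the journal published in 2013-2014
# (invented numbers; note the skew from one heavily cited outlier):
citations_in_2015 = [0, 1, 1, 2, 3, 5, 8, 40]

papers_published_2013_2014 = len(citations_in_2015)

# Impact factor = mean citations per paper over the two-year window
impact_factor = sum(citations_in_2015) / papers_published_2013_2014
print(impact_factor)  # 7.5 -- well above the median paper's count of 2.5
```

The outlier-driven gap between that mean (7.5) and the median (2.5) is exactly the “mean rather than median” objection raised in the post.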
**Yes, the logical structure here suggests a fourth argument: that we need not measure paper quality at all. Including this would make for some nice symmetry – but I hope nobody actually believes that we shouldn’t assess the quality of papers!
***Impact factor might, however, usefully measure the quality of sufficiently large sets of papers. When that set is “all papers in journal X”, of course, that’s just impact factor used as designed; see arguments (2) and (3). But what if that set is “15 papers on a CV”? Well, if I have two job candidates, and one has 15 papers in the American Naturalist (JIF = 4.5) and one has 15 papers in the Canadian Entomologist (JIF = 0.8), then I may not need to read every paper to make my comparison. (Cue rage in Replies….. so I should point out that this doesn’t mean there’s anything wrong with the Canadian Entomologist; I’ve published there and will do so again.)
This came out a few days ago about the effect impact factors have on researchers’ “reward center” in the brain; really funny to see! http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0142537
I really don’t follow your first argument. If I am complaining (and I do, frequently) about someone’s use of IF as a measure of paper (and hence researcher) quality, it is not I that am being made to look silly, but the user of the fundamentally flawed metric (for the purpose to which they are putting it). I can point them to better metrics, some less flawed than others, some just as susceptible to gamification. Or they could just read the paper and make an informed, scientific assessment…
What you appear to be suggesting is that it’s silly to point out the misuse of a poor tool, rather than that the misuse itself is silly?
I feel that the first question that should be asked is; who is your audience? Then go from there.
Gavin – you’re quite right, I phrased that poorly. What I mean is that if you write a blog post explaining all the ways that IF doesn’t measure paper quality, and trash IF on those grounds, then YOU look silly. If you point out that someone else has used IF that way, and it isn’t for that, of course I agree with you! Does that help you make sense of my argument #1?
Right; I’m with you — criticising a tool for not doing well something it wasn’t intended to be used for is silly. I would draw a distinction though; I wouldn’t consider someone silly that wrote about (or lectured to grad students on[*]) why IF shouldn’t be used to measure research impact. There are many people around that don’t realise how misguided it is to use IF as a measure for paper or researcher impact so educating those people is to our mutual benefit[**].
[*] otherwise I’d have to call myself silly, because I did just that only the other week. There, our new grad students were unaware of the arguments surrounding IF and research assessment in general. So that gets me off the hook I think 🙂
[**] Not that we need more of those blog posts; others have written well on this subject many times before.
Good, agreed – what irks me is people lecturing on why IF isn’t a good metric of paper impact _without realizing_ that isn’t its intended purpose. There are plenty of those!
There’s a paper by Stringer et al. (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0001683) that might be of interest to you, especially fig. 3. They quantify the classification error for ranking papers within fields based on impact factor alone. The quality metric they used was final citation count. For example, for journals with similar IFs papers are correctly classified under 50% of the time, because journal citation distributions are skewed.
I’m still just a master’s student, so I don’t know much about the academic recruiting process. But I’d think that in many cases the shortlisted applicants have similar publication histories in terms of journal IFs, but the medians and shapes of those journals’ citation distributions might differ much more drastically. So I’m inclined to believe that JIF isn’t a good tool even when comparing large sets of papers.
Stringer et al. published their parameter estimates for a whole bunch of journals’ citation distributions along their paper (in a PDF, but luckily there are tools for mining those), and I’m actually trying to quantify the error rate of mean-based metrics in comparing publication histories. If there’s someone here who’d like to work the example with me, I’d be happy to put the mined data on github (message me on twitter: @koalha).
Konsta – it would be really interesting to see the results for an analysis like that. The central limit theorem probably means that skewness etc. of the distribution doesn’t matter much, for sufficiently large paper sets; and the dependence of standard error on 1/n should take care of error variance. “Sufficiently large”, however, might be much larger than the list on a typical CV – in which case my intuition is wrong!
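That sample-size intuition is easy to probe with a quick simulation. The toy model below is not from Stringer et al.: it simply assumes per-paper citation counts follow lognormal (skewed) distributions, with invented parameters giving journal A a higher underlying mean than journal B, and asks how often B’s sample of papers nonetheless beats A’s on mean citations:

```python
# Toy Monte Carlo: how often does comparing mean citation counts of two
# paper sets mis-rank two journals, as a function of set size?
# Assumes lognormal per-paper citation distributions with made-up parameters.
import random

random.seed(1)

def mean_citations(mu, sigma, n):
    """Mean citation count of n papers drawn from a lognormal distribution."""
    return sum(random.lognormvariate(mu, sigma) for _ in range(n)) / n

def misranking_rate(n_papers, trials=2000):
    """Fraction of trials in which journal B's sample mean beats journal A's,
    even though A's underlying mean citation rate is higher."""
    wrong = sum(
        mean_citations(1.0, 1.0, n_papers) < mean_citations(0.5, 1.0, n_papers)
        for _ in range(trials)
    )
    return wrong / trials

# A CV-sized set of 15 papers mis-ranks the journals a non-trivial
# fraction of the time; a much larger set almost never does.
print(misranking_rate(15))
print(misranking_rate(300))
```

Under these invented parameters the 15-paper error rate is substantial while the 300-paper rate is near zero, consistent with the 1/√n shrinkage of the standard error; whether real CV-sized sets behave this way depends on the actual citation distributions.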
Completely agree that IF is a tool that can be misused as well as used incorrectly; I suspect the latter occurs because some people don’t really understand what it is and what it is not. In part this is because the term “impact” is used in different ways for different audiences, e.g. societal impact of research versus academic impact.
With regard to: “one can argue that we needn’t measure journal quality at all”
In the recent UK Research Excellence Framework (REF) the panels assessing research outputs were told explicitly not to take the identity of the journal into account when assessing the quality of the outputs. Of course no one believed that this instruction would be followed, and even if it was, panel members would subconsciously be making such judgements anyway. And in their final overview reports panels did provide details of the most commonly submitted journals, with Nature, Science and PNAS featuring heavily.
One minor point: when you say IF is “an average rather than a median”, shouldn’t this be “mean rather than median” as both are measures of the “average”.
re: mean/average: quite right, Jeff, and I’ve edited the post accordingly. Thanks!
Pingback: Recommended reads #64 | Small Pond Science
If you are going to use the JIF for anything, you may be interested in Steve Royle’s analysis: https://quantixed.wordpress.com/2015/05/05/wrong-number-a-closer-look-at-impact-factors/
That is indeed a very useful post. It makes my argument #1 very persuasively and in detail. It also does a nuanced job of my argument #2, finding that JIF does in fact have value as an indicator of overall journal quality, although the signal is noisy as expected – perhaps noisier, even, than one might have expected. (I would be surprised if people who already oppose use of JIF for any purpose actually extracted that message from the post though; you have to read with some care). Thanks for commenting!
Pingback: Why fit is more important than impact factor in choosing a journal to submit to | Dynamic Ecology
Pingback: Why I still list (and pay attention to) journal names | Scientist Sees Squirrel