Am I the last holdout against making (publication-quality) figures in R?

It’s been amazing, over the last decade, to watch the incoming tide of R swamp every other tool for statistical analysis (at least in my own field, ecology and evolution). I’ve mostly come to accept my new statistical overlord*.  But what I don’t understand is R graphics.

R culture is coding culture (as was the SAS culture I cut my teeth on). And that’s great.  I want my analyses to run using code, because I can run them over again, either with old datasets or new ones.  It’s a great feeling to dig out an old dataset and an old piece of code, and Bob’s-your-uncle there’s the same old result as a jumping-off point for the next step. No reinventing the wheel! But figures?  Why would I code for figures?  For figures, I want to point and click**.

My grad students are all about figures in R.  And my Twitter timeline is full of snarky comments about the superiority of R for figures. Some poke fun at the ugliness of Excel defaults – as if that were somehow relevant to something, and as if Excel was the only other graphics program they’ve heard of, and as if R’s defaults weren’t equally ugly.  Others take a different tack: they point to the reproducibility virtue of coding. But why does the production of a figure need to be reproducible?  Figures and analyses aren’t the same kind of thing.  An analysis is a decision-making, hypothesis-testing instrument, and it absolutely makes sense for my analysis to be reproducible (by me or someone else).  A figure, though, is a storytelling device (“data visualization” is the trendy term). A figure is more like a turn of phrase or the organization of a paragraph: it’s an aid to understanding for the reader.  What matters is only how one can best produce a figure that’s easy to understand – which means being able to make different kinds of plots quickly and easily, tweak the graphic design, and so on. It’s certainly possible to do this sort of thing via code, but I’ve seen no evidence that it’s easier or faster*** (and plenty of evidence in the other direction).

So I can’t figure out what advantage coding has for making figures, and I can see lots of costs.  Now, when everyone is doing thing X and I’m not, I usually assume that I’m missing something.  And I’m sure you’ll tell me so in the Replies.  But the thing is, I’m skeptical in this case – because I know one thing: professional graphic designers don’t use code.  Those whose careers focus on building effective communication through graphics use point-and-click software written expressly for the manipulation of visual elements.  I do too.  (EDIT:) Several folks have told me I’m wrong (see masthead) about that crossed-out bit.  It’s good to have (constructively) critical readers!  It is true (and I glossed over it) that production of dynamically-updated graphics served via the web depends on coding – it probably has to.  What surprises me is to be  told that many professionals write code to produce static, one-time graphics.  I’m going to assume my commenters are right about this, in which case times have certainly changed.  You might think this new knowledge would convert me, but I’m afraid that so far it only mystifies me more deeply.  I’m perhaps a bit too reassured to learn that a lot of people produce graphics using code (R or otherwise) – but then tweak or polish them in a specialized program like Adobe Illustrator.  Perhaps the best message is this: use the tools that work for you.  For me, that’s not coding – at least, not for figures. (END EDIT)

© Stephen Heard (sheard@unb.ca) May 4, 2017


*A bit grudgingly, perhaps. I’d be more likely to throw down rose petals if it wasn’t a language with such dreadfully constructed syntax (heresy!), didn’t have such an enormously steep learning curve, didn’t have astonishingly cryptic error messages and unreadable documentation, and didn’t leave so much uncertainty about how we can know its results are actually correct. But all that is a topic for another day.

**There are lots of point-and-click options. I happen to use SigmaPlot for graphs, PowerPoint for line art, and every now and again a speciality program for something it does well (like TreeView for phylogenetic trees).

***With the admitted exception that while coding slows you down immensely in making the first figure of a particular sort, you can make that up if you’re making many figures with different data but the same template.  But here’s the thing: you mostly shouldn’t be doing that.  It’s become common to harass your reader with online supplements containing figure after figure after figure, and to make each one hugely complex with dozens of panels and traces. But this is just an abdication of the writer’s responsibility to find and tell a story.

32 thoughts on “Am I the last holdout against making (publication-quality) figures in R?”

  1. Elina Mäntylä

    I know that R is probably the thing everyone will use in the future, at least for statistical analyses. But I was taught to use SAS for analyses and Sigmaplot for figures, and I want to continue using those as long as I can. And I will learn to use more R, too. 🙂

  2. ergative absolutive

    You’re talking about figures as a publication tool, but they’re also a primary analysis tool. Do you really interpret your three-way interactions between numerical variables with polynomial transformations directly from your regression model? (If so, I’m very impressed!) Or do you make a partial-effects plot and take a look at the pictures? What about making graphs of your raw data, grouping by categorical variables, looking at the effect of multiple variables on your outcome? Sure, other programs can do that, but it’s a lot easier to just re-run the code eight and a half times, switching one variable each time, and then you can be sure you understand the various relationships. And it’s so nicely integrated with your primary analysis software! Why do a task in two programs when you can do it in one?

    The key beauty of R, and ggplot2 especially, is that once you’ve made all the plots to look at your data, they’re high-enough quality to export directly into a publication. Professional graphics types don’t use code, but we’re not graphics designers. We want something good enough, and in my case I’ve almost always already made something good enough just as part of my data analysis task, so using a separate graphics program ends up being more work in the end.

    Now, for figures that do NOT involve talking directly to my existing data, then I don’t touch R. Actually, now that I think about it, I use tikz and pgf in latex, which are also code-based. Hmmm . . .

      1. Karl Cottenie

        I think you would avoid a lot of confusion if you update the title of your post to “Am I the last holdout against making publication-quality figures in R”. Based on your comment, I think that is what you intended, while your text, and a lot of comments, do not distinguish between exploratory figures and final figures that are part of the story you want to tell in your publication.

        Once you make that distinction, I think nobody will argue against coding exploratory figures (http://r4ds.had.co.nz/), while coding vs hand massaging of publication figures depends on a lot of things that are often not under your “control”: author guidelines, what components in the figure you want to stress, how many components you need to add to your figure (the Science/Nature style of figures), how comfortable you are with coding, how militant you are with reproducibility, how much funding you have, …

      2. Brian Cade

        I learned statistical graphics from Leland Wilkinson and Systat. My current approach, which works well, is to implement the necessary statistical graphic in R (e.g., a partial regression coefficient effect plot made while holding other predictors fixed at some value), copy it from R in Windows metafile format into PowerPoint, and then use PowerPoint’s flexibility for annotating labels, adding necessary text, changing colors, symbol shapes/sizes, etc. I can then easily use the same base statistical graph to make slides for use in a talk or for a publication. For publication quality, I convert the PowerPoint file to PDF, where I can use Adobe to export to any format (TIFF, PNG, PDF) typically required by journals. Now, conceptually, all the editing of the figure I’m doing in PowerPoint could be done in R. It is just easier to do and remember all the options in PowerPoint.

  3. Pratik Gupte

    I’m a grad student, and like yours, also crazy about figures in R. I think R figures go hand in hand with the rest of R culture, if such a thing could be said to exist. I think that all work needs to be reproducible from the point I get the data (and ideally sampling and experiments too), to the analyses, to the figures I use to explain the results. Updating the figure after new analyses or new data is easy. The figure must change because the data it shows have changed, not because one has forgotten a click here or there, or because image dependencies have a different filepath. Same goes for presentations in R.
    Maybe we’ve all been drinking the RStudio kool-aid, but it’s good so far.

    1. ScientistSeesSquirrel Post author

      You say “all work needs to be reproducible…to the figures I use to explain the results”. Yes – I’ve heard a lot of people say that. What I don’t understand (and this is a genuine question) is why? I understand the benefits of reproducibility for analyses, but not why it extends to figures. (Apart from the “updating” argument, which I do see, although for me by the time I’m producing publication-quality figures, I’d better not be updating the data any more!)

      1. Pavel Dodonov

        Well… It’s always possible there was some mistake in making the figure. A wrong column selected or something like this. If the code and data are made available, it’s easier to check this. So maybe it’s not so much reproducibility as checking for errors?

  4. Ian Medeiros

    Does anyone else like making figures with Python, or am I the odd one out? I’ve had limited experience making figures with R, but they were definitely not as visually attractive (or as easy to modify) as the ones from Python.

  5. Anthony L Einfeldt

    There are a couple of assumptions here that might be true for particular types of figures (summarizing particular types of data), but not for others:

    1) The storytelling utility of a figure is only for the reader’s benefit, as the researcher already understands the results.
    2) The figure is of sufficient simplicity that it can be accurately created manually.

    Figures can be useful to explore data before deciding to settle on a single graphic to show the readers that best demonstrates the story. Consider spatial or genetic data, where particular sets of points may be more interesting than others, but all possible sets must be carefully scrutinized first. For example, this image (http://nycevolution.org/research/urban-landscape-genetics-of-white-footed-mice/struct-bar-plot/) effectively demonstrates differentiation between different populations, but the authors likely (definitely) viewed many other similar figures varying the markers used in the analyses and the number of groups to assign individuals to. How those changes impact the numeric data that the figure is made from would be extremely difficult to comprehend by looking at the numbers, but exploring the data visually is intuitive and helps the researcher in developing their story and choosing a final demonstrative image to present.

    As for the second assumption, I’m sure someone could manually make a similar plot that was much more appealing. However, the more graphical elements in the plot, the higher the probability of making mistakes (or letting personal biases manifest, even unintentionally). Arguably in most cases small mistakes wouldn’t change the overall message of a figure, but that is not necessarily always true, and I think we should generally strive to convey information as correctly as possible.

  6. Sara Edwards

    Ok, you have officially gotten Tony and I in a tizzy we have soooo much to say and are currently saying it all behind your back…lol
    A few points:
    1) R is free (Sigma Plot is $600-$900)
    2) Learning to plot in R is a great way to build coding skills and practice coding language (developing these skills helps with everything else you’ll need to do in R )
    3) R markdown (or notebooks) are a great way to share (and store) results, keeping plots and stats together in one nice document

    and side note…I used SigmaPlot and had a heck of a time making a very simple figure for my first publication (the output image did not match the plot I made; it had weird shading for no particular reason). R was straightforward and much easier (and I was a novice coder at the time)

    1. KTInvasion

      I agree! Let me extend:

      1a) Programs you have to pay for are fine if you are confident that you’ll have the funding to cover them for the rest of your career (I wouldn’t say that’s a *safe* bet for a lot of people) or that you know which university you’ll be at for the rest of your career (site licenses are great if you can get them).
      2a) Learning how to code (in anything), and the more the better, builds a valuable transferable skill for when Academia fails you and you have to pay your rent in some way other than making slick figures for science publications.

      IMHO, point-and-click programs that cost money are a poor long-term choice for students and ECRs. Other free, code-using strategies (such as Python) for figures or anything else seem totally fine to me. I very much wish I could get back the time I spent using JMP as a post-bac/early grad student.

  7. Dimitris Kontopoulos (@DGKontopoulos)

    I think it depends on how you generate the data points for your figure(s). If the data points are, say, field measurements that are not likely to change, you are fine making the plot manually, mostly because you are generating the plot only once or twice.

    If, however, the data points are, say, model outputs or simulations, it would be in my opinion preferable to generate plots in an automated way. In this way, you can easily check what happens when the structure of the model changes (e.g., during sensitivity analysis or because a reviewer was curious about something). Figures can sometimes get so complicated that even writing the same code again seems quite a task to me, let alone doing that by hand.

  8. Christopher Moore (@lifedispersing)

    It seems to me like much of this is more philosophical, and doesn’t necessarily boil down to arguments like the amount of time to make a figure. I think that there is a level of elegance in creating a scientific product that *can* be reproduced and is documented, whether or not one needs to reproduce it for another or themselves.
    I personally value the ability to document the means by which I produced a figure, which is why I find coded figures vastly superior. In my training in R graphics I often build upon my own work, which means that I can exactly and precisely know what I did to produce that graph.
    Maybe most importantly to me, the limitations of R are up to the user. If there are ways we wish to visually communicate our project, our only limitation is our imagination. I don’t believe the same degree of freedom exists in point-and-click applications. For instance, we had an idea to create a 2-panel plot of a time series and density, with the latter flipped horizontally to coincide with the time series (e.g., https://twitter.com/lifedispersing/status/630064462083588096), that we can now use over and over for our simulations. As another example, I recently used a new package with a graph that another person had imagined, which presents a surface with a contour at the bottom (https://twitter.com/lifedispersing/status/781539471686852608). To be able to imagine a graph like that and then create it is invaluable for those of us who extensively use graphical analysis and presentation.
    I mostly agree with your post about how reproducing figures does not matter, it’s costly to learn, etc. What I listed above are some of the reasons why I prefer and value command-line figure making. Thanks for the stimulating post.

  9. Pavel Dodonov

    You’re probably not the last holdout, but I don’t know many others! 😛
    I like coding figures because: 1) it makes me think better about what exactly I want to show; 2) if I have to make some change, e.g. make some symbol a bit larger, I just have to rerun the code, and not go through a long series of pointing and clicking.
    And it’s fun coding! Isn’t this one of the reasons we do Science – because it’s fun?
    But I totally agree that point and click software is great, and I don’t think that using R makes anyone better than the next guy (or gal). However, I do find there’s a lack of good free software to make plots, and I think free software is really a good thing, especially for science for reasons of reproducibility. (Plus, we who work and study in developing countries don’t have money to buy SAS or SigmaPlot or other software. Neither do our universities!)
    Another advantage of R is the great flexibility it offers – you can make basically any sort of plot. And you can make this code available for all to use to make their own plots. It’s easier sharing code than sharing point and click instructions!
    Before I learned R, I made my plots in Origin. It was nice. But nowadays I’d no longer have the patience to insert the data, select, point, click… Coding is more fun. 🙂
    So I guess I’m trying to say basically that: – if you like coding, R is the place for you to make figures (but there are coding possibilities in other software, so it’s not the only option!); – if you don’t like coding, I don’t see any reason to make figures in R unless there’s no other software available to make the figure you need; – if you’re into free software, R is a great option; – but if you’re into free software but not into coding, there are options like Past, Bioestat and Calc combined with Inkscape to make the figures prettier.
    By the way, I don’t think R has a bad syntax. There could be more standardization, but overall I find it quite easy to use once you get used to it. Then again, I learned programming in R, so I’m probably biased.
    Sorry for the confusing comment. I’m having a confused day. 🙂

  10. Bob Montgomerie

    Your points, as well as those in the comments so far, are good ones. And there is obviously a lot of love for R as a graph drawing tool in your commenters.

    I want to add here a slightly different perspective from a heavy R-user who has taught both grad and undergrad courses on using R for the past 15 years. I consider myself to be very R-proficient but I do not/would not use R for drawing graphs for publication for the simple reason that it takes too long.

    Certainly some time can be saved by using R notebooks (in R Studio) and grabbing code and ideas from sites like http://ow.ly/vnqv30bqDZC and http://ow.ly/4UBc30bqEgJ but if you draw a lot of graphs, R is just too time consuming. I draw dozens-to-hundreds of graphs a day on average and much prefer JASP (https://jasp-stats.org), jamovi (https://www.jamovi.org), JMP, and Wizard (http://www.wizardmac.com) for quick graphical data exploration, and DataGraph (http://www.visualdatatools.com/DataGraph/) for drawing graphs for publication. jamovi is still in development but even now it is very useable for simple exploratory analyses.

    My graduate students and colleagues often take hours to draw a publication-quality graph in R that I can draw in seconds-to-minutes with DataGraph. And yes, learning to code etc. are all valuable, but not as valuable as my time. Basic graphs in R are fine for reproducible code and I use/publish those all the time, but for tweaking options to make an excellent graph, or getting many graphs during exploratory analyses, other tools are better IMO.

      1. Bob Montgomerie

        Yes both JASP and jamovi are free and platform independent. jamovi also runs from within R (imv package) and shows you R-script for all analyses within the program (albeit using imv functions). Wizard is also free in the trial version for unlimited time and is very good even as a trial version.

  11. Sam

    I’m a grad student who uses R to make figures very, very slowly (of course, easy figures like a bivariate scatterplot are very simple to make, and make look good!). Especially multipanel or complicated figures, which might take many dozens of lines of code and nested for loops and all sorts of nonsense. I hadn’t really thought about why I do this, but I can come up with a few reasons.

    1) It keeps everything in one environment. I can open my script, run it, and take raw data and turn it into pretty pictures. I know that there aren’t any mistakes in the clicking and dragging to make the figures (maybe still mistakes in the code though!). I can take my personal-consumption exploratory plots and incrementally turn them into something prettier. If I change something, I can just re-run everything without wasting time remaking the figures in JMP or SigmaPlot or whatever. I am certain that the time it would take to remake the figures is way less than it took to originally code the figures in R, so this is not a great argument.
    2) or 1a? Code is portable. Once you’ve written a plotting function to do something nicely, the sunk cost fallacy just begs you to try and re-use that code for the next analysis.
    3) Excel makes ugly figures. So does ggplot2, as far as I can tell. I’m irrationally resentful when I see a default-style ggplot figure in a publication. Base R makes the ugliest figures, and takes hours to turn them into something pretty (though less time now that I’ve built a library of stylish plotting functions). I wonder how much of making figures in R is a masochistic exercise to prove it can be done.
    4) Everyone else uses R, so if I need help, it’s probably going to be another grad student sharing their R code. I can also help others in the same language. My advisor uses JMP and SigmaPlot and I don’t know how to use that stuff.

    So, in short, I think it’s a weird mixture of laziness, masochism, community, and arrogance that leads me to use R to make figures.

    1. ScientistSeesSquirrel Post author

      You’re right about defaults. And for completeness – I admit they’re kind of ugly in SigmaPlot too, albeit less so. In most programs you can change the defaults, and in *all* you can adjust them, of course, as you do.

      You get lots of points for referring to the sunk-cost fallacy (also known as the Concorde fallacy, for those of us old enough to remember the Concorde). One can know perfectly well one is committing it, and still be unable to stop. At least, I can!

  12. Timothée Poisot

    I like R figures to make quick and dirty plots just to see what is going on, or to rapidly see how new data will change the figure. Depending on the paper, I either use this for publication, or generate something with pgfplots (which is based on LaTeX), or sometimes gnuplot.

    Design (figures included) is a creative exercise. If you can “paint with code”, then R may be the right tool. If you can’t, then Grace (or SigmaPlot) is right. What matters, as you say, is that the plot can be read and understood. The way it is produced is not really relevant.

    Except ggplot. Don’t use ggplot2 in a publication.

        1. ScientistSeesSquirrel Post author

          Precisely! Regardless of what package you use, accepting defaults blindly is a bad idea. R defaults are icky, Excel defaults are icky, SigmaPlot defaults are (moderately) icky. And I use “icky” in its most technical sense 🙂

          1. Gavin Simpson

            R doesn’t have a default; it has at least three broad plotting paradigms and at least two of them were developed from research by scientists into how data should be graphed (lattice from Bill Cleveland’s work and Trellis, and ggplot2 from Leland Wilkinson’s work on the grammar of graphics), which suggests at least some thought went into this.

            What you are describing as icky is just your opinionated aesthetic. I dislike SigmaPlot figures, I dislike the aesthetic of lattice/trellis plots. I quite like the ggplot default but I mostly use theme_bw() these days and I fundamentally hate what cowplot uses as its theme. We all have those opinions as to what we like best and no one opinion is right.

            Theming is an important aesthetic, but it is just a veneer on top. Themes should be easily changed, but good defaults for thinking about how data should be represented on a plot (not what those things look like) are far more important. There are good principles for plotting data that we can agree upon or discuss. Going on about default themes/styles is a distraction.

  13. Manu Saunders

    Everyone has their favourite programs, depending on their skills and expertise. I think it’s more important that figures are accurate, appropriate and easy to read…what program made them or whether code is available to reproduce them is irrelevant!
    I’ve only just learned how to do CI and effect plots for mixed models in R, mostly because it’s the quickest way I can find to do them. But for basic graphs and conceptual diagrams, all my published figures were done in Excel, PowerPoint or the free stats program PAST, and I still use those programs for most figs.

  14. Richard Anderson

    I think I’m very much an outlier when it comes to R. I hate it, and have only used it when necessary, such as when writing and running particular statistical simulations. I use a few different scripting languages (JavaScript, Visual Basic for Applications, Python) and occasionally SPSS command syntax. But learning more languages tends to create destructive interference for me, thus increasing the likelihood of my writing incorrect code. My bet is that R analyses are no more reproducible than SPSS analyses: SPSS analyses can be run via code, as can R analyses, and R analyses can be not-very-reproducible since there is often considerable pre-processing of data outside of R. Ideally, an analysis platform will consist of a GUI that automatically generates code that can be re-run. SPSS does this (though far from perfectly). And the new, open access, “jamovi” point-and-click application automatically generates R code that can be re-run in an R console.

    I think the reason R exists principally as a language is that, from a programmer’s perspective, it’s usually just too much work to produce a highly usable GUI for free. (And yes, I realize that there are GUI packages within R, but their usability is rather limited, I think.)

    So back to the main point of this thread, I use SPSS for graphs (and sometimes have to resort to coding to tweak them), but I’ve been actively looking for alternatives. jamovi might become that alternative once the GUI begins to incorporate graph editing capabilities.
