Overview

Although we spend a lot of time working with data interactively, this sort of hands-on babysitting is not always appropriate. We have a philosophy of “source is real” in this class, and that philosophy can be implemented on a grander scale. Just as we save R code in a script so we can replay analytical steps, we can also record how a series of scripts and commands work together to produce a set of analytical results. This is what we mean by automating data analysis or building an analytical pipeline.

  • Chapter 33 - Why and how we automate data analyses + examples
  • Chapter 34 - make: special considerations for Windows
    • 2015-11-17 NOTE: since we have already set up a build environment for R packages, it is my hope that everyone has make. These instructions were from 2014, when we did everything in a different order. Cross your fingers and ignore!
    • (If you are running macOS or Linux, make should already be installed.)
  • Chapter 35 - Test drive make and RStudio
    • Walk before you run! Prove that make is actually installed and that it can be found and executed from the shell and from RStudio. It is also important to tell RStudio NOT to substitute spaces for tabs when editing a Makefile, because make requires each recipe line to begin with a tab character (this applies to any text editor).
  • Chapter 36 - Hands-on activity
    • This fully developed example shows you:
      • How to run an R script non-interactively
      • How to use make
        • To record which files are inputs vs intermediates vs outputs
        • To capture how scripts and commands convert inputs to outputs
        • To re-run parts of an analysis that are out-of-date
      • The intersection of R and make, i.e. how to…
        • Run snippets of R code
        • Run an entire R script
        • Render an R Markdown document (or R script)
      • The interface between RStudio and make
      • How to use make from the shell
      • How Git facilitates the process of building a pipeline
    • 2015-11-19 NOTE: Andrew MacDonald translated the above into a pipeline for the remake package from Rich FitzJohn: see this gist.
  • Chapter 37 - Three more toy pipelines, using the Lord of the Rings data
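
To make the outline above more concrete, here is a minimal sketch of the kind of Makefile the hands-on activity builds toward. It is not the Makefile from the activity itself: every file name (data.tsv, summarize.R, summary.tsv, nrow.txt, report.Rmd, report.html) is hypothetical, and it assumes that summarize.R reads data.tsv and writes summary.tsv, that Rscript is on your PATH, and that the rmarkdown package is installed. Each recipe line must begin with a tab character, not spaces.

```make
# Minimal sketch of an analytical pipeline. All file names are hypothetical.
# Recipe lines (the indented commands) must start with a TAB, not spaces.

.PHONY: all clean

# Default target: the final outputs of the pipeline.
all: report.html nrow.txt

# Run an entire R script non-interactively.
# Assumption: summarize.R reads data.tsv and writes summary.tsv.
# $< expands to the first prerequisite (summarize.R).
summary.tsv: summarize.R data.tsv
	Rscript $<

# Run a snippet of R code: record the number of rows in the input.
# $< is the first prerequisite (data.tsv); $@ is the target (nrow.txt).
nrow.txt: data.tsv
	Rscript -e 'writeLines(as.character(nrow(read.delim("$<"))), "$@")'

# Render an R Markdown document once its inputs are up to date.
report.html: report.Rmd summary.tsv
	Rscript -e 'rmarkdown::render("$<")'

# Remove intermediates and outputs so the pipeline can be rebuilt from scratch.
clean:
	rm -f summary.tsv nrow.txt report.html
```

Under these assumptions, running `make` from the shell builds everything that `all` depends on, and `make clean` deletes the generated files. If you then edit only report.Rmd and run `make` again, make should re-render report.html without re-running summarize.R, because summary.tsv is still newer than its own inputs. That is the sense in which make re-runs only the parts of an analysis that are out-of-date.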

Resources