Due Friday 2017-November-14.

## Big picture

(We started this in class during the lecture.)

• Write (or extract from a previous analysis) three or more R scripts to carry out a small data analysis.
• The output of the first script must be the input of the second, and so on.
• Something like this:
  • First script: download some data.
  • Second script: read the data, perform some analysis and write numerical data to file in CSV or TSV format.
  • Third script: read the output of the second script, generate some figures and save them to files.
  • Fourth script: an Rmd, actually, that presents original data, the statistical summaries, and/or the figures in a little report.
  • A fifth script to rule them all, i.e. to run the others in sequence.

You can use Make, Remake or (if you feel the urge for something more) a combination of Make and Remake.

## Templates you can follow

• A bare bones example which uses only R: 01_justR
• An example that also uses Make to run the pipeline: 02_rAndMake
• An example that runs an R script and renders an R Markdown file to HTML using Make: 03_knitWithoutRStudio

## More detailed instructions (optional)

If you don’t feel like dreaming up your own thing, here’s a Gapminder blueprint that is a minimal but respectable way to complete the assignment. You are welcome to remix R code already written by someone in this class, but do credit/link appropriately, e.g. in comments.

Jennifer Bryan has provided a template, using a different dataset, 01_justR, that should help make this concrete.

### Download the data

• Option 1: in an R script using download.file.

download.file("https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv", destfile = "gapminder.tsv")

• Option 2: in a shell script using curl or wget.

curl -o gapminder.tsv https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv
wget https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv

### Perform exploratory analyses

• Bring the data in as a data frame.
• Save a couple of descriptive plots to file with highly informative names.
• Reorder the continents based on life expectancy. You decide the details.
• Sort the actual data in a deliberate fashion. You decide the details, but this should at least implement your new continent ordering.
• Write the Gapminder data to file(s), for immediate and future reuse.
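
The reorder, sort and write steps above could look roughly like the following sketch. It uses a tiny made-up data frame in place of the real Gapminder data; the choice of median life expectancy as the ordering statistic and the output file names are assumptions, not requirements.

```r
# Toy stand-in for the Gapminder data (values are made up for illustration).
gap <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  continent = factor(c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa")),
  year      = c(1952, 2007, 1952, 2007, 1952, 2007),
  lifeExp   = c(65, 80, 40, 55, 50, 70)
)

# Reorder continent levels by median life expectancy (one possible choice).
gap$continent <- reorder(gap$continent, gap$lifeExp, FUN = median)

# Sort rows so they follow the new continent order, then country and year.
gap <- gap[order(gap$continent, gap$country, gap$year), ]

# Plain-text output is easy to inspect but does not store the level order;
# saveRDS() keeps the factor exactly as it is.
out_dir <- tempdir()  # assumption: replace with your project directory
write.table(gap, file.path(out_dir, "gap_sorted.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
saveRDS(gap, file.path(out_dir, "gap_sorted.rds"))
```

Because a TSV file stores only the labels, a downstream script that reads it has to re-apply the ordering; reading the RDS file with readRDS() avoids that step.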

### Perform statistical analyses

• Import the data written by the exploratory-analysis script.
• Make sure your new continent order is still in force. You decide the details.
• Fit a linear regression of life expectancy on year within each country. Write the estimated intercepts, slopes, and residual error variance (or sd) to file. The R package broom may be useful here.
• Find the 3 or 4 “worst” and “best” countries for each continent. You decide the details.
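
As a rough illustration of the per-country regression step, here is a minimal base R sketch on made-up data. The helper name `fit_country`, the output file name, and the decision to center year at 1952 are all assumptions made for this example; the broom package mentioned above is an alternative way to collect the same quantities.

```r
# Toy stand-in for the sorted Gapminder data (values are made up).
gap <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(c(1952, 1962, 1972, 1982), times = 2),
  lifeExp = c(50, 54, 57, 62, 40, 41, 45, 46)
)

# Fit lifeExp ~ year within one country and return a one-row summary.
# Centering year at 1952 makes the intercept the fitted value in 1952
# (an assumption made here for interpretability, not a requirement).
fit_country <- function(d) {
  fit <- lm(lifeExp ~ I(year - 1952), data = d)
  data.frame(
    country   = d$country[1],
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    resid_sd  = sigma(fit)
  )
}

# Split by country, fit each piece, and stack the one-row results.
fits <- do.call(rbind, lapply(split(gap, gap$country), fit_country))
rownames(fits) <- NULL

out_file <- file.path(tempdir(), "country_fits.tsv")  # hypothetical name
write.table(fits, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With a table like `fits` in hand, ranking countries within a continent (for instance by slope or by residual sd) is a matter of sorting; which criterion defines "worst" and "best" is up to you.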

### Generate figures

Create a figure for each continent, and write one file per continent, with an informative name. Each figure should show scatterplots of life expectancy vs. year, faceted by country, with the fitted line overlaid.
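
One way to produce such figures is sketched below, assuming the ggplot2 package is installed. The data are made up, and the file naming scheme and figure dimensions are arbitrary choices for the example.

```r
library(ggplot2)

# Toy stand-in for the Gapminder data (values are made up).
gap <- data.frame(
  continent = rep(c("Asia", "Europe"), each = 6),
  country   = rep(c("A", "B", "C", "D"), each = 3),
  year      = rep(c(1952, 1977, 2007), times = 4),
  lifeExp   = c(40, 55, 68, 45, 60, 72, 66, 73, 80, 64, 72, 79)
)

out_dir <- tempdir()  # assumption: replace with your figure directory

# One file per continent: a scatterplot per country with a fitted line.
for (cont in as.character(unique(gap$continent))) {
  p <- ggplot(gap[gap$continent == cont, ], aes(x = year, y = lifeExp)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    facet_wrap(~ country) +
    ggtitle(cont)
  ggsave(file.path(out_dir, paste0("lifeExp_vs_year_", cont, ".png")),
         plot = p, width = 7, height = 5)
}
```

Here geom_smooth() refits the line from the plotted points; if you would rather draw the coefficients saved by the previous script, geom_abline() is one option.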

### Automate the pipeline

Write a master R script that simply source()s the three scripts, one after the other. Tip: you may want a second “clean up / reset” script that deletes all the output your scripts leave behind, so you can easily test and refine your strategy, i.e. without repeatedly deleting stuff “by hand”. You can run the master script or the cleaning script from a shell with Rscript.
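
A minimal sketch of the master and clean-up scripts described above follows. The function names `run_pipeline` and `clean_outputs`, and all script and output file names, are hypothetical; substitute your own.

```r
# Source each script in order, stopping early if one is missing.
# Each script runs in a fresh environment, so scripts can only
# communicate through the files they write.
run_pipeline <- function(scripts) {
  for (s in scripts) {
    if (!file.exists(s)) stop("Missing script: ", s)
    message("Running ", s)
    source(s, local = new.env(parent = globalenv()))
  }
  invisible(scripts)
}

# Delete generated files; returns the paths that were actually removed.
clean_outputs <- function(outputs) {
  present <- outputs[file.exists(outputs)]
  file.remove(present)
  invisible(present)
}

# Hypothetical file names.
scripts <- c("01_explore.R", "02_fit-models.R", "03_make-figures.R")
outputs <- c("gap_sorted.tsv", "country_fits.tsv")

# Guarded so that the sketch does nothing when the scripts are absent.
if (all(file.exists(scripts))) {
  clean_outputs(outputs)
  run_pipeline(scripts)
}
```

Running each script in its own environment is a design choice: it surfaces hidden dependencies on objects left in the workspace, which a plain source() call would mask.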

Render your R Markdown report to Markdown and HTML using rmarkdown::render.

• To render an RMarkdown report and emulate RStudio’s “Knit HTML” button, use rmarkdown::render('myAwesomeReport.rmd')
• To render an R script and emulate RStudio’s “Compile Notebook” button, use rmarkdown::render('myAwesomeScript.R')

Write a Makefile to automate your pipeline using make. See the Links section below for help. This is also demonstrated in the examples 02_rAndMake and 03_knitWithoutRStudio.

• To run an R script use Rscript myAwesomeScript.R
• To render an RMarkdown report, use Rscript -e "rmarkdown::render('myAwesomeReport.rmd')"
• To render an R script, use Rscript -e "rmarkdown::render('myAwesomeScript.R')"
• See the Makefile in 03_knitWithoutRStudio to see these commands in action

Provide a link to a README.md page that explains how your pipeline works and links to the remaining files. Your peers and the TAs should be able to go to this landing page and re-run your analysis quickly and easily.

Consider including an image showing a graphical view (the dependency diagram) of your pipeline using makefile2graph. On Mac or Linux you can install makefile2graph using Homebrew or Linuxbrew with the command brew install makefile2graph.

## I want to aim higher!

Follow the basic Gapminder blueprint above, but find a different data aggregation task, different panelling/faceting emphasis, focus on different variables, etc.

Use non-Gapminder data – like the singer data or your own?

This means you’ll need to spend more time on data cleaning and sanity checking. You will probably have an entire script (or more!) devoted to data prep. Examples:

• Are there wonky factors? Consider bringing in as character, addressing their deficiencies, then converting to factor.
• Are there variables you’re just willing to drop at this point? Do it!
• Are there dates and times that need special handling? Do it!
• Are there annoying observations that require very special handling or crap up your figures (e.g. Oceania)? Drop them!
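
The cleaning steps listed above might look something like this sketch in base R. The data are made up, and the specific repairs (trimming whitespace, fixing capitalization, dropping Oceania) are only examples of the kind of decisions you would make for your own dataset.

```r
# Toy raw data with inconsistent labels (values are made up).
raw_text <- "country\tcontinent\tlifeExp
A\tAsia\t60
B\t asia\t62
C\tEurope\t75
D\tOceania\t80"

# Read labels as character rather than factor so they can be repaired first.
raw <- read.delim(text = raw_text, stringsAsFactors = FALSE)

# Repair: trim stray whitespace and standardize capitalization.
clean <- raw
clean$continent <- trimws(clean$continent)
clean$continent <- paste0(toupper(substring(clean$continent, 1, 1)),
                          tolower(substring(clean$continent, 2)))

# Drop observations that need special handling, then convert to factor.
clean <- clean[clean$continent != "Oceania", ]
clean$continent <- factor(clean$continent)
```

Converting to factor only after the repairs and the filtering means the result has no misspelled or unused levels to clean up later.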

Experiment with running R code saved in a script from within R Markdown. Here’s some official documentation on code externalization.

Embed pre-existing figures in an R Markdown document, i.e. an R script creates the figures, then the report incorporates them. General advice on writing figures to file is here. See an example of this in an R Markdown file in one of the examples.

Import pre-existing data in an R Markdown document, then format nicely as a table.

Use Pandoc and/or LaTeX to explore new territory in document compilation. You could use Pandoc as an alternative to rmarkdown (or knitr) for Markdown to HTML conversion; you’d still use rmarkdown for conversion of R Markdown to Markdown. You would use LaTeX to get PDF output from Markdown.

## Authors

Written mostly by Shaun Jackman and Jenny Bryan with a little edit from Giulio Dalla Riva.

## Peer Review

The peer review is ready and is due November 17, 2017 (before midnight)! Here’s what you’ll need to do:

1. Find your GitHub username in the table below. If it’s not there, let Giulio know on Slack (@giulio).
2. Add the people who will be giving you a review as collaborators to the repo containing your homework submission.
3. Give a review of this homework for the two people you’ve been assigned to. There should be an issue in their repo titled something like hw0x ready for grading – put your review in there as a comment.
   • If there is no such issue, make one (in their repo)!

**If you have not yet sent me your GitHub handle, please do.**

Check out the guidelines for giving a peer review.

Your_github Instructions