### Overview

Consult the general homework guidelines.

Due anytime Wednesday 2017-11-08.

Pick (at least) two of the six (numbered) topics below and do one of the exercise prompts listed, or something comparable using your dataset of choice.

### 1. Character data

Read and work the exercises in the Strings chapter of R for Data Science.

### 2. Writing functions

Pick one:

• Write one (or more) functions that do something useful to pieces of the Gapminder or Singer data. It is logical to think about computing on the mini-data frames corresponding to the data for each specific country, location, year, band, album, … This would pair well with the prompt below about working with a nested data frame, as you could apply your function there.
• Make it something you can’t easily do with built-in functions. Make it something that’s not trivial to do with the simple dplyr verbs. The linear regression function presented here is a good starting point. You could generalize that to do quadratic regression (include a squared term) or use robust regression, using MASS::rlm() or robustbase::lmrob().
• If you plan to complete the homework where we build an R package, write a couple of experimental functions exploring some functionality that is useful to you in real life and that might form the basis of your personal package.
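As one possible starting point for the quadratic generalization mentioned above, here is a minimal sketch. The function name `quad_fit`, the argument `year0`, and the toy data are assumptions made for illustration; only the column names `year` and `lifeExp` come from Gapminder.

```r
# Hypothetical helper: fit lifeExp ~ year + year^2 on one mini-data frame
# and return the three coefficients as a one-row data frame.
# `year0` (an assumed default) shifts year so the intercept refers to 1952.
quad_fit <- function(dat, year0 = 1952) {
  stopifnot(is.data.frame(dat), all(c("lifeExp", "year") %in% names(dat)))
  fit <- lm(lifeExp ~ I(year - year0) + I((year - year0)^2), data = dat)
  est <- unname(coef(fit))
  data.frame(intercept = est[1], linear = est[2], quadratic = est[3])
}

# Toy data (made-up values) that follow an exact quadratic trend,
# so the function should recover the coefficients used to build it.
toy <- data.frame(year = seq(1952, 2007, by = 5))
toy$lifeExp <- 50 + 0.3 * (toy$year - 1952) - 0.002 * (toy$year - 1952)^2

result <- quad_fit(toy)
result
```

With the gapminder package loaded, the same function could be tried on a single country, for example `quad_fit(subset(gapminder, country == "Canada"))`. Swapping `lm()` for `MASS::rlm()` would be one way to explore the robust variant.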

### 3. Work with the candy data

In 2015, we explored a dataset based on a Halloween candy survey (but it included many other odd and interesting questions). Work on something from this homework from 2015. It is good practice on basic data ingest, exploration, character data cleanup, and wrangling.

### 4. Work with the singer data

The singer_locations data frame in the singer package contains geographical information stored in two different formats: (1) as a (dirty!) variable named city; (2) as a latitude / longitude pair (stored in latitude and longitude, respectively). The function revgeocode() from the ggmap package retrieves information for a (longitude, latitude) pair passed as a vector. Warning: note the order, longitude first and then latitude. Read its manual page.

1. Use purrr to map latitude and longitude into human-readable information on the bands’ places of origin. Notice that revgeocode(... , output = "more") returns a data frame, while revgeocode(... , output = "address") returns a string: you have the option of dealing with nested data frames.
You will need to pay attention to two things:
• Not all of the tracks have a latitude and longitude: what can we do with the missing information? (filtering, …)
• Not every lookup through revgeocode() returns a result. What can we do to keep those errors from biting us? (look at possibly() in purrr…)
2. Try to check whether the place in city corresponds to the information you retrieved.

3. If you still have time, you can go visual: take a look at the leaflet package and plot some information about the bands. A snippet of code is provided below.

    library(singer)   # provides singer_locations
    library(leaflet)  # also provides %>%

    singer_locations %>%
      leaflet() %>%
      addCircles(popup = ~artist_name)
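The two cautions in item 1 (missing coordinates and failed lookups) can be sketched offline as follows. Everything here is an assumption for illustration: `fake_revgeocode()` is a hypothetical stand-in for the real lookup, and the toy rows only mimic the column names of singer_locations.

```r
library(purrr)

# Hypothetical stand-in for a reverse-geocoding call so the sketch runs
# offline: takes c(longitude, latitude) and fails for some inputs.
fake_revgeocode <- function(lonlat) {
  if (lonlat[1] < -100) stop("no result for this location")
  paste0("address near (", lonlat[1], ", ", lonlat[2], ")")
}

# Toy rows with made-up values; the second track has no coordinates.
tracks <- data.frame(
  artist_name = c("A", "B", "C"),
  longitude   = c(-87.6, NA, -118.2),
  latitude    = c(41.9, NA, 34.1)
)

# Step 1: drop tracks with missing coordinates.
located <- tracks[!is.na(tracks$longitude) & !is.na(tracks$latitude), ]

# Step 2: wrap the lookup so a failure yields NA instead of an error.
safe_lookup <- possibly(fake_revgeocode, otherwise = NA_character_)

# Note the order: longitude first, then latitude.
located$address <- map2_chr(
  located$longitude, located$latitude,
  ~ safe_lookup(c(.x, .y))
)
located
```

Replacing `fake_revgeocode` with the real revgeocode() would follow the same pattern, though the real function needs network access and its own setup, which this sketch does not cover.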

### 5. Work with a list

Work through and write up a lesson from the purrr tutorial:

### 6. Work with a nested data frame

Create a nested data frame and map a function over the list column holding the nested data. Use list extraction or other functions to pull interesting information out of these results and work your way back to a simple data frame you can visualize and explore.

Here’s a fully developed prompt for Gapminder:

• See the split-apply-combine lesson from class
• Nest the data by country (and continent).
• Fit a model of life expectancy against year. Possibly quadratic, possibly robust (see above prompt re: function writing).
• Use functions for working with fitted models or the broom package to get information out of your linear models.
• Use the usual dplyr, tidyr, and ggplot2 workflows to explore, e.g., the estimated coefficients.
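The nest, fit, and extract steps listed above might look roughly like the sketch below. The toy tibble (made-up countries and values) stands in for gapminder, and `fit_year` is a hypothetical helper name; base `coef()` is used here, with broom being an alternative for the extraction step.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for gapminder (made-up values) with the same column names.
toy <- tibble(
  country   = rep(c("Aland", "Borduria"), each = 4),
  continent = rep(c("Europe", "Asia"), each = 4),
  year      = rep(c(1952, 1957, 1962, 1967), times = 2),
  lifeExp   = c(60, 61, 62, 63, 40, 42, 44, 46)
)

# Hypothetical helper: life expectancy against year, intercept at 1952.
fit_year <- function(dat) lm(lifeExp ~ I(year - 1952), data = dat)

coefs <- toy %>%
  group_by(country, continent) %>%
  nest() %>%                                   # one row per country
  mutate(
    fit       = map(data, fit_year),           # list column of models
    intercept = map_dbl(fit, ~ coef(.x)[[1]]),
    slope     = map_dbl(fit, ~ coef(.x)[[2]])
  ) %>%
  ungroup() %>%
  select(country, continent, intercept, slope) # back to a simple data frame

coefs
```

The resulting `coefs` is an ordinary data frame, so it can go straight into the usual dplyr and ggplot2 workflows.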

Inspiration for the modelling and downstream exploration:

• Find countries with interesting stories. Sudden, substantial departures from the temporal trend are interesting. How could you operationalize this notion of “interesting”?
• Use the residuals to detect countries where your model is a terrible fit. Examples: Are there one or more freakishly large residuals, in an absolute sense or relative to some estimate of background variability? Are there strong patterns in the sign of the residuals? E.g., all pos, then all neg, then pos again.
• Fit a regression using ordinary least squares and a robust technique. Determine the difference in estimated parameters under the two approaches. If it is large, consider that country “interesting”.
• Compare a linear and quadratic fit.
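A few of the residual-based ideas above can be sketched with base R alone. The toy series is made up (a steady rise with one sudden drop), and the particular summaries chosen here are assumptions about how "interesting" might be operationalized, not a prescribed method.

```r
# Toy series (made-up values): a steady rise with one sudden drop.
dat <- data.frame(
  year    = seq(1952, 2007, by = 5),
  lifeExp = c(40, 42, 44, 46, 48, 50, 52, 54, 42, 58, 60, 62)
)

fit <- lm(lifeExp ~ I(year - 1952), data = dat)
res <- unname(resid(fit))

# Largest residual relative to the residual standard error,
# one possible estimate of "background variability".
max_rel_resid <- max(abs(res)) / sigma(fit)

# Year with the largest absolute residual.
worst_year <- dat$year[which.max(abs(res))]

# Runs in the sign of the residuals; a few long runs can hint at curvature.
sign_runs <- rle(sign(res))

# Linear versus quadratic: F-test on the added squared term.
fit_quad <- lm(lifeExp ~ I(year - 1952) + I((year - 1952)^2), data = dat)
comparison <- anova(fit, fit_quad)

worst_year
max_rel_resid
comparison
```

Applied per country inside a nested data frame, summaries like `max_rel_resid` could be collected into one column and used to rank countries.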

You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.

### Submit the assignment

Follow the instructions on How to submit homework.

### Rubric

Check minus: One or more elements are missing or sketchy. Missed opportunities to complement code and numbers with a figure and interpretation. Technical problem that is relatively easy to fix. It’s hard to find the report in this crazy repo.

Check: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Well done! This should be the most typical mark!

Check plus: Exceeded the requirements in a number of dimensions. Developed novel tasks that were indeed interesting and “worked”. Impressive use of R – maybe involving functions, packages or workflows that weren’t given in class materials. Impeccable organization of repo and report. You learned something new from reviewing their work and you’re eager to incorporate it into your work.

## Peer Review

The peer review is ready and is due November 13, 2017 (before midnight)! Here’s what you’ll need to do:

1. Find your github username in the table below. If it’s not there, let Giulio know! Slack me @giulio.
2. Add the people who will be giving you a review as collaborators to the repo containing your homework submission.
3. Give a review of this homework for the two people you’ve been assigned to. There should be an issue in their repo titled something like hw0x ready for grading – put your review in there as a comment.
• If there is no such issue, make one! (in their repo)

**If you have not yet sent me your GitHub handle, please do.**

Check out the guidelines for giving a peer review.

| Your_github | Instructions |
|---|---|
| Benjamin Hives | Please add abishekarun and xinmiaow as collaborators to your repo containing hw07. Please review the hw07 submission of swynes and CassKon. |
| Cathy Tang | Please add farihakhan and ilgan as collaborators to your repo containing hw07. Please review the hw07 submission of rainerlempert and mlawre01. |