Overview

Consult the general homework guidelines.

Due anytime Friday 2017-10-20.

Goals:

• Reorder a factor in a principled way based on the data and demonstrate the effect in arranged data and in figures.
• Improve a figure (or make one from scratch), using new knowledge, e.g., control the color scheme, use factor levels, smoother mechanics.
• Implement visualization design principles.
• Write some data to file and load it back into R. E.g., save a plot to file and include it in a R Markdown report via ![Alt text](/path/to/img.png).
• Organise your github, to celebrate the completion of STAT 545 and/or to prepare for the glorious future of STAT 547.

Remember the sampler concept. Your homework should serve as your own personal cheatsheet in the future for canonical tasks. Make things nice – your future self will thank you!

Gapminder, Singer, … or something else

You can work with the gapminder data or take this chance to play with something else, such as Joey’s Singer. In which case, you’ll have to create comparable tasks for yourself.

Factor management

Step goals:

• Define factor variables;
• Drop factor / levels;
• Reorder levels based on knowledge from data.

I give here the trace for both the Singer and the Gapminder dataframes.

Singer version:

Factorise. Transform some of the variable in the singer_locations dataframe into factors: pay attention at what levels you introduce and their order. Try and consider the difference between the base R as.factor and the forcats-provided functions.

Drop 0. Filter the singer_locations data to remove observations associated with the uncorrectly inputed year 0. Additionally, remove unused factor levels. Provide concrete information on the data before and after removing these rows and levels; address the number of rows and the levels of the affected factors.

Reorder the levels of year, artist_name or title. Use the forcats package to change the order of the factor levels, based on a principled summary of one of the quantitative variables. Consider experimenting with a summary statistic beyond the most basic choice of the median.

Gapminder version:

Drop Oceania. Filter the Gapminder data to remove observations associated with the continent of Oceania. Additionally, remove unused factor levels. Provide concrete information on the data before and after removing these rows and Oceania; address the number of rows and the levels of the affected factors.

Reorder the levels of country or continent. Use the forcats package to change the order of the factor levels, based on a principled summary of one of the quantitative variables. Consider experimenting with a summary statistic beyond the most basic choice of the median.

Common part:

While you’re here, practice writing to file and reading back in (see next section).

Characterize the (derived) data before and after your factor re-leveling.

• Explore the effects of arrange(). Does merely arranging the data have any effect on, say, a figure?
• Explore the effects of reordering a factor and factor reordering coupled with arrange(). Especially, what effect does this have on a figure?

These explorations should involve the data, the factor levels, and some figures.

File I/O

Experiment with one or more of write_csv()/read_csv() (and/or TSV friends), saveRDS()/readRDS(), dput()/dget(). Create something new, probably by filtering or grouped-summarization of Singer or Gapminder. I highly recommend you fiddle with the factor levels, i.e. make them non-alphabetical (see previous section). Explore whether this survives the round trip of writing to file then reading back in.

Visualization design

Remake at least one figure or create a new one, in light of something you learned in the recent class meetings about visualization design and color. Maybe juxtapose your first attempt and what you obtained after some time spent working on it. Reflect on the differences. If using Gapminder, you can use the country or continent color scheme that ships with Gapminder. Consult the guest lecture from Tamara Munzner and everything here.

Writing figures to file

Use ggsave() to explicitly save a plot to file. Then use ![Alt text](/path/to/img.png) to load and embed it in your report. You can play around with various options, such as:

• Arguments of ggsave(), such as width, height, resolution or text scaling.
• Various graphics devices, e.g. a vector vs. raster format.
• Explicit provision of the plot object p via ggsave(..., plot = p). Show a situation in which this actually matters.

You have 6 weeks of R Markdown and GitHub experience now. You’ve reviewed 4 peer assignments. Surely there are aspects of your current repo organization that could be better. Deal with that. Ideas:

• Good for future: hw03 dplyr verbs
• Bummer in the future: hw03
• Include a slug with content info!
• Remove all downstream stuff, e.g. figures, html, etc. and re-render everything. It will be nice to not have weird, vestigial files lying around to puzzle you in future.
• Anything that’s Rmd but that could be md? Convert it.

But I want to do more!

Make a deeper exploration of the forcats packages, i.e. try more of the factor level reordering functions.

Revalue a factor, e.g.:

• Singer version: Pick a handful of locations (they are named city, try using distinct()) that you can pinpoint to a geographical place (city, region, country, continent,…). Create an excerpt of the Singer data, filtered to just those rows. Create a (couple of) new factor(s) – you pick the name(s)! – by mapping the existing city factor levels to the new (city, region, country…) levels.
• Examples: “London, England” –> “London”, “England”, “UK”, “Europe”;
• “Los Angeles, CA” –> “Los Angeles”, “California”, “USA”, “Americas”;
• “310, Louisiana” –> “New Orleans”, “Louisiana”, “USA”, “Americas”.

When you get tired, if you still have time on your hand, try to make this process of geolocalization more streamlined: you may want to try and use the separate function from tidyr.

• Gapminder version: Pick a handful of countries, each of which you can associate with a stereotypical food (or any other non-controversial thing … sport? hobby? type of music, art or dance? animal? landscape feature?). Create an excerpt of the Gapminder data, filtered to just these countries. Create a new factor – you pick the name! – by mapping the existing country factor levels to the new levels.
• Examples: Italy –> wine, Germany –> beer, Japan –> sake. (Austria, Germany) –> German, (Mexico, Spain) –> Spanish, (Portugal, Brazil) –> Portuguese. Let your creativity flourish.

You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.

Submit the assignment

It’s always the same: add the teaching stuff as collabs, open the issue, you are done.

Rubric

Check minus: One or more elements are missing or sketchy. Missed opportunities to complement code and numbers with a figure and interpretation. Technical problem that is relatively easy to fix. It’s hard to find the report in this crazy repo.

Check: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Well done! This should be the most typical mark!

Check plus: Exceeded the requirements in number of dimensions. Developed novel tasks that were indeed interesting and “worked”. Impressive use of R – maybe involving functions, packages or workflows that weren’t given in class materials. Impeccable organization of repo and report. You learned something new from reviewing their work and you’re eager to incorporate it into your work.

Peer Review

The peer review is ready and is due October 25, 2017 (before midnight)! Here’s what you’ll need to do:

1. Find your github username in the table below. If it’s not there, let Giulio know! Slack me @giulio.
2. Add the people who will be giving you a review as collaborators to the repo containing your homework submission.
3. Give a review of this homework for the two people you’ve been assigned to. There should be an issue in their repo titled something like hw0x ready for grading – put your review in there as a comment.
• If there is no such issue, make one! (in their repo)

Check out the guidelines for giving a peer review.

Your_github Instructions