0.1 Announcements

0.2 Review, and Looking Ahead

We’ve done a lot so far! Let’s recap/review. Here are the main concepts we’ve covered:

In short, we’ve covered the foundation of key tools for data analysis. For the next few weeks, we’re going to focus more on the data itself, for the purpose of exploratory data analysis. Specifically, and not in this particular order,

0.3 Today’s Lessons

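Today we’ll introduce the dplyr package. Specifically, we’ll look at these three lessons:

  1. Intro to dplyr syntax
  2. The dplyr advantage
  3. Relational/Comparison and Logical Operators in R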

0.4 Resources

All three of today’s lessons closely follow stat545: dplyr-intro.

More detail can be found in the r4ds: transform chapter, up to and including the select() section. Section 5.2 also elaborates on relational/comparison and logical operators in R.

Here are some supplementary resources:

0.5 Participation

To get participation points for today, fill out the cm005-exercise.Rmd file and add it to your participation repo.

Let’s get set up:

  1. I made it easier to download this time: just click the upper-right drop-down menu of the html version.
  2. Download the .Rmd version to your local participation repo.
    • You should have it cloned to your local machine (aka your computer) from last time.
  3. Optional, but recommended: Stage and commit the Rmd file (you can do this through RStudio).
    • You can still get participation marks by uploading the final files to GitHub.

1 Intro to dplyr syntax

1.1 Learning Objectives

Here are the concepts we’ll be exploring in this lesson:

  • tidyverse
  • dplyr functions:
    • select
    • arrange
    • filter
  • piping

By the end of this lesson, students are expected to be able to:

  • subset and rearrange data with dplyr
  • use piping (%>%) when implementing function chains

1.2 Preamble

Let’s talk about:

  • The history of dplyr: plyr
  • tibbles are a special type of data frame
  • the tidyverse
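
As a preview of the syntax in the exercise, here is a minimal sketch of the three dplyr functions chained with the pipe. The `toy` tibble and its values are made up for illustration; the exercise itself uses its own data.

```r
library(dplyr)

# Toy data (made-up values), built as a tibble.
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  score   = c(10, 12, 7, 15)
)

is.data.frame(toy)  # TRUE: a tibble is still a data frame

result <- toy %>%
  filter(year == 2007) %>%    # keep rows where the condition is TRUE
  select(country, score) %>%  # keep only these columns
  arrange(desc(score))        # sort rows, largest score first

result
```

Read `%>%` as "then": take `toy`, then filter, then select, then arrange.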

1.3 Demonstration

Let’s get started with the exercise:

  1. Open RStudio, and install the tidyverse meta-package by running install.packages("tidyverse") in the R console.
  2. Optional: open the STAT545_participation RStudio project in RStudio.
  3. With RStudio, open the cm005-exercise.Rmd file you downloaded and committed earlier.
  4. Follow the instructions in the .Rmd file.
  5. Knit, commit, push.

2 The dplyr advantage

2.1 Learning Objectives

By the end of this lesson, students are expected to be able to:

  • Have a sense of why dplyr is advantageous compared to the “base R” way with respect to good coding practice.

Why?

  • Having this in the back of your mind will help you recognize the qualities of a readable analysis, and produce one yourself.

2.2 Compare base R to dplyr

Let’s talk about these concepts:

Metaprogramming.

Hadley Wickham says it best in adv-r: meta: you trade precision for concision. Here’s the example he gives:

  • subset(diamonds, x == 0 & y == 0 & z == 0), vs.
  • diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ]
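
To see that the two forms select the same rows, here is a small sketch. It uses a made-up data frame `df` as a stand-in for diamonds, so that no extra package is needed.

```r
# Toy stand-in for diamonds (hypothetical values).
df <- data.frame(x = c(0, 1, 0), y = c(0, 0, 2), z = c(0, 0, 0))

# Concise: column names are looked up inside df for you.
concise <- subset(df, x == 0 & y == 0 & z == 0)

# Precise: every column is spelled out explicitly with df$.
precise <- df[df$x == 0 & df$y == 0 & df$z == 0, ]

all.equal(concise, precise)
```

The concise version is easier to read, but it relies on R working out that `x`, `y`, and `z` refer to columns of `df` rather than to variables in the workspace.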

Pure functions.

I=O: the output depends only on the input, and calling the function does not change anything in the workspace (no side effects).
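
A minimal sketch of the difference, using two hypothetical functions, `add_one()` (pure) and `bump()` (not pure):

```r
# Pure: the result depends only on the argument, and nothing outside is changed.
add_one <- function(x) x + 1

# Not pure: reads and modifies a variable that lives outside the function, via <<-.
counter <- 0
bump <- function() {
  counter <<- counter + 1
  counter
}

add_one(1)  # same input always gives the same output
add_one(1)
bump()      # each call changes `counter`, so repeated calls give different results
bump()
```

dplyr functions like filter() and select() behave like `add_one()`: they return a new data frame and leave the original untouched.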

Self-documenting code.

This is where the tidyverse shines.

Example of dplyr vs base R:

gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)

vs.

gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]

No need to take excerpts: you rarely need to save a subset of the data as a separate object.

Wrangle with dplyr first, then pipe into a plot/analysis.

OR, use the subset argument that’s often offered by R functions like lm().
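
Both options can be sketched as follows, assuming a made-up tibble `toy` with hypothetical columns `group`, `x`, and `y`:

```r
library(dplyr)

# Toy data (made-up values) standing in for a real dataset.
toy <- tibble(
  group = c("a", "a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(2, 4, 6, 3, 3, 3)
)

# Option 1: filter first, then pipe into the model; `.` is the piped-in data.
fit_pipe <- toy %>%
  filter(group == "a") %>%
  lm(y ~ x, data = .)

# Option 2: leave the data alone and use lm()'s subset argument.
fit_subset <- lm(y ~ x, data = toy, subset = group == "a")

coef(fit_pipe)
coef(fit_subset)
```

Either way, no intermediate object like `toy_a` is left lying around in the workspace.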

Especially don’t use magic numbers to subset!
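
Here is a small sketch of why, using a made-up data frame `df` (the values are for illustration only):

```r
# Toy data (made-up values).
df <- data.frame(
  country = c("Cambodia", "Canada", "Cambodia"),
  value   = c(1, 2, 3)
)

# Fragile: 1 and 3 are "magic numbers" that only work for this row order.
by_position <- df[c(1, 3), ]

# Robust and self-documenting: the condition says what is being kept.
by_condition <- df[df$country == "Cambodia", ]

# If the rows get reordered, the magic numbers silently pick the wrong rows.
shuffled <- df[c(2, 1, 3), ]
shuffled[c(1, 3), "country"]                           # "Canada" "Cambodia"
shuffled[shuffled$country == "Cambodia", "country"]    # "Cambodia" "Cambodia"
```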

Note that you need to use the assignment operator to store changes!
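
For example, in this sketch (with a made-up tibble `toy`), the first filter() call only prints its result; nothing is stored until the result is assigned:

```r
library(dplyr)

# Toy data (made-up values).
toy <- tibble(id = 1:4, score = c(10, 12, 7, 15))

toy %>% filter(score > 9)          # prints the filtered rows, but toy is unchanged
nrow(toy)                          # 4

high <- toy %>% filter(score > 9)  # store the result with <-
nrow(high)                         # 3
```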

3 Small break

Here are some things you might choose to do on this break:

4 Relational/Comparison and Logical Operators in R

4.1 Learning Objectives

Here are the concepts we’ll be exploring in this lesson:

  • Relational/Comparison operators
  • Logical operators

By the end of this lesson, students are expected to be able to:

  • Predict the output of R code containing the above operators.
  • Explain the difference between &/&& and |/||, and name a situation where one should be used over the other.
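
As a warm-up before the exercise, here is a short sketch of these operators on a made-up vector `x`. Try predicting each result before running it.

```r
x <- c(1, 5, 10)

x > 3            # relational operators compare element by element
x != 5
x > 3 & x < 8    # & and | are vectorized: one result per element
x < 3 | x > 8

# && and || work on single TRUE/FALSE values and short-circuit:
# the right-hand side is only evaluated if it is needed.
y <- NULL
!is.null(y) && y > 0            # FALSE; `y > 0` is never evaluated
TRUE || stop("never reached")   # TRUE; stop() is never called
```

A rule of thumb: use & and | when working with whole columns or vectors (for example inside filter()), and && and || when a single TRUE or FALSE is needed (for example inside if()).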

4.2 Demonstration

Continue along with the cm005-exercise.Rmd file.

5 If there’s time remaining

  1. Let’s do the bonus exercises together, in the cm005-exercise.Rmd file.
  2. Another “break”
