## 0.1 Announcements

• Assignment 01 is due tonight!
• Assignment 02 released today.
• GitHub Issues:
  • All are excellent discussions!
  • You can “Close” an Issue if you consider it “resolved”.
• Update on getting help in class. Let’s take this approach:
  • TAs will be more mobile during lecture.
  • Students:
    • Question for the class: hands up fully.
    • Question for TAs: something different of your choice. Maybe a wave, holding up a pen/object, hand not fully extended, …
• I’ll aim to take a brief break in class, but with a challenge exercise for those not needing the break.
• We’ll be stricter about your participation as of this week. This means your commit timestamp matters! Your commit(s) shouldn’t be made too long after lecture. I’ll aim to include this in our demonstrations.

## 0.2 Review, and Looking Ahead

We’ve done a lot so far! Let’s recap/review. Here are the main concepts we’ve covered:

• git and GitHub concepts and how-to
• Basics of R
• Basics of authoring in markdown (with various flavours)
• RStudio features:
  • as a text editor (.md, .R, .Rmd, etc…)
  • as containing the R machinery (running code in the console)
  • as a command line tool
  • as a git client

In short, we’ve covered the foundation of key tools for data analysis. For the next few weeks, we’re going to focus more on the data itself, for the purpose of exploratory data analysis. Specifically, in no particular order:

• Data wrangling
• Data computations
• Plotting

## 0.3 Today’s Lessons

Today we’ll introduce the dplyr package. Specifically, we’ll look at these three lessons:

• Intro to dplyr syntax
• The dplyr advantage
• Relational/comparison and logical operators in R

## 0.4 Resources

All three of today’s lessons are closely aligned with stat545: dplyr-intro.

More detail can be found in the r4ds: transform chapter, up until and including the select() section. Section 5.2 also elaborates on relational/comparison and logical operators in R.

Here are some supplementary resources:

• A similar resource to the r4ds one above is the intro to dplyr vignette, up until and including the select() section.

## 0.5 Participation

To get participation points for today, we’ll be filling out the cm005-exercise.Rmd file, and adding it to your participation repo.

Let’s get set up:

2. Download the .Rmd version to your local participation repo.
   • You should have it cloned to your local machine (aka your computer) from last time.
3. Optional, but recommended: Stage and commit the .Rmd file (you can do this through RStudio).
   • You can still get participation marks by uploading the final files to GitHub.

# 1 Intro to dplyr syntax

## 1.1 Learning Objectives

Here are the concepts we’ll be exploring in this lesson:

• tidyverse
• dplyr functions:
  • select
  • arrange
  • filter
• piping

By the end of this lesson, students are expected to be able to:

• subset and rearrange data with dplyr
• use piping (%>%) when implementing function chains

## 1.2 Preamble

• The history of dplyr: plyr
• tibbles are a special type of data frame
• the tidyverse
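To make the tibble point concrete, here is a minimal sketch. It assumes the tibble package is installed (it comes with the tidyverse) and uses R's built-in iris data frame purely for illustration.

```r
library(tibble)

# iris ships with R as a plain data frame
class(iris)

# Convert it to a tibble: still a data frame, with extra classes on top
iris_tbl <- as_tibble(iris)
class(iris_tbl)
is.data.frame(iris_tbl)

# Printing a tibble shows only the first few rows, plus each column's type
iris_tbl
```

Because a tibble is still a data frame, functions that expect a data frame will generally accept it.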

## 1.3 Demonstration

Let’s get started with the exercise:

1. Open RStudio, and install the tidyverse meta-package by running install.packages("tidyverse") in the R console.
2. Optional: open the STAT545_participation RStudio project in RStudio.
3. With RStudio, open the cm005-exercise.Rmd file you downloaded and committed earlier.
4. Follow the instructions in the .Rmd file.
5. Knit, commit, push.
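As a preview of the syntax covered in the exercise, here is a rough sketch of the three dplyr functions and piping. It assumes the dplyr and gapminder packages are installed; the object name result is just for illustration and is not taken from the exercise file.

```r
library(dplyr)
library(gapminder)

# select(): keep only the named columns
select(gapminder, country, year, lifeExp)

# arrange(): sort rows, here from highest to lowest life expectancy
arrange(gapminder, desc(lifeExp))

# filter(): keep only the rows that satisfy a condition
filter(gapminder, year == 2007)

# Piping: x %>% f(y) means f(x, y), so a chain reads top to bottom
result <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp) %>%
  arrange(desc(lifeExp))

result
```

Without the pipe, the same chain would be written inside-out as arrange(select(filter(gapminder, year == 2007), country, lifeExp), desc(lifeExp)), which is harder to read.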

# 2 The dplyr advantage

## 2.1 Learning Objectives

By the end of this lesson, students are expected to be able to:

• Have a sense of why dplyr is advantageous compared to the “base R” way with respect to good coding practice.

Why?

## 2.2 Compare base R to dplyr

Metaprogramming. Hadley Wickham says it best in adv-r: meta: you trade precision for concision. Here’s the example he gives:

• subset(diamonds, x == 0 & y == 0 & z == 0), vs.
• diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ]

Pure functions. The output depends only on the input (I=O), and the function does not impact the workspace.

Self-documenting code. This is where the tidyverse shines. Example of dplyr vs. base R:

• gapminder %>% filter(country == "Cambodia") %>% select(year, lifeExp), vs.
• gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]
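Here is a runnable version of the Cambodia comparison, as a sketch that assumes the dplyr and gapminder packages are installed. The names with_dplyr and with_base are made up for this illustration.

```r
library(dplyr)
library(gapminder)

# dplyr: each verb says what it does, and gapminder is typed once
with_dplyr <- gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)

# base R: gapminder is repeated, and the row and column logic
# are packed into a single bracket call
with_base <- gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]

# Check that both approaches return the same data
all.equal(as.data.frame(with_dplyr), as.data.frame(with_base))
```

Both versions give the same answer. The difference is in how easily a reader can tell what the code is meant to do.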

No need to take excerpts.

Wrangle with dplyr first, then pipe into a plot/analysis.

OR, use the subset argument that’s often offered by R functions like lm().
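Both options can be sketched with a simple linear model of life expectancy over time for Cambodia. This assumes the dplyr and gapminder packages are installed. The model and the names fit_piped and fit_subset are only illustrations, not part of the exercise.

```r
library(dplyr)
library(gapminder)

# Option 1: wrangle with dplyr, then pipe into the analysis.
# lm() takes the formula first, so the piped data is passed as data = .
fit_piped <- gapminder %>%
  filter(country == "Cambodia") %>%
  lm(lifeExp ~ year, data = .)

# Option 2: use lm()'s own subset argument
fit_subset <- lm(lifeExp ~ year, data = gapminder,
                 subset = country == "Cambodia")

# Both fits use the same rows, so the coefficients agree
all.equal(coef(fit_piped), coef(fit_subset))
```

Neither option stores a separate copy of the Cambodia rows in the workspace.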

Especially don’t use magic numbers to subset!
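To see why magic numbers are risky, compare the two approaches below. This is a sketch assuming the dplyr and gapminder packages are installed, and assuming the row and column order of the packaged gapminder data is unchanged. The row and column numbers are an illustration of what not to do.

```r
library(dplyr)
library(gapminder)

# Fragile: what are rows 217 to 228, or columns 3 and 4?
# If the data are reordered or a column is added, this silently
# returns the wrong thing.
magic <- gapminder[217:228, c(3, 4)]

# Robust: asks for what is wanted by name
named <- gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)

# Same data today, but only one of them explains itself
all.equal(as.data.frame(magic), as.data.frame(named))
```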

Note that you need to use the assignment operator to store changes!
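A quick sketch of what this means in practice, assuming the dplyr and gapminder packages are installed (the name gap_2007 is made up for this example):

```r
library(dplyr)
library(gapminder)

# This prints the filtered rows, but gapminder itself is unchanged
gapminder %>% filter(year == 2007)
nrow(gapminder)   # still the full data set

# To keep the result, assign it to a name
gap_2007 <- gapminder %>% filter(year == 2007)
nrow(gap_2007)
```

This follows from dplyr functions being pure functions: they return a new object rather than modifying their input.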

# 3 Small break

Here are some things you might choose to do on this break:

• Talk with a TA, Vincenzo, or your neighbour(s) about the content so far.
• Fill out the challenge on the cm005-exercise.Rmd file.
• Work on an assignment.

# 4 Relational/Comparison and Logical Operators in R

## 4.1 Learning Objectives

Here are the concepts we’ll be exploring in this lesson:

• Relational/Comparison operators
• Logical operators

By the end of this lesson, students are expected to be able to:

• Predict the output of R code containing the above operators.
• Explain the difference between &/&& and |/||, and name a situation where one should be used over the other.
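As a small preview of these operators, here is a sketch in base R. The vectors x and y are made up for illustration.

```r
x <- c(1, 5, 10)

# Relational/comparison operators work element by element
x > 3     # FALSE TRUE TRUE
x == 5    # FALSE TRUE FALSE
x != 5    # TRUE FALSE TRUE

# & and | are vectorised: one result per element.
# This is the form to use inside filter() or when subsetting.
x > 3 & x < 8   # FALSE TRUE FALSE
x < 3 | x > 8   # TRUE FALSE TRUE

# && and || expect single TRUE/FALSE values and short-circuit:
# the right-hand side is skipped when the left side decides the answer.
# That makes them suited to if () conditions.
y <- NULL
if (!is.null(y) && y > 0) {
  message("y is positive")
} else {
  message("y is missing or not positive")
}
```

In the if () condition, y > 0 is never evaluated because !is.null(y) is already FALSE. Note that from R 4.3.0 onward, giving && or || a vector longer than one is an error, so they should not be used on whole columns.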

## 4.2 Demonstration

Continue along with the cm005-exercise.Rmd file.

# 5 If there’s time remaining

1. Let’s do the bonus exercises together, in the cm005-exercise.Rmd file.
2. Another “break”