1 Learning Objectives

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(gapminder))

1.0.0.1 Resources

Some useful resources for data frame joins.

  • Jenny’s Cheatsheet for dplyr join functions.
  • The Relational Data chapter in “R for Data Science” goes through these concepts in detail.

After going through the dplyr vignette on “two-table verbs”, we’ll work on the following exercises.

Consider the following areas of countries, in hectares:

(areas <- data.frame(country=c("Canada", "United States", "India", "Vatican City"),
                     area=c(998.5*10^6, 983.4*10^6, 328.7*10^6, 44)) %>% 
     as.tbl)
  1. To the gapminder dataset, add an area variable using the areas tibble. Be sure to preserve all the rows of the original gapminder dataset.
left_join(gapminder, areas)
  1. To the gapminder dataset, add an area variable using the areas tibble, but only keeping obervations for which areas are available.
inner_join(gapminder, areas)
  1. Use a _join function to output the rows in areas corresponding to countries that are not found in the gapminder dataset.
anti_join(areas, gapminder)
  1. Use a _join function to output the rows in areas corresponding to countries that are found in the gapminder dataset.
semi_join(areas, gapminder, by="country")
  1. Construct a tibble that joins gapminder and areas, so that all rows found in each original tibble are also found in the final tibble.
full_join(areas, gapminder)
full_join(gapminder, areas)

Here are the rows containing the Vatican City:

full_join(areas, gapminder) %>% 
    filter(country=="Vatican City")
  1. Subset the gapminder dataset to have countries that are only found in the areas data frame.
semi_join(gapminder, areas)
  1. Subset the gapminder dataset to have countries that are not found in the areas data frame.
anti_join(gapminder, areas)

Let’s check…. Canada should not be in there:

anti_join(gapminder, areas) %>% 
    filter(country=="Canada")
LS0tCnRpdGxlOiAiU1RBVCA1NDUgQ2xhc3MgTWVldGluZyAwOSIKb3V0cHV0OgogICAgaHRtbF9ub3RlYm9vazoKICAgICAgICB0b2M6IHRydWUKICAgICAgICB0aGVtZTogY2VydWxlYW4KICAgICAgICBudW1iZXJfc2VjdGlvbnM6IHRydWUKZWRpdG9yX29wdGlvbnM6IAogIGNodW5rX291dHB1dF90eXBlOiBpbmxpbmUKLS0tCgojIExlYXJuaW5nIE9iamVjdGl2ZXMKCgpgYGB7cn0Kc3VwcHJlc3NQYWNrYWdlU3RhcnR1cE1lc3NhZ2VzKGxpYnJhcnkoZHBseXIpKQpzdXBwcmVzc1BhY2thZ2VTdGFydHVwTWVzc2FnZXMobGlicmFyeShnYXBtaW5kZXIpKQpgYGAKCiMjIyMgUmVzb3VyY2VzCgpTb21lIHVzZWZ1bCByZXNvdXJjZXMgZm9yIGRhdGEgZnJhbWUgam9pbnMuCgotIEplbm55J3MgW0NoZWF0c2hlZXRdKGJpdDAwMV9kcGx5ci1jaGVhdHNoZWV0Lmh0bWwpIGZvciBgZHBseXJgIGpvaW4gZnVuY3Rpb25zLgotIFRoZSBbUmVsYXRpb25hbCBEYXRhXShodHRwOi8vcjRkcy5oYWQuY28ubnovcmVsYXRpb25hbC1kYXRhLmh0bWwpIGNoYXB0ZXIgaW4gIlIgZm9yIERhdGEgU2NpZW5jZSIgZ29lcyB0aHJvdWdoIHRoZXNlIGNvbmNlcHRzIGluIGRldGFpbC4gCgoKQWZ0ZXIgZ29pbmcgdGhyb3VnaCB0aGUgYGRwbHlyYCBbdmlnbmV0dGVdKGh0dHBzOi8vY3Jhbi5yLXByb2plY3Qub3JnL3dlYi9wYWNrYWdlcy9kcGx5ci92aWduZXR0ZXMvdHdvLXRhYmxlLmh0bWwpIG9uICJ0d28tdGFibGUgdmVyYnMiLCB3ZSdsbCB3b3JrIG9uIHRoZSBmb2xsb3dpbmcgZXhlcmNpc2VzLgoKCkNvbnNpZGVyIHRoZSBmb2xsb3dpbmcgYXJlYXMgb2YgY291bnRyaWVzLCBpbiBoZWN0YXJlczoKCmBgYHtyfQooYXJlYXMgPC0gZGF0YS5mcmFtZShjb3VudHJ5PWMoIkNhbmFkYSIsICJVbml0ZWQgU3RhdGVzIiwgIkluZGlhIiwgIlZhdGljYW4gQ2l0eSIpLAogICAgICAgICAgICAgICAgICAgICBhcmVhPWMoOTk4LjUqMTBeNiwgOTgzLjQqMTBeNiwgMzI4LjcqMTBeNiwgNDQpKSAlPiUgCiAgICAgYXMudGJsKQpgYGAKCgoxLiBUbyB0aGUgYGdhcG1pbmRlcmAgZGF0YXNldCwgYWRkIGFuIGBhcmVhYCB2YXJpYWJsZSB1c2luZyB0aGUgYGFyZWFzYCB0aWJibGUuIEJlIHN1cmUgdG8gcHJlc2VydmUgYWxsIHRoZSByb3dzIG9mIHRoZSBvcmlnaW5hbCBgZ2FwbWluZGVyYCBkYXRhc2V0LgoKYGBge3J9CmxlZnRfam9pbihnYXBtaW5kZXIsIGFyZWFzKQpgYGAKCgoyLiBUbyB0aGUgYGdhcG1pbmRlcmAgZGF0YXNldCwgYWRkIGFuIGBhcmVhYCB2YXJpYWJsZSB1c2luZyB0aGUgYGFyZWFzYCB0aWJibGUsIGJ1dCBvbmx5IGtlZXBpbmcgb2JlcnZhdGlvbnMgZm9yIHdoaWNoIGFyZWFzIGFyZSBhdmFpbGFibGUuIAoKYGBge3J9CmlubmVyX2pvaW4oZ2FwbWluZGVyLCBhcmVhcykKYGBgCgozLiBVc2UgYSBgX2pvaW5gIGZ1bmN0aW9uIHRvIG91dHB1dCB0aGUgcm93cyBpbiBgYXJlYXNgIGNvcnJlc3BvbmRpbmcgdG8gY291bnRyaWVzIHRoYXQgX2FyZSBub3RfIGZvdW5kIGluIHRoZSBgZ2FwbWluZGVyYCBkYXRhc2V0LiAKCmBgYHtyfQphbnRpX2pvaW4oYXJlYXMsIGdhcG1pbmRlcikKYGBgCgoKNC4gVXNlIGEgYF9qb2luYCBmdW5jdGlvbiB0byBvdXRwdXQgdGhlIHJvd3MgaW4gYGFyZWFzYCBjb3JyZXNwb25kaW5nIHRvIGNvdW50cmllcyB0aGF0IF9hcmVfIGZvdW5kIGluIHRoZSBgZ2FwbWluZGVyYCBkYXRhc2V0LiAKCmBgYHtyfQpzZW1pX2pvaW4oYXJlYXMsIGdhcG1pbmRlciwgYnk9ImNvdW50cnkiKQpgYGAKCjUuIENvbnN0cnVjdCBhIHRpYmJsZSB0aGF0IGpvaW5zIGBnYXBtaW5kZXJgIGFuZCBgYXJlYXNgLCBzbyB0aGF0IGFsbCByb3dzIGZvdW5kIGluIGVhY2ggb3JpZ2luYWwgdGliYmxlIGFyZSBhbHNvIGZvdW5kIGluIHRoZSBmaW5hbCB0aWJibGUuIAoKYGBge3J9CmZ1bGxfam9pbihhcmVhcywgZ2FwbWluZGVyKQpmdWxsX2pvaW4oZ2FwbWluZGVyLCBhcmVhcykKYGBgCgpIZXJlIGFyZSB0aGUgcm93cyBjb250YWluaW5nIHRoZSBWYXRpY2FuIENpdHk6CgpgYGB7cn0KZnVsbF9qb2luKGFyZWFzLCBnYXBtaW5kZXIpICU+JSAKICAgIGZpbHRlcihjb3VudHJ5PT0iVmF0aWNhbiBDaXR5IikKYGBgCgoKNi4gU3Vic2V0IHRoZSBgZ2FwbWluZGVyYCBkYXRhc2V0IHRvIGhhdmUgY291bnRyaWVzIHRoYXQgYXJlIG9ubHkgZm91bmQgaW4gdGhlIGBhcmVhc2AgZGF0YSBmcmFtZS4gCgpgYGB7cn0Kc2VtaV9qb2luKGdhcG1pbmRlciwgYXJlYXMpCmBgYAoKNy4gU3Vic2V0IHRoZSBgZ2FwbWluZGVyYCBkYXRhc2V0IHRvIGhhdmUgY291bnRyaWVzIHRoYXQgYXJlIF9ub3RfIGZvdW5kIGluIHRoZSBgYXJlYXNgIGRhdGEgZnJhbWUuIAoKYGBge3J9CmFudGlfam9pbihnYXBtaW5kZXIsIGFyZWFzKQpgYGAKCkxldCdzIGNoZWNrLi4uLiBDYW5hZGEgc2hvdWxkIG5vdCBiZSBpbiB0aGVyZToKCmBgYHtyfQphbnRpX2pvaW4oZ2FwbWluZGVyLCBhcmVhcykgJT4lIAogICAgZmlsdGVyKGNvdW50cnk9PSJDYW5hZGEiKQpgYGAKCg==