STAT 545
Data wrangling, exploration, and analysis with R
Welcome to STAT 545
Learn how to:
- explore, groom, visualize, and analyze data,
- make all of that reproducible, reusable, and shareable,
- using R.
This site is about everything that comes up during data analysis except for statistical modelling and inference. This might strike you as strange, given R’s statistical roots. First, let me assure you we believe that modelling and inference are important. But the world already offers a lot of great resources for doing statistics with R.
The design of STAT 545 was motivated by the need to provide more balance in applied statistical training. Data analysts spend a considerable amount of time on project organization, data cleaning and preparation, and communication. These activities can have a profound effect on the quality and credibility of an analysis. Yet these skills are rarely taught, despite how important and necessary they are. STAT 545 aims to address this gap.
History and future
These materials originated in the STAT 545 course at the University of British Columbia:
“The STAT 545 course became notable as an early example of a data science course taught in a statistics program. It is also notable for its focus on teaching using modern R packages, Git and GitHub, its extensive sharing of teaching materials openly online, and its strong emphasis on practical data cleaning, exploration, and visualization skills, rather than algorithms and theory.”
The main author, Jenny Bryan (jennybryan.org), developed this version of STAT 545 as a professor at UBC. She has since joined RStudio as a Software Engineer on the tidyverse and r-lib teams, and is an adjunct professor at UBC. In September 2019, we (amicably) created separate spaces for the ongoing maintenance of this content and the continued offerings of STAT 545 at UBC (https://stat545.stat.ubc.ca), which is alive and well.
We plan to continue maintaining these resources, as they are still used in STAT 545 at UBC and by people teaching themselves R. Some topics have since been developed more fully elsewhere and we may link out to those resources. For example, the Git and GitHub content of STAT 545 eventually grew into its own website: happygitwithr.com. Some material has been retired, but is archived in the repository of the old website. Finally, the new website has URLs that are more human-friendly; we believe we created the necessary redirects, so we don’t break other people’s links. If you think we’ve missed one, please let us know in an issue.
Other contributors
Several STAT 545 TAs were instrumental in the development of these materials, and members of the RStudio Education Team ported the original website into the modern and more maintainable framework we enjoy today:
- TAs who contributed content: Dean Attali (web applications with Shiny), Julia Gustavsen (Shiny), Shaun Jackman (automating workflows), Bernhard Konrad (system setup, package development, the shell), Gloria Li (regular expressions), Andrew MacDonald (getting data from the web), Kieran Samuk (regular expressions)
- RStudio: Alison Hill (https://alison.rbind.io) and intern Grace Lawley (https://grace.rbind.io) led the heroic effort to port a vintage R Markdown website into bookdown. Intern Desirée De Leon (https://desiree.rbind.io) contributed design expertise.
Colophon
This book was written in bookdown inside RStudio. The website stat545.com is hosted with Netlify, and automatically updated after every commit by Travis-CI. The complete source is available from GitHub.
The STAT 545 logo and the book style were designed by Desirée De Leon.
This version of the book was built with:
#> Finding R package dependencies ... Done!
#>  setting  value
#>  version  R version 3.6.1 (2017-01-27)
#>  os       Ubuntu 16.04.6 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US.UTF-8
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2019-10-14
Along with these packages:
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is a human-readable summary of (and not a substitute for) the license. Please see https://creativecommons.org/licenses/by-sa/4.0/legalcode for the full legal text.
You are free to:
Share—copy and redistribute the material in any medium or format
Remix—remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution—You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike—If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
No additional restrictions—You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.