Chapter 21 Function-writing practicum

21.1 Overview

We recently learned how to write our own R functions (part 1, part 2, part 3).

Now we use that knowledge to write another useful function, within the context of the Gapminder data:

  • Input: a data.frame that contains (at least) a life expectancy variable lifeExp and a variable for year year
  • Output: a vector of estimated intercept and slope, from a linear regression of lifeExp on year

The ultimate goal is to apply this function to the Gapminder data for a specific country. We will eventually scale up to all countries using external machinery, e.g., the dplyr::group_by() + dplyr::do().

21.2 Load the Gapminder data

As usual, load gapminder. Load ggplot2 because we’ll make some plots and load dplyr too.

21.4 Get some code that works

Fit the regression:

Whoa, check out that crazy intercept! Apparently the life expectancy in France around year 0 A.D. was minus 400 years! Never forget to sanity check a model. In this case, a reparametrization is in order. I think it makes more sense for the intercept to correspond to life expectancy in 1952, the earliest date in our dataset. Estimate the intercept eye-ball-o-metrically from the plot and confirm that we’ve got something sane and interpretable now.

21.5 Turn working code into a function

Create the basic definition of a function and drop your working code inside. Add arguments and edit the inner code to match. Apply it to the practice data. Do you get the same result as before?

I had to decide how to handle the offset. Given that I will scale this up to many countries, which, in theory, might have data for different dates, I chose to set a default of 1952. Strategies that compute the offset from data, either the main Gapminder dataset or the excerpt passed to this function, are also reasonable to consider.

I loathe the names on this return value. This is not my first rodeo and I know that, downstream, these will contaminate variable names and factor levels and show up in public places like plots and tables. Fix names early!

Much better!

21.6 Test on other data and in a clean workspace

It’s always good to rotate through examples during development. The most common error this will help you catch is when you accidentally hard-wire your example into your function. If you’re paying attention to your informal tests, you will find it creepy that your function returns exactly the same results regardless which input data you give it. This actually happened to me while I was writing this document, I kid you not! I had left j_fit inside the call to coef(), instead of switching it to the_fit. How did I catch that error? I saw the fitted line below, which clearly did not have an intercept in the late 60s and a positive slope, as my first example did. Figures are a mighty weapon in the fight against nonsense. I don’t trust analyses that have few/no figures.

The linear fit is comically bad, but yes I believe the visual line and the regression results match up.

It’s also a good idea to clean out the workspace, rerun the minimum amount of code, and retest your function. This will help you catch another common mistake: accidentally relying on objects that were lying around in the workspace during development but that are not actually defined in your function nor passed as formal arguments.

21.7 Are we there yet?

Yes.

Given how I plan to use this function, I don’t feel the need to put it under formal unit tests or put in argument validity checks.