Chapter 16 Table lookup

I try to use dplyr joins for most tasks that combine data from two tibbles. But sometimes you just need good old “table lookup”. Party like it’s Microsoft Excel LOOKUP() time!

16.1 Load gapminder and the tidyverse

16.3 Dorky national food example

Make a lookup table of national foods. Or at least the stereotype. Yes, I have intentionally kept Mexico in mini-Gapminder and neglected to put Mexico here.
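Here is a minimal sketch of such a lookup table. The text only confirms Twinkies for the United States; the Belgium and Canada entries are illustrative assumptions, and I use a plain data frame just to keep the sketch dependency-free (a tibble would behave the same way here).

```r
# Lookup table: one row per country, no country repeated.
# "waffle" and "poutine" are illustrative assumptions; Mexico is left out on purpose.
food <- data.frame(
  country = c("Belgium", "Canada", "United States"),
  food    = c("waffle", "poutine", "Twinkie"),
  stringsAsFactors = FALSE
)
food
```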

16.4 Lookup national food

match(x, table) reports where the values in the key x appear in the lookup variable table. It returns positive integers for use as indices, or NA where a key has no match. It assumes x and table are free-range vectors, i.e. there’s no implicit data frame on the radar here.
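A tiny sketch with two free-range vectors may help fix the idea. The names keys and lookup are made up for illustration; they are passed to the x and table arguments explicitly.

```r
keys   <- c("b", "z", "a")   # plays the role of x
lookup <- c("a", "b", "c")   # plays the role of table

# For each element of keys, the position of its match in lookup
idx <- match(x = keys, table = lookup)
idx
# → 2 NA 1   ("z" does not appear in lookup, so it gets NA)
```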

Gapminder’s country plays the role of the key x. It is replicated, i.e. non-unique, in mini_gap, but not in food, i.e. no country appears more than once in food$country. FYI match() does tolerate duplicates in table: it simply reports the position of the first match and ignores the rest.
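Both behaviours can be seen in a toy sketch (the vectors below are made up for illustration): a replicated key is no problem, and duplicates in the lookup variable are resolved in favour of the first occurrence.

```r
keys   <- c("a", "a", "b")        # replicated key
lookup <- c("b", "a", "b", "a")   # duplicated lookup values

idx <- match(keys, lookup)
idx
# → 2 2 1   (the first "a" is at position 2, the first "b" at position 1)
```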

In table lookup, there is always a value variable y that you plan to index with the match(x, table) result. It often lives together with table in a data frame; they should certainly be the same length and synced up with respect to row order.

But first…

I get x and table backwards some non-negligible percentage of the time. So I store the match indices and use them to index the data frame where table lives. Add x as a column and eyeball-o-metrically assess that all is well.
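A sketch of that sanity check, under loudly stated assumptions: mini_gap below is a hypothetical stand-in (a factor country column, two rows per country), not the real Gapminder extract, and the Belgium and Canada foods are illustrative.

```r
# Hypothetical stand-in for mini_gap: country is a factor, two years per country
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)
# Lookup table; "waffle" and "poutine" are illustrative assumptions
food <- data.frame(
  country = c("Belgium", "Canada", "United States"),
  food    = c("waffle", "poutine", "Twinkie"),
  stringsAsFactors = FALSE
)

# Store the match indices instead of using them right away
indices <- match(x = mini_gap$country, table = food$country)

# Index the lookup table with them and put the key x alongside for inspection
check <- data.frame(x = mini_gap$country, food[indices, ], row.names = NULL)
check
```

If x and table had been swapped, the country column would not line up with x; the Mexico rows are expected to come back as NA because Mexico is not in the lookup table.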

Once all looks good, do the actual table lookup and, possibly, add the new info to your main table.
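The lookup itself is a one-liner: index the value variable with the match() result. As before, mini_gap is a hypothetical stand-in and the Belgium and Canada foods are illustrative assumptions.

```r
# Hypothetical stand-in for mini_gap: country is a factor, two years per country
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)
# Lookup table; "waffle" and "poutine" are illustrative assumptions
food <- data.frame(
  country = c("Belgium", "Canada", "United States"),
  food    = c("waffle", "poutine", "Twinkie"),
  stringsAsFactors = FALSE
)

# Index the value variable food$food by the match positions,
# then store the result as a new column in the main table
mini_gap$food <- food$food[match(mini_gap$country, food$country)]
mini_gap
```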

Of course, if this was really our exact task, we could have used a join!
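For comparison, a sketch of the join version using dplyr's left_join(), again with a hypothetical stand-in for mini_gap and illustrative foods for Belgium and Canada.

```r
library(dplyr)

# Hypothetical stand-in for mini_gap: country is a factor, two years per country
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)
# Lookup table; "waffle" and "poutine" are illustrative assumptions
food <- data.frame(
  country = c("Belgium", "Canada", "United States"),
  food    = c("waffle", "poutine", "Twinkie"),
  stringsAsFactors = FALSE
)

# Keep every row of mini_gap; rows with no partner in food get NA
joined <- left_join(mini_gap, food, by = "country")
joined
```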

But sometimes you have a substantive reason (or psychological hangup) that makes you prefer the table lookup interface.

16.5 World’s laziest table lookup

While I’m here, let’s demo another standard R trick that’s based on indexing by name.

Imagine the table you want to consult isn’t even a tibble but is, instead, a named character vector.
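One way to build such a vector, sketched with setNames(): the foods become the values and the countries become the names. The name food_vec matches the text; the Belgium and Canada foods are illustrative assumptions.

```r
# Lookup table; "waffle" and "poutine" are illustrative assumptions
food <- data.frame(
  country = c("Belgium", "Canada", "United States"),
  food    = c("waffle", "poutine", "Twinkie"),
  stringsAsFactors = FALSE
)

# Values are the foods, names are the countries
food_vec <- setNames(food$food, food$country)
food_vec

# Indexing by name with a character key
food_vec[["Canada"]]
# → "poutine"
```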

Another way to get the national foods for mini-Gapminder is to simply index food_vec with mini_gap$country.
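Sketched naively, with a hypothetical stand-in for mini_gap (country stored as a factor) and illustrative foods for Belgium and Canada, that looks like this. Look closely at the last four rows.

```r
# Named character vector; "waffle" and "poutine" are illustrative assumptions
food_vec <- c(Belgium = "waffle", Canada = "poutine", "United States" = "Twinkie")

# Hypothetical stand-in for mini_gap: country is a factor
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)

# Index the named vector directly with the factor
naive <- food_vec[mini_gap$country]

# Put the result next to the key to see what happened
data.frame(country = mini_gap$country, food = unname(naive))
```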

HOLD ON. STOP. Twinkies aren’t the national food of Mexico!?! What went wrong?

Remember mini_gap$country is a factor. So when we use it in an indexing context, its integer nature is expressed. It is pure luck that we get the right foods for Belgium and Canada. Luckily the Mexico/United States situation tipped us off. Here’s what we are really indexing food_vec by above:
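A sketch of how to see those integer codes, using a hypothetical stand-in for mini_gap with a factor country column:

```r
# Hypothetical stand-in for mini_gap: country is a factor
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)

# Levels are sorted alphabetically, so Mexico is level 3 and United States level 4
levels(mini_gap$country)

# The integer codes that actually get used for indexing
codes <- as.integer(mini_gap$country)
codes
# → 1 1 2 2 3 3 4 4
```

With a three-element lookup vector, code 3 picks the third element (whatever its name) and code 4 runs off the end, which gives NA.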

To get our desired result, we need to explicitly coerce mini_gap$country to character.
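A sketch of the corrected lookup, with the same kind of hypothetical stand-in for mini_gap and illustrative foods for Belgium and Canada:

```r
# Named character vector; "waffle" and "poutine" are illustrative assumptions
food_vec <- c(Belgium = "waffle", Canada = "poutine", "United States" = "Twinkie")

# Hypothetical stand-in for mini_gap: country is a factor
mini_gap <- data.frame(
  country = factor(rep(c("Belgium", "Canada", "Mexico", "United States"), each = 2)),
  year    = rep(c(2002L, 2007L), times = 4)
)

# Coerce the factor to character so indexing goes by name, not by integer code
fixed <- food_vec[as.character(mini_gap$country)]

data.frame(country = mini_gap$country, food = unname(fixed))
```

Now Mexico gets NA, since it has no entry in the lookup vector, and the United States gets its Twinkie.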

When your key variable is character (and not a factor), you can skip this step.