Chapter 20. Many Models with purrr and broom


In this chapter you’re going to learn three powerful ideas that help you to work with large numbers of models with ease:

  • Using many simple models to better understand complex datasets.

  • Using list-columns to store arbitrary data structures in a data frame. For example, this will allow you to have a column that contains linear models.

  • Using the broom package, by David Robinson, to turn models into tidy data. This is a powerful technique for working with large numbers of models because once you have tidy data, you can apply all of the techniques that you’ve learned about earlier in the book.
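To make these three ideas concrete, here is a minimal sketch of how they fit together. It is not the chapter's own example: it assumes the dplyr, tidyr, purrr, and broom packages are installed, and it uses R's built-in mtcars data with a hypothetical grouping by cyl instead of the life expectancy data discussed below.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumption: mtcars stands in for a real dataset; cyl is the grouping variable.
# nest() collapses each group's rows into a single data frame stored in a
# list-column named `data`.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup()

# Fit one simple linear model per group and keep each fit in a list-column.
models <- by_cyl %>%
  mutate(
    model   = map(data, ~ lm(mpg ~ wt, data = .x)),
    summary = map(model, glance)  # broom::glance() gives a one-row tibble per model
  )

# Unnest the summaries to get back an ordinary tidy data frame,
# with one row per group and one column per model statistic.
summaries <- models %>%
  select(cyl, summary) %>%
  unnest(summary)

print(summaries)
```

Each row of `models` holds a group's data and its fitted model side by side, and `summaries` is a regular data frame that can be filtered, arranged, or plotted like any other.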

We’ll start by diving into a motivating example using data about life expectancy around the world. It’s a small dataset but it illustrates how important modeling can be for improving your visualizations. We’ll use a large number of simple models to partition out some of the strongest signals so we can see the subtler signals that remain. We’ll also see how model summaries can help us pick out outliers and unusual trends.

The following sections will dive into more detail about the individual techniques:

  • In “gapminder”, you’ll see a motivating example that puts list-columns to use to fit per-country models to world economic data.

  • In “List-Columns”, you’ll learn more about the list-column data structure, and why it’s valid to put lists in data frames.

  • In “Creating List-Columns”, you’ll learn the three main ways in which you’ll ...
