Chapter 20. Many Models with purrr and broom

Introduction

In this chapter you’re going to learn three powerful ideas that help you to work with large numbers of models with ease:

  • Using many simple models to better understand complex datasets.

  • Using list-columns to store arbitrary data structures in a data frame. For example, this will allow you to have a column that contains linear models.

  • Using the broom package, by David Robinson, to turn models into tidy data. This is a powerful technique for working with large numbers of models because once you have tidy data, you can apply all of the techniques that you’ve learned about earlier in the book.

We’ll start by diving into a motivating example using data about life expectancy around the world. It’s a small dataset, but it illustrates how important modeling can be for improving your visualizations. We’ll use a large number of simple models to partition out some of the strongest signals so we can see the subtler signals that remain. We’ll also see how model summaries can help us pick out outliers and unusual trends.

The following sections will dive into more detail about the individual techniques:
