Chapter 20. Many Models with purrr and broom
Introduction
In this chapter you’re going to learn three powerful ideas that help you to work with large numbers of models with ease:
- Using many simple models to better understand complex datasets.
- Using list-columns to store arbitrary data structures in a data frame. For example, this will allow you to have a column that contains linear models.
- Using the broom package, by David Robinson, to turn models into tidy data. This is a powerful technique for working with large numbers of models because once you have tidy data, you can apply all of the techniques that you’ve learned about earlier in the book.
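To make these three ideas concrete, here is a minimal sketch of how they can fit together. It is not the chapter's own example: it assumes the dplyr, tidyr, purrr, and broom packages are installed, uses the built-in `mtcars` dataset as a stand-in, and the model formula `mpg ~ wt` and the names `by_cyl` and `model_summaries` are chosen purely for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Illustrative stand-in data: mtcars, not the dataset used in the chapter.
# nest() gives one row per cylinder count; the `data` list-column
# holds the remaining columns for that group as a data frame.
by_cyl <- mtcars %>%
  nest(data = -cyl)

# Fit one simple linear model per group and store the fitted
# models in a second list-column (assumed formula: mpg ~ wt).
by_cyl <- by_cyl %>%
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))

# Use broom::glance() to turn each model into a one-row data frame
# of summary statistics, then unnest into an ordinary tidy table.
model_summaries <- by_cyl %>%
  mutate(glance = map(model, glance)) %>%
  select(cyl, glance) %>%
  unnest(glance)

model_summaries
```

Because `model_summaries` is a regular data frame with one row per model, you could then filter, arrange, or plot it with the usual tools.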
We’ll start by diving into a motivating example using data about life expectancy around the world. It’s a small dataset but it illustrates how important modeling can be for improving your visualizations. We’ll use a large number of simple models to partition out some of the strongest signals so we can see the subtler signals that remain. We’ll also see how model summaries can help us pick out outliers and unusual trends.
The following sections will dive into more detail about the individual techniques:
- In “gapminder”, you’ll see a motivating example that puts list-columns to use to fit per-country models to world economic data.
- In “List-Columns”, you’ll learn more about the list-column data structure, and why it’s valid to put lists in data frames.
- In “Creating List-Columns”, you’ll learn the ...