Part IV. Model

Now that you are equipped with powerful programming tools, we can finally return to modeling. You’ll use your new tools of data wrangling and programming to fit many models and understand how they work. The focus of this book is on exploration, not confirmation or formal inference. But you’ll learn a few basic tools that help you understand the variation within your models.

The goal of a model is to provide a simple, low-dimensional summary of a dataset. Ideally, the model will capture true “signals” (i.e., patterns generated by the phenomenon of interest) and ignore “noise” (i.e., random variation that you’re not interested in). Here we only cover “predictive” models, which, as the name suggests, generate predictions. There is another type of model that we’re not going to discuss: “data discovery” models. These models don’t make predictions, but instead help you discover interesting relationships within your data. (These two categories of models are sometimes called supervised and unsupervised, but I don’t think that terminology is particularly illuminating.)

This book is not going to give you a deep understanding of the mathematical theory that underlies models. It will, however, build your intuition about how statistical models work, and give you a family of useful tools that allow you to use models to better understand your data: ...

R for Data Science by Hadley Wickham, Garrett Grolemund
