Chapter 18. Model Basics with modelr
Introduction
The goal of a model is to provide a simple low-dimensional summary of a dataset. In the context of this book we’re going to use models to partition data into patterns and residuals. Strong patterns will hide subtler trends, so we’ll use models to help peel back layers of structure as we explore a dataset.
However, before we can start using models on interesting, real datasets, you need to understand the basics of how models work. For that reason, this chapter of the book is unique because it uses only simulated datasets. These datasets are very simple, and not at all interesting, but they will help you understand the essence of modeling before you apply the same techniques to real data in the next chapter.
There are two parts to a model:
- First, you define a family of models that express a precise, but generic, pattern that you want to capture. For example, the pattern might be a straight line, or a quadratic curve. You will express the model family as an equation like y = a_1 * x + a_2 or y = a_1 * x ^ a_2. Here, x and y are known variables from your data, and a_1 and a_2 are parameters that can vary to capture different patterns.
- Next, you generate a fitted model by finding the model from the family that is the closest to your data. This takes the generic model family and makes it specific, like y = 3 * x + 7 or y = 9 * x ^ 2.
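The two parts can be sketched in a few lines of base R. This is a minimal illustration, not the chapter's own code: the simulated data, the helper names model_family and distance, and the use of root-mean-squared deviation as the measure of "closest" are all assumptions made for the example.

```r
# Simulated data (hypothetical): a straight-line trend plus random noise
set.seed(1)
x <- seq(1, 10, length.out = 30)
y <- 3 * x + 7 + rnorm(length(x), sd = 1)

# Part 1: the model family y = a_1 * x + a_2.
# a is a length-2 vector holding the free parameters a_1 and a_2.
model_family <- function(a, x) a[1] * x + a[2]

# How far one member of the family is from the data.
# Root-mean-squared deviation is an assumed choice of distance.
distance <- function(a, x, y) sqrt(mean((y - model_family(a, x))^2))

# Part 2: the fitted model is the member of the family closest to the data.
# optim() searches over a_1 and a_2, starting from c(0, 0).
best <- optim(c(0, 0), distance, x = x, y = y)
best$par  # estimated a_1 (slope) and a_2 (intercept)

# For comparison, lm() fits the same linear family by least squares.
# Note that coef() returns the intercept first, then the slope.
coef(lm(y ~ x))
```

The parameters found by optim() should land near the slope and intercept used to simulate the data, and near the lm() coefficients, because both approaches pick the line from the same family that lies closest to the points.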
It’s important to understand that a fitted model is just the closest model from a family ...