Chapter 7. A Model Workflow

In Chapter 6, we discussed the parsnip package, which can be used to define and fit the model. This chapter introduces a new concept called a model workflow. The purpose of this concept (and the corresponding tidymodels workflow() object) is to encapsulate the major pieces of the modeling process (discussed in Chapter 1). The workflow is important in two ways. First, using a workflow concept encourages good methodology since it is a single point of entry to the estimation components of a data analysis. Second, it enables the user to better organize projects. These two points are discussed in the following sections.

Where Does the Model Begin and End?

So far, when we have used the term “the model,” we have meant a structural equation that relates some predictors to one or more outcomes. Let’s again consider linear regression as an example. The outcome data are denoted as $y_i$, where there are $i = 1, \ldots, n$ samples in the training set. Suppose that there are $p$ predictors $x_{i1}, \ldots, x_{ip}$ that are used in the model. Linear regression produces the following model equation:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \ldots + \hat{\beta}_p x_{ip}$$

While this is a linear model, it is linear only in the parameters. The predictors could be nonlinear terms (such as $\log(x_i)$).
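The following minimal sketch shows what “linear only in the parameters” means in practice. It is written in Python with NumPy rather than the R/tidymodels code used elsewhere in this book, and the data are synthetic values made up for illustration. The predictor enters the model as $\log(x_i)$, yet the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ are still estimated by ordinary least squares, because the model equation is a linear combination of the parameters.

```python
import numpy as np

# Hypothetical synthetic data for illustration only: y depends on x
# nonlinearly, but linearly on the parameters once log(x) is the predictor.
rng = np.random.default_rng(seed=1)
n = 200
x = rng.uniform(1.0, 100.0, size=n)
y = 2.0 + 3.0 * np.log(x) + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept and the transformed
# predictor log(x) as the single nonlinear term.
X = np.column_stack([np.ones(n), np.log(x)])

# Ordinary least squares estimates of beta_0 and beta_1.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values: y_hat_i = beta_hat_0 + beta_hat_1 * log(x_i).
y_hat = X @ beta_hat
print(beta_hat)
```

The estimated coefficients should land close to the values used to simulate the data (2 and 3). The same least squares machinery would work for any fixed transformation of the predictors, such as polynomial or spline terms, since only the parameters need to enter linearly.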

Warning

The conventional way of thinking about the modeling process is that it only includes the model fit.

For some straightforward data sets, fitting the model itself ...
