The k-fold cross-validation

So far, we have been evaluating our models on the test set. By now, it is clear why we do so; however, there is one point we have not yet discussed. Let's go back to the diamond prices problem. In this chapter, we built a simple multiple linear regression model and calculated some metrics on the test set. Let's say that we will use the mean absolute error (MAE) to evaluate the model. When we calculated this metric, we got 733.67. Now let's repeat the same model-building steps:

  • Train-test split
  • Standardize the numeric features
  • Model training
  • Get predictions
  • Evaluate the model using the same metric

Here we have the code again:

## Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, ...
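Because the listing above is cut off, the five steps can be illustrated with a minimal, self-contained sketch. It does not reproduce the chapter's code: the data is synthetic (a hypothetical stand-in for the diamonds dataset, with only numeric features), the `random_state` value is an arbitrary assumption, and scikit-learn's `StandardScaler`, `LinearRegression`, and `mean_absolute_error` are assumed to be the tools used for each step. Only `test_size=0.1` is taken from the text, so the MAE printed here is unrelated to the 733.67 quoted earlier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diamonds data (hypothetical, NOT the book's dataset):
# three numeric features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000)

# 1. Train-test split (test_size=0.1 as in the text; random_state is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=123
)

# 2. Standardize the numeric features: fit the scaler on the training set only,
#    then apply the same transformation to the test set.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# 3. Model training
model = LinearRegression()
model.fit(X_train_std, y_train)

# 4. Get predictions
y_pred = model.predict(X_test_std)

# 5. Evaluate the model using the same metric (MAE)
mae = mean_absolute_error(y_test, y_pred)
print(f"MAE on the test set: {mae:.2f}")
```

Fitting the scaler on the training set alone keeps information from the test set out of the model-building steps, which is what makes the test-set MAE an honest estimate.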
