Effective Amazon Machine Learning by Alexis Perrier

Splitting the data

As we saw in Chapter 2, Machine Learning Definitions and Concepts, to build and select the best model we need to split the dataset into three parts: training, validation, and test (held-out), with the usual ratios being 60%, 20%, and 20%. The training and validation sets are used to build several models and select the best one, while the held-out set is reserved for the final performance evaluation on previously unseen data. We will use the held-out subset in Chapter 6, Predictions and Performances, to simulate batch predictions with the model we build in Chapter 5, Model Creation.
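
The 60%, 20%, and 20% split described above can be sketched locally in a few lines of Python. This is only an illustration under stated assumptions, not Amazon ML's own splitting mechanism: the function name `split_dataset`, the `ratios` and `seed` parameters, and the toy `rows` list are all hypothetical, and the rows are shuffled first on the assumption that their original order carries no meaning.

```python
import random


def split_dataset(rows, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle rows and split them into training, validation, and held-out lists.

    Hypothetical helper for illustration; it is not part of any Amazon ML API.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Work on a copy so the caller's data is left untouched.
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    train_end = round(n * ratios[0])
    valid_end = train_end + round(n * ratios[1])

    training = shuffled[:train_end]
    validation = shuffled[train_end:valid_end]
    held_out = shuffled[valid_end:]  # takes whatever remains
    return training, validation, held_out


# Toy data standing in for the rows of a real dataset.
rows = list(range(100))
train, valid, held_out = split_dataset(rows)
print(len(train), len(valid), len(held_out))
```

Shuffling before slicing guards against a dataset that is sorted by some column, which would otherwise leave the three subsets with different distributions. The held-out list is the part to set aside and not touch until the final evaluation.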

Because Amazon ML itself splits the dataset used for model training and evaluation into training and validation subsets, we only need ...
