Chapter 6: Preparing for Model Evaluation
It is a good idea to think through how you will evaluate your model’s performance before you begin to train it. A common technique is to separate data into training and testing datasets. We do this relatively early in the process to avoid what is known as data leakage; that is, letting data that is intended to be set aside for model evaluation inform our analyses. In this chapter, we will look at approaches for creating training datasets, including how to ensure that training data is representative. We will look into cross-validation strategies such as K-fold, which address some of the limitations of using a single static training/testing split. We will also begin to look more closely at assessing the performance ...
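To make the splitting ideas concrete, here is a minimal sketch in plain Python. It is not the chapter's own code. The function names `stratified_train_test_split` and `k_fold_indices`, the toy labels, and the 25% test fraction are all assumptions chosen for illustration. In practice a library such as scikit-learn provides `train_test_split` and `KFold` for the same purpose.

```python
import random
from collections import defaultdict


def stratified_train_test_split(labels, test_fraction=0.25, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) row indices,
    sampling within each class so both sets keep similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_samples, k=5, seed=0):
    """Hypothetical helper: yield (train_pos, validation_pos) pairs so that
    every position in range(n_samples) is used for validation exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    positions = list(range(n_samples))
    random.Random(seed).shuffle(positions)
    folds = [positions[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield sorted(train), sorted(validation)


# Toy, made-up target: 80 rows of class 0 and 20 rows of class 1.
labels = [0] * 80 + [1] * 20

# Split first, before any exploratory analysis, to limit data leakage.
train_idx, test_idx = stratified_train_test_split(labels, test_fraction=0.25)

# Cross-validate on the training rows only; the test rows stay untouched.
folds = list(k_fold_indices(len(train_idx), k=5))
```

Each fold holds positions into `train_idx`, not into the original data, so the held-out test rows never enter cross-validation. Splitting within each class is one simple way to keep the training data representative of the class balance; it is an illustrative choice here, not the only option.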