Validation curves plot an algorithm's performance for different values of a hyperparameter. For each hyperparameter value, we perform k-fold cross-validation and record both the in-sample (training) and out-of-sample (validation) performance. We then calculate and plot the mean and standard deviation of each across the folds for every hyperparameter value. By examining the relative and absolute performance, we can gauge the level of bias and variance in our model.
Borrowing the KNeighborsClassifier example from Chapter 1, A Machine Learning Refresher, we modify it to experiment with different numbers of neighbors. We start by loading the required libraries and data. Notice that we import validation_curve from ...
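The procedure above can be sketched as follows. This is a minimal example using scikit-learn's validation_curve; the breast cancer dataset, the neighbor range of 1 to 10, and the choice of 5 folds are assumptions for illustration and may differ from the book's actual setup:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Assumed stand-in dataset for illustration
X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter values to evaluate: number of neighbors from 1 to 10
param_range = np.arange(1, 11)

# For each value, run 5-fold cross-validation and collect
# in-sample (train) and out-of-sample (test) scores
train_scores, test_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors",
    param_range=param_range,
    cv=5,
)

# Mean and standard deviation across folds, per hyperparameter value
train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
test_mean, test_std = test_scores.mean(axis=1), test_scores.std(axis=1)
```

The resulting arrays have one row per hyperparameter value and one column per fold; the per-value means and standard deviations are what get plotted as the validation curve, typically with matplotlib's plot and fill_between.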