Chapter 11. Model Selection
This chapter discusses optimizing hyperparameters. It also explores whether a model would perform better if given more data.
Validation Curve
Creating a validation curve is one way to determine an appropriate value for a hyperparameter. A validation curve is a plot that shows how the model performance responds to changes in the hyperparameter’s value (see Figure 11-1). The chart shows both the training data and the validation data. The validation scores allow us to infer how the model would respond to unseen data. Typically, we would choose a hyperparameter that maximizes the validation score.
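Before turning to Yellowbrick, the same idea can be sketched with plain scikit-learn's `validation_curve` function, which returns the cross-validated training and validation scores for each candidate hyperparameter value. The synthetic dataset and the `max_depth` range below are illustrative assumptions, not part of the original example:

```python
# Minimal sketch: validation curve with plain scikit-learn.
# The dataset (make_classification) and parameter range are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

X, y = make_classification(n_samples=500, random_state=42)

param_range = np.arange(1, 11)
train_scores, val_scores = validation_curve(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X,
    y,
    param_name="max_depth",
    param_range=param_range,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)

# Average the cross-validation folds, then pick the depth that
# maximizes the mean validation score.
mean_val = val_scores.mean(axis=1)
best_depth = param_range[mean_val.argmax()]
```

Plotting `train_scores.mean(axis=1)` and `mean_val` against `param_range` reproduces the two curves in a validation-curve chart; Yellowbrick simply wraps this computation and the plotting in one step.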
In the following example, we will use Yellowbrick to see if changing the value of the max_depth
hyperparameter changes the model performance of a random forest. You can provide a scoring parameter
set to a scikit-learn model metric (the default for classification is 'accuracy'):
Tip
Use the n_jobs parameter to take advantage of the CPUs and run this faster. If you set it to -1, it will use all of the CPUs.
>>> from yellowbrick.model_selection import (
...     ValidationCurve,
... )
>>> fig, ax = plt.subplots(figsize=(6, 4))
>>> vc_viz = ValidationCurve(
...     RandomForestClassifier(n_estimators=100),
...     param_name="max_depth",
...     param_range=np.arange(1, 11),
...     cv=10,
...     n_jobs=-1,
... )
>>> vc_viz.fit(X, y)
>>> vc_viz.poof()
>>> fig.savefig("images/mlpr_1101.png", dpi=300)
Figure 11-1. Validation curve report.
The ValidationCurve ...