Chapter 5. Model Evaluation and Improvement
Having discussed the fundamentals of supervised and unsupervised learning, and having explored a variety of machine learning algorithms, we will now dive more deeply into evaluating models and selecting parameters.
We will focus on the supervised methods, regression and classification, as evaluating and selecting models in unsupervised learning is often a very qualitative process (as we saw in Chapter 3).
To evaluate our supervised models, so far we have split our dataset into a training set and a test set using the train_test_split function, built a model on the training set by calling the fit method, and evaluated it on the test set using the score method, which for classification computes the fraction of correctly classified samples.
Here’s an example of that process:
In[1]:
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# create a synthetic dataset
X, y = make_blobs(random_state=0)

# split data and labels into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# instantiate a model and fit it to the training set
logreg = LogisticRegression().fit(X_train, y_train)

# evaluate the model on the test set
print("Test set score: {:.2f}".format(logreg.score(X_test, y_test)))
Out[1]:
Test set score: 0.88
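To make concrete what "fraction of correctly classified samples" means, here is a minimal sketch (assuming the variables from In[1] are still defined) that compares the score method against an explicit accuracy computation; the two values should agree:

import numpy as np

# predictions on the held-out test set
y_pred = logreg.predict(X_test)

# accuracy computed by hand: fraction of samples where the prediction matches the true label
manual_accuracy = np.mean(y_pred == y_test)

# score computes the same quantity for classifiers
print("Manual accuracy: {:.2f}".format(manual_accuracy))
print("score method:    {:.2f}".format(logreg.score(X_test, y_test)))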
Remember, the reason we split our data into training and test sets is that we are interested ...