Chapter 5. Model Evaluation and Improvement
Having discussed the fundamentals of supervised and unsupervised learning, and having explored a variety of machine learning algorithms, we will now dive more deeply into evaluating models and selecting parameters.
We will focus on the supervised methods, regression and classification, as evaluating and selecting models in unsupervised learning is often a very qualitative process (as we saw in Chapter 3).
To evaluate our supervised models, so far we have split our dataset into a training set and a test set using the train_test_split function, built a model on the training set by calling the fit method, and evaluated it on the test set using the score method, which for classification computes the fraction of correctly classified samples.
Here’s an example of that process:
In[1]:

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# create a synthetic dataset
X, y = make_blobs(random_state=0)

# split data and labels into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# instantiate a model and fit it to the training set
logreg = LogisticRegression().fit(X_train, y_train)

# evaluate the model on the test set
print("Test set score: {:.2f}".format(logreg.score(X_test, y_test)))
Out[1]:
Test set score: 0.88
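The accuracy that score reports is simply the fraction of correctly classified test samples. As a quick sanity check (a minimal sketch of our own, not part of the book's example), we can reproduce that number directly from the model's predictions:

import numpy as np

# predict class labels for the test set
y_pred = logreg.predict(X_test)

# accuracy is the fraction of predictions that match the true labels
print("Manual accuracy: {:.2f}".format(np.mean(y_pred == y_test)))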
Remember, the reason we split our data into training and test sets is that we are interested in measuring how well our model generalizes to new, previously unseen data.
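To see this concretely, we can compare the score on the training set with the score on the test set (an illustrative aside, not part of the book's example). The training score is typically optimistic, because the model has already seen those samples:

# scoring on the data the model was fit on usually gives an
# over-optimistic estimate of how it will do on unseen data
print("Training set score: {:.2f}".format(logreg.score(X_train, y_train)))
print("Test set score: {:.2f}".format(logreg.score(X_test, y_test)))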