Chapter 9. Semisupervised Learning
Until now, we have viewed supervised learning and unsupervised learning as two separate and distinct branches of machine learning. Supervised learning is appropriate when our dataset is labeled, and unsupervised learning is necessary when our dataset is unlabeled.
In the real world, the distinction is not quite so clear. Datasets are usually partially labeled, and we want to efficiently label the unlabeled observations while leveraging the information in the labeled set. With supervised learning, we would have to toss away the majority of the dataset because it is unlabeled. With unsupervised learning, we would have the majority of the data to work with but would not know how to take advantage of the few labels we have.
The field of semisupervised learning blends the benefits of both supervised and unsupervised learning, taking advantage of the few labels that are available to uncover structure in a dataset and help label the rest.
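To make the idea of "using a few labels to help label the rest" concrete, here is a minimal sketch in plain Python. It is a deliberately simple stand-in (each unlabeled observation copies the label of its nearest labeled neighbor) and not necessarily the technique developed later in this chapter. The toy points, the `None` marker for missing labels, and the function name `propagate_labels` are all assumptions made for illustration; they are not taken from the credit card dataset.

```python
import math

# Hypothetical toy data: two well-separated groups of 2-D observations.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),  # group near the origin
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),  # group near (5, 5)
]

# Only two observations are labeled; None marks an unlabeled observation.
labels = [0, None, None, None, 1, None, None, None]


def propagate_labels(points, labels):
    """Give each unlabeled point the label of its nearest labeled point."""
    labeled = [(p, y) for p, y in zip(points, labels) if y is not None]
    if not labeled:
        raise ValueError("at least one labeled observation is required")

    completed = []
    for p, y in zip(points, labels):
        if y is None:
            # Find the closest labeled observation by Euclidean distance.
            _, y = min(labeled, key=lambda item: math.dist(p, item[0]))
        completed.append(y)
    return completed


completed_labels = propagate_labels(points, labels)
print(completed_labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The two known labels are enough here only because the groups are cleanly separated. On real data the structure is rarely this obvious, which is why uncovering that structure from the unlabeled observations matters.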
We will continue to use the credit card transactions dataset in this chapter to showcase semisupervised learning.
Data Preparation
As before, let’s load in the necessary libraries and prepare the data. This should be pretty familiar by now:
'''Main'''
import numpy as np
import pandas as pd
import os, time, re
import pickle, gzip

'''Data Viz'''
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
import matplotlib as mpl

%matplotlib inline

'''Data Prep and Model Evaluation'''
from sklearn ...