Chapter 9. Semisupervised Learning

Until now, we have viewed supervised learning and unsupervised learning as two separate and distinct branches of machine learning. Supervised learning is appropriate when our dataset is labeled, and unsupervised learning is necessary when our dataset is unlabeled.

In the real world, the distinction is not quite so clear. Datasets are usually partially labeled, and we want to efficiently label the unlabeled observations while leveraging the information in the labeled set. With supervised learning, we would have to toss away the majority of the dataset because it is unlabeled. With unsupervised learning, we would have the majority of the data to work with but would not know how to take advantage of the few labels we have.

The field of semisupervised learning blends the benefits of both supervised and unsupervised learning, taking advantage of the few labels that are available to uncover structure in a dataset and help label the rest.

We will continue to use the credit card transactions dataset in this chapter to showcase semisupervised learning.

Data Preparation

As before, let’s load in the necessary libraries and prepare the data. This should be pretty familiar by now:

'''Main'''
import numpy as np
import pandas as pd
import os, time, re
import pickle, gzip

'''Data Viz'''
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
import matplotlib as mpl

%matplotlib inline

'''Data Prep and Model Evaluation'''
from sklearn ...

Get Hands-On Unsupervised Learning Using Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.