Until now, we have viewed supervised learning and unsupervised learning as two separate and distinct branches of machine learning. Supervised learning is appropriate when our dataset is labeled, and unsupervised learning is necessary when our dataset is unlabeled.
In the real world, the distinction is not quite so clear. Datasets are often only partially labeled, and we want to efficiently label the unlabeled observations while leveraging the information in the labeled set. With supervised learning, we would have to discard the majority of the dataset because it is unlabeled. With unsupervised learning, we could work with the entire dataset but would have no way to take advantage of the few labels we have.
The field of semisupervised learning blends the benefits of both supervised and unsupervised learning, taking advantage of the few labels that are available to uncover structure in a dataset and help label the rest.
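To make the partially labeled setting concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the data preparation used in this chapter: the toy label list, the `mask_labels` helper, and the 10% keep fraction are all assumptions chosen for the example. It hides most of the labels and then counts how many rows a purely supervised approach could train on versus how many it would have to discard.

```python
import random


def mask_labels(labels, keep_fraction, seed=0):
    """Return a copy of `labels` in which all but `keep_fraction` of the
    entries are replaced by None (meaning "unlabeled").

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_keep = int(round(len(labels) * keep_fraction))
    keep_idx = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep_idx else None for i, y in enumerate(labels)]


# Toy labels (assumed, not the credit card data): 1 = fraud, 0 = normal
labels = [0] * 95 + [1] * 5

# Keep only 10% of the labels to simulate a partially labeled dataset
partial = mask_labels(labels, keep_fraction=0.10)

# A supervised model can only train on rows that still have a label
labeled_idx = [i for i, y in enumerate(partial) if y is not None]
# The remaining rows are usable only by an approach that needs no labels
unlabeled_idx = [i for i, y in enumerate(partial) if y is None]

print(len(labeled_idx), "labeled rows available to a supervised model")
print(len(unlabeled_idx), "unlabeled rows it would have to discard")
```

A semisupervised approach aims to use both groups: the labeled rows guide the model, and the unlabeled rows still contribute information about the overall structure of the data.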
We will continue to use the credit card transactions dataset in this chapter to showcase semisupervised learning.
As before, let’s load the necessary libraries and prepare the data. This should be pretty familiar by now:
'''Data Prep and Model Evaluation'''