Chapter 11

Beyond supervised and unsupervised learning


In many practical applications, labeled data is rare or costly to obtain. “Semisupervised” learning exploits unlabeled data to improve the performance of supervised learning. We first discuss how to combine clustering with classification; more specifically, how mixture model clustering using expectation maximization can be combined with Naïve Bayes to blend information from both labeled and unlabeled data. Next, we discuss how a generative approach for learning from unlabeled data, such as the one based on fitting a mixture model, can be combined with discriminative learning from labeled data. Following that, we consider the “cotraining” method for semisupervised learning, which can ...
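The first idea above, blending mixture-model clustering via expectation maximization with Naïve Bayes, can be sketched roughly as follows. This is a minimal illustration, not the chapter's own implementation: it assumes a multinomial Naïve Bayes model over word counts with Laplace smoothing. The function names (`train_nb`, `posterior`, `semi_supervised_nb`, `predict`), the fixed iteration count, and the toy data are all hypothetical choices made for this example.

```python
import math

def train_nb(docs, weights, n_classes, vocab_size, alpha=1.0):
    """M-step: fit multinomial Naive Bayes from (possibly soft) class weights.

    docs    -- list of word-count vectors, one per document
    weights -- list of per-class membership weights, one row per document
    alpha   -- Laplace smoothing constant (an assumption of this sketch)
    """
    class_mass = [alpha] * n_classes
    word_mass = [[alpha] * vocab_size for _ in range(n_classes)]
    for counts, resp in zip(docs, weights):
        for c in range(n_classes):
            class_mass[c] += resp[c]
            for w, n in enumerate(counts):
                word_mass[c][w] += resp[c] * n
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_lik = []
    for c in range(n_classes):
        row_sum = sum(word_mass[c])
        log_lik.append([math.log(v / row_sum) for v in word_mass[c]])
    return log_prior, log_lik

def posterior(counts, model):
    """E-step for one document: class probabilities under the current model."""
    log_prior, log_lik = model
    scores = [lp + sum(n * ll[w] for w, n in enumerate(counts))
              for lp, ll in zip(log_prior, log_lik)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def semi_supervised_nb(labeled, labels, unlabeled, n_classes, n_iter=10):
    """Train on labeled data, then refine with EM over the unlabeled data."""
    vocab_size = len(labeled[0])
    # Labeled documents keep fixed one-hot class memberships throughout.
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    model = train_nb(labeled, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        soft = [posterior(d, model) for d in unlabeled]        # E-step
        model = train_nb(labeled + unlabeled, hard + soft,     # M-step
                         n_classes, vocab_size)
    return model

def predict(counts, model):
    """Return the index of the most probable class."""
    p = posterior(counts, model)
    return max(range(len(p)), key=p.__getitem__)

# Toy data (invented): words 0-1 tend to occur in class 0, words 2-3 in class 1.
# The labeled set only ever shows words 0 and 2.
labeled = [[3, 0, 0, 0], [0, 0, 3, 0]]
labels = [0, 1]
unlabeled = [[2, 2, 0, 0], [1, 3, 0, 0], [0, 0, 2, 2], [0, 0, 1, 3]]

model = semi_supervised_nb(labeled, labels, unlabeled, n_classes=2)
print(predict([0, 3, 0, 0], model), predict([0, 0, 0, 3], model))
```

In this toy setup the labeled documents never contain words 1 or 3, so a classifier trained on them alone has no evidence about those words. The unlabeled documents, where those words co-occur with words 0 and 2, are what let the EM iterations associate them with a class.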
