Classifying text

In our previous section, we discussed cluster, which was an unsupervised learning algorithm. Classification, on the other hand, is a supervised learning algorithm. What does supervised and unsupervised mean? In our previous example, we had the labels or the truth values. This is information about which class or label a document actually belongs to. But you would have also noticed we never used this information. When we trained our model, we never used the labels. This kind of learning is called unsupervised learning, and clustering is a popular example of an unsupervised learning task.

In classification problems, we are aware of the classes which we want to assign documents or data points to, and we use this information to ...

Get Natural Language Processing and Computational Linguistics now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.