What is classification?

Classification is one of the most common applications of data mining, both in practice and in research. As before, we have a set of samples that represents the objects or things we are interested in classifying. We also have a new array, the class values, which assigns a category to each sample. Some examples are as follows:

  • Determining the species of a plant by looking at its measurements. The class value here would answer "Which species is this?"
  • Determining if an image contains a dog. The class would answer "Is there a dog in this image?"
  • Determining if a patient has cancer based on the test results. The class would answer "Does this patient have cancer?"
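
As a rough illustration of how samples and class values line up, the plant example can be sketched in plain Python. Everything here is an assumption made for illustration: the measurements are invented, the class names `species_a`, `species_b`, and `species_c` are placeholders, and the helper `nearest_neighbour_class` is a hypothetical stand-in classifier rather than a method described in the text.

```python
import math
from collections import Counter

# Invented measurements for illustration: [petal length (cm), petal width (cm)]
samples = [
    [1.4, 0.2],
    [1.5, 0.3],
    [4.7, 1.4],
    [4.5, 1.5],
    [6.0, 2.5],
    [5.8, 2.2],
]

# One class value per sample, in the same order as `samples`
class_values = [
    "species_a", "species_a",
    "species_b", "species_b",
    "species_c", "species_c",
]


def nearest_neighbour_class(new_sample, samples, class_values):
    """Return the class value of the known sample closest to new_sample.

    Hypothetical helper: uses Euclidean distance and a single neighbour.
    """
    distances = [math.dist(new_sample, known) for known in samples]
    closest_index = distances.index(min(distances))
    return class_values[closest_index]


# How many samples we have for each class
class_counts = Counter(class_values)

# Classify a new, unlabeled plant from its measurements
predicted = nearest_neighbour_class([4.6, 1.3], samples, class_values)
print(class_counts)
print(predicted)
```

The point of the sketch is the data layout: each row of `samples` has exactly one matching entry in `class_values`, and classifying means producing a class value for a sample that does not yet have one.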

While many of the examples above are binary (yes/no) questions, they do not ...
