Chapter 14

Leveraging Similarity

IN THIS CHAPTER

Understanding differences between examples

Clustering data into meaningful groups

Classifying and regressing after looking for data neighbors

Grasping the difficulties of working in a high-dimensional data space

A rose is a rose. A tree is a tree. A car is a car. Even though you can make simple statements like this, one example of each kind of item doesn’t suffice to identify all the items that fit into that classification. After all, many species of trees and many kinds of roses exist. If you look at the problem through a machine learning lens, you find that the examples contain features whose values change frequently and features that systematically persist (a tree is always made of wood and has a trunk and roots, for instance). When you look closely at the feature values that repeat consistently, you can guess that certain observed objects are much the same kind of thing.

So, children can figure out by themselves what cars are by looking at their features. After all, cars all have four wheels and run on roads. But what happens when a child sees a bus or a truck? Luckily, someone is there to explain these bigger vehicles and open the child’s world to larger definitions. In this chapter, you explore how machines can learn by exploiting similarity in

  • A supervised way: Learning from previous examples. For example, a car has four wheels, so if a new object has four wheels, it could be a car.
  • An unsupervised way: Inferring a grouping without the help of labeled examples, relying only on how similar the cases are to one another (a quick sketch of both approaches appears after this list).
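
To make the distinction concrete, here is a minimal sketch of both approaches using scikit-learn and a made-up toy dataset. The feature values, labels, and parameter choices are invented for illustration; they aren’t taken from the book’s own examples.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Toy data: each vehicle is described by two features,
# number of wheels and length in meters (values are made up).
X = np.array([[4, 4.2], [4, 4.5], [4, 3.9],      # small, car-like vehicles
              [6, 12.0], [8, 13.5], [6, 11.0]])  # large, bus-like vehicles
y = np.array(["car", "car", "car", "bus", "bus", "bus"])

# Supervised: learn from labeled examples, then classify a new object
# by looking at its most similar labeled neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[4, 4.1]]))    # -> ['car']

# Unsupervised: no labels at all; group the objects purely by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
print(kmeans.fit_predict(X))      # -> two clusters, e.g. [0 0 0 1 1 1]
```

The nearest-neighbors classifier answers the question “what do the most similar labeled examples say?”, while k-means groups the very same points by similarity alone, without ever seeing a label.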
