IN THIS CHAPTER
Understanding differences between examples
Clustering data into meaningful groups
Classifying and regressing after looking for data neighbors
Grasping the difficulties of working in a high-dimensional data space
A rose is a rose. A tree is a tree. A car is a car. Even though you can make simple statements like these, one example of each kind of item doesn’t suffice to identify all the items that fit into that classification. After all, many species of trees and many kinds of roses exist. If you evaluate the problem under a machine learning framework, you find that the examples contain features whose values change frequently and features whose values systematically persist (a tree is always made of wood and has a trunk and roots, for instance). When you look closely for the feature values that repeat constantly, you can guess that the objects sharing them are of much the same kind.
Children, for example, can figure out by themselves what cars are by looking at their features. After all, cars all have four wheels and run on roads. But what happens when a child sees a bus or a truck? Luckily, someone is there to explain what these bigger vehicles are and open the child’s world to broader definitions. In this chapter, you explore how machines can learn by exploiting similarity in