4 Segmenting Customers and Markets – Intuition Behind Clustering, Classification, and Language Analysis
As always, there’s good news and there’s bad news. The bad news is, we seem incapable of solving our more pressing or persistent problems. The good news is, we’re getting closer to building a machine that might do it for us.
– Jim Vibert, “If Artificial Intelligence Is the Answer, What’s the Question?,” The ChronicleHerald, January 1, 2018
Intuition Behind Clustering and Classification Algorithms
Let’s start from scratch. What’s the most basic thing intelligence is good for? Telling things apart, that’s what. If we have a bunch of data and we want to make sense of it, the simplest thing we can do is make distinctions: put some of the stuff over here, and other stuff over there. More formally, we want to “partition” a collection of data items into groups or classes or boxes or buckets or bins or subsets or categories, or whatever word you want to use for a top-level, fundamental division.
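To make “partition” concrete, here is a minimal Python sketch. It is not a clustering algorithm: the bucketing rule is picked by hand, and the customer names, spend figures, and the threshold of 100 are all hypothetical. What it does show is the defining property of a partition, namely that every item lands in exactly one non-empty group. The helper names `partition` and `is_partition` are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")

def partition(items: Iterable[T], key: Callable[[T], Hashable]) -> dict[Hashable, list[T]]:
    """Put every item into exactly one bucket, chosen by key(item)."""
    buckets: dict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return dict(buckets)

def is_partition(items: list[T], buckets: dict[Hashable, list[T]]) -> bool:
    """True if the buckets are non-empty and together hold each item exactly once."""
    flat = [x for group in buckets.values() for x in group]
    return all(buckets.values()) and Counter(flat) == Counter(items)

# Hypothetical customers as (name, annual spend) pairs.
customers = [("Ana", 120), ("Ben", 40), ("Cy", 300), ("Dee", 75)]

# Hand-chosen rule (hypothetical threshold): "high" spenders vs. "low" spenders.
segments = partition(customers, lambda c: "high" if c[1] >= 100 else "low")
print(segments)
# {'high': [('Ana', 120), ('Cy', 300)], 'low': [('Ben', 40), ('Dee', 75)]}
```

The rest of the chapter is about replacing the hand-written `key` rule with one that is learned from the data itself.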
That’s easy enough, but if we want to do this well, we need some way of measuring how well we have done it. The basic concept here is called a “metric” or “distance function.” It means that, given two things, we have a number that represents how close together or far apart they are. We will denote the distance between two data points a and b by d(a,b). This is a mathematical function that, in order to qualify as a metric, has to satisfy the following four ...