Chapter 4. Classification
Understand the modeling premises implicit in the tools you use. And, if you don’t, understand that, too.
— Mark Jacobson, personal communication, 2006
The statistician cannot evade the responsibility for understanding the process he applies or recommends.
— R. A. Fisher, The Design of Experiments, 1971
This chapter introduces the reader to a wide variety of approaches to the problem of classification, that is, to the problem of supervised learning when the range of the unknown function is a discrete, unordered set of labels. It begins in Section 4.1 by developing an optimal (minimum risk) classifier, the Bayes classifier, under the assumption that the joint probability distribution from which data are drawn is known. This assumption is totally unrealistic, of course, but analyzing the Bayes classifier allows us to perceive and appreciate the role played by our subjectivity, encoded in the loss function, and the roles played by various aspects of the joint probability distribution.
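As a rough illustration of the minimum-risk idea, and not the book's own development, the Python sketch below picks the label with the smallest expected loss at a single input point. Everything in it is an assumption made for illustration: the function name `bayes_classify`, the two labels, the posterior probabilities, and both loss tables are hypothetical.

```python
# A minimal sketch of a minimum-risk (Bayes) decision at one input point.
# All names and numbers here are hypothetical, chosen only for illustration.

def bayes_classify(posterior, loss):
    """Return the label with the smallest expected loss.

    posterior: dict mapping each true label to P(label | x), assumed known
    loss: dict mapping (predicted, true) pairs to the loss incurred
    """
    labels = list(posterior)

    def risk(predicted):
        # Expected loss of predicting `predicted`, averaged over the true label.
        return sum(loss[(predicted, true)] * posterior[true] for true in labels)

    return min(labels, key=risk)


# Assumed posterior probabilities at some input x.
posterior = {"benign": 0.8, "malignant": 0.2}

# 0-1 loss: every error costs the same.
zero_one = {(p, t): 0.0 if p == t else 1.0 for p in posterior for t in posterior}

# Asymmetric loss: missing a malignant case is judged ten times worse.
asymmetric = dict(zero_one)
asymmetric[("benign", "malignant")] = 10.0

print(bayes_classify(posterior, zero_one))    # benign
print(bayes_classify(posterior, asymmetric))  # malignant
```

The probabilities are identical in both calls; only the loss table changes, and the decision changes with it. That is the sense in which the loss function carries the analyst's subjective judgment.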
The practical part of the chapter begins in Section 4.2, which contains ...