Chapter 50. Application: A Face Detection Pipeline
This part of the book has explored a number of the central concepts and
algorithms of machine learning. But moving from these concepts to a
real-world application can be a challenge. Real-world datasets are noisy
and heterogeneous; they may have missing features, and data may be in a
form that is difficult to map to a clean [n_samples, n_features]
matrix. Before applying any of the methods discussed here, you must
first extract these features from your data: there is no formula for how
to do this that applies across all domains, and thus this is where you
as a data scientist must exercise your own intuition and expertise.
One interesting and compelling application of machine learning is to images, and we have already seen a few examples of this where pixel-level features are used for classification. Again, real-world data is rarely so uniform, and raw pixel values alone are often not suitable features: this has led to a large literature on feature extraction methods for image data (see Chapter 40).
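As a reminder of what "pixel-level features" means in practice, here is a minimal sketch of flattening a stack of images into an [n_samples, n_features] matrix. The array `images` is hypothetical stand-in data (random integers), not a dataset used in this chapter.

```python
import numpy as np

# Hypothetical stand-in data: 5 grayscale images of 8x8 pixels each.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 8, 8))

# Flatten each image so that every pixel becomes one feature column.
X = images.reshape(len(images), -1)
print(X.shape)  # (5, 64)
```

Each row of `X` is one image and each column is one pixel position, which is why this representation breaks down as soon as images differ in size, alignment, or lighting.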
In this chapter we will take a look at one such feature extraction technique: the histogram of oriented gradients (HOG), which transforms image pixels into a vector representation that is sensitive to broadly informative image features while remaining largely insensitive to confounding factors like illumination. We will use these features to develop a simple face detection pipeline, using machine learning algorithms and concepts we’ve seen throughout this part of the book. ...
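To make the idea of an illumination-insensitive gradient histogram concrete, here is a simplified sketch in NumPy. It computes a single normalized orientation histogram for one image patch; a full HOG descriptor additionally splits the image into a grid of cells and normalizes over blocks of cells, which is omitted here. The function name `orientation_histogram` and the choice of 9 bins are assumptions made for illustration, not part of any library API.

```python
import numpy as np

def orientation_histogram(image, n_bins=9):
    """Normalized histogram of unsigned gradient orientations for one patch.

    Simplified illustration only: no cell grid, no block normalization.
    """
    image = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into the range [0, 180).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Each pixel votes for its orientation bin, weighted by gradient magnitude.
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Hypothetical test patch with integer pixel values.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(16, 16)).astype(float)

# Simulate a change in illumination: higher contrast plus a brightness offset.
brighter = 2 * patch + 10

print(np.allclose(orientation_histogram(patch),
                  orientation_histogram(brighter)))
```

A constant brightness offset does not change the gradients, and a uniform contrast change scales every gradient magnitude by the same factor, which the normalization step divides out. The two histograms should therefore agree even though the raw pixel values differ at every position.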