Mahalanobis Distance

The Mahalanobis distance is a distance measure that accounts for the covariance or "stretch" of the space in which the data lies. If you know what a Z-score is, then you can think of the Mahalanobis distance as a multidimensional analogue of the Z-score. Figure 13-4(a) shows an initial distribution of three sets of data in which the vertically arranged sets appear closer together. When we normalize the space by the covariance in the data, we see in Figure 13-4(b) that the horizontally arranged data sets are actually closer together. This sort of thing occurs frequently; for instance, if we are comparing people's height in meters with their age in days, we'd see very little variance in height relative to the large variance in age. By normalizing for the variance we can obtain a more realistic comparison of variables. Some classifiers, such as K-nearest neighbors, deal poorly with large differences in variance, whereas other algorithms (such as decision trees) don't mind it.
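To make the height-versus-age point concrete, here is a minimal sketch in plain Python. The five (height, age) pairs are invented for illustration, and the helper names `euclidean` and `scaled_distance` are hypothetical. Dividing each axis by its standard deviation is only the simplest special case of the idea (it ignores any correlation between the axes), so treat it as a Z-score-style warm-up rather than the full Mahalanobis distance.

```python
import math
import statistics

# Hypothetical sample: (height in meters, age in days) for five people.
people = [
    (1.60, 9000.0),
    (1.75, 12000.0),
    (1.82, 20000.0),
    (1.68, 15000.0),
    (1.90, 11000.0),
]

heights = [p[0] for p in people]
ages = [p[1] for p in people]


def euclidean(a, b):
    """Plain Euclidean distance; every axis counts in its raw units."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def scaled_distance(a, b, stds):
    """Distance after dividing each axis difference by that axis's standard deviation."""
    return math.sqrt(sum(((ai - bi) / s) ** 2 for ai, bi, s in zip(a, b, stds)))


# Sample standard deviation of each axis.
stds = (statistics.stdev(heights), statistics.stdev(ages))

# Shortest and tallest person: 0.30 m apart in height, 2000 days apart in age.
a, b = people[0], people[4]

raw = euclidean(a, b)                  # dominated almost entirely by the age axis
scaled = scaled_distance(a, b, stds)   # the height difference now matters most

height_part = abs(a[0] - b[0]) / stds[0]
age_part = abs(a[1] - b[1]) / stds[1]

print(f"raw distance:    {raw:.3f}")
print(f"scaled distance: {scaled:.3f}")
print(f"per-axis contributions (height, age): {height_part:.3f}, {age_part:.3f}")
```

In raw units the 0.30 m height gap is invisible next to the 2000-day age gap; after scaling, the height gap is the larger of the two contributions, which is the kind of reordering the figure illustrates.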

We can already get a hint of what the Mahalanobis distance must be by looking at Figure 13-4;[239] we must somehow divide out the covariance of the data while measuring distance. First, let us review what covariance is. Given a list X of N data points, where each data point may be of dimension (vector length) K, with mean vector μ_x (consisting of the individual means μ_1, …, μ_K), the covariance is a K-by-K matrix given by:

$$\Sigma = E\left[(X - \mu_x)(X - \mu_x)^{T}\right]$$

where E[·] is the expectation operator. OpenCV makes computing the covariance matrix easy, using ...
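As a library-free illustration of the covariance definition and of "dividing out" the covariance, the following sketch estimates the covariance matrix of a small point set and then measures distance through its inverse, using the standard form sqrt((x − y)^T Σ^{-1} (x − y)). This is not OpenCV's routine: the function names, the four sample points, and the choice to divide by N (rather than N − 1) are all assumptions made for the example, and the matrix inverse is written out for the 2-by-2 case only.

```python
import math


def mean_vector(points):
    """Per-dimension mean of a list of equal-length points."""
    n = len(points)
    k = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(k)]


def covariance_matrix(points):
    """K-by-K estimate of E[(x - mu)(x - mu)^T], averaging over the N points."""
    n = len(points)
    k = len(points[0])
    mu = mean_vector(points)
    return [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n for j in range(k)]
        for i in range(k)
    ]


def invert_2x2(m):
    """Closed-form inverse of a 2-by-2 matrix (enough for this 2-D example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T * inv_cov * (x - y))."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    k = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(k) for j in range(k))
    return math.sqrt(q)


# Hypothetical point cloud, stretched along the horizontal axis.
points = [(-4.0, 0.0), (4.0, 0.0), (0.0, -1.0), (0.0, 1.0)]

mu = mean_vector(points)
cov = covariance_matrix(points)
inv_cov = invert_2x2(cov)

# Euclidean distances from the mean are 4 and 1, but relative to the
# spread of the data both points are equally far out.
d_far = mahalanobis((4.0, 0.0), mu, inv_cov)
d_near = mahalanobis((0.0, 1.0), mu, inv_cov)

print("covariance:", cov)
print("Mahalanobis distances:", d_far, d_near)
```

With an identity covariance matrix the same function reduces to the ordinary Euclidean distance, which is a quick sanity check for any implementation.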
