22 A Short Section on Optimal Transport
The Euclidean distance between points in point clouds might not be a reliable measure for proximity if, for instance, the point clouds are not approximately aligned or the point clouds are noisy. To establish better correspondences, many have looked into improving the similarity measures by defining features/descriptors for point cloud data that not only capture the location of a point, but also encode the local geometry of the object in the vicinity of that point (e.g., curvature). These methods then rely on a nearest neighbor matching in the feature space instead of the raw input space. Moreover, to obtain robustness and avoid false correspondences, these methods often use the Random Sample Consensus (RANSAC) algorithm or its variations. Unfortunately, however, RANSAC significantly increases the computational cost of the registration algorithm as it requires running the correspondence problem multiple times for different random subsets of the data [6].
While extremely successful for some applications, most nonlinear dimension reduction methods embedd data in Euclidean space assuming that Euclidean distances between data points are geometrically and pratically meaningful. Also, the manifold hypothesis is often assumed. See Chapter 4. In vision classification and signal identification translated copies of a single image for example can have large Euclidean distance even though they are semantically identical and should be assigned the ...