5 Geometry in Data Science

In this chapter, we’ll explore several tools from geometry: we’ll look at distance metrics and their use in k-nearest neighbor algorithms; we’ll discuss manifold learning algorithms that map high-dimensional data to potentially curved lower-dimensional manifolds; and we’ll see how to apply fractal geometry to stock market data. This chapter is motivated in part by the manifold hypothesis, which posits that real-world data often has a natural dimensionality lower than the dimensionality of the dataset collected. In other words, a dataset that has 20 variables (that is, a dimensionality ...
