O'Reilly logo

Clojure for Data Science by Henry Garner

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Dimensionality reduction

What algorithms such as MinHash and LSH aim to do is reduce the quantity of data that must be stored without compromising on the essence of the original. They're a form of compression and they define helpful representations that preserve our ability to do useful work. In particular, MinHash and LSH are designed to work with data that can be represented as a set.

In fact, there is a whole class of dimensionality-reducing algorithms that will work with data that is not so easily represented as a set. We saw, in the previous chapter with k-means clustering, how certain data could be most usefully represented as a weighted vector. Common approaches to reduce the dimensions of data represented as vectors are principle component ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required