Dimensionality reduction

What algorithms such as MinHash and LSH aim to do is reduce the quantity of data that must be stored without compromising on the essence of the original. They're a form of compression and they define helpful representations that preserve our ability to do useful work. In particular, MinHash and LSH are designed to work with data that can be represented as a set.

In fact, there is a whole class of dimensionality-reducing algorithms that will work with data that is not so easily represented as a set. We saw, in the previous chapter with k-means clustering, how certain data could be most usefully represented as a weighted vector. Common approaches to reduce the dimensions of data represented as vectors are principle component ...

Get Clojure for Data Science now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.