O'Reilly logo

Building a Recommendation Engine with Scala by Saleem Ansari

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Clustering

Clustering is a form of unsupervised learning where the task of the learning algorithm is to find some structure in the given dataset. In particular, a notion of similarity or distance among different instances of a dataset is used to learn such clusters. Spark provides K-Means, expectation-maximization (EM), power iteration clustering (PIC), Latent Dirichlet Allocation (LDA), and streaming K-Means.

K-Means

K-Means is one of the most popular clustering algorithms in which we pre-determine the parameter K—the number of clusters. Or in a more formal way we can define K-Means as a prototype-based, partitional clustering technique that attempts to find a user-specified number of clusters (K) which are represented by their centroids. In ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required