Unsupervised Clustering with Apache Spark 2.0

In this chapter, we will cover:

  • Building a KMeans classification system in Spark 2.0
  • Bisecting KMeans, the new kid on the block in Spark 2.0
  • Using Gaussian Mixture and Expectation Maximization (EM) in Spark 2.0 to classify data
  • Classifying the vertices of a graph using Power Iteration Clustering (PIC) in Spark 2.0
  • Using Latent Dirichlet Allocation (LDA) to classify documents and text into topics
  • Streaming KMeans to classify data in near real time
