Clustering

Given a graph, a natural question to ask is whether it contains subgraphs whose vertices naturally belong together, that is, whether the graph can be clustered in some way. This question can be addressed in many ways. We have already implemented one of them ourselves, namely computing connected components. Instead of using our own implementation, let's use GraphX's built-in version this time. To do so, we simply call connectedComponents directly on the graph itself:

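// Label every vertex with the ID of its connected component and cache the result
val actorComponents = actorGraph.connectedComponents().cache
// Count the number of distinct component IDs, i.e. the number of components
actorComponents.vertices.map(_._2).distinct().count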

As in our own implementation, the vertex data of the resulting graph contains cluster IDs, each of which is the minimum vertex ID within its cluster. This allows ...
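To make the "minimum vertex ID" labelling concrete, here is a small Spark-free sketch of min-label propagation, the idea behind component labels of this kind: every vertex starts with its own ID and repeatedly adopts the smallest label among itself and its neighbours until no label changes. This is an illustration under stated assumptions, not GraphX's actual code; the object name MinLabelComponents, the method components and the toy graph are hypothetical, and edges are treated as undirected.

```scala
// Hypothetical, Spark-free sketch of min-label propagation.
// Not GraphX's implementation; it only mimics the labelling it produces.
object MinLabelComponents {
  def components(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Treat every edge as undirected by adding its reverse.
    val neighbours: Map[Long, Seq[Long]] =
      (edges ++ edges.map(_.swap))
        .groupBy(_._1)
        .map { case (v, es) => v -> es.map(_._2) }

    // Initially each vertex is labelled with its own ID.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      // Synchronous update: each vertex takes the minimum of its own
      // label and its neighbours' labels from the previous round.
      val updated = labels.map { case (v, label) =>
        val nbrLabels = neighbours.getOrElse(v, Seq.empty).map(labels)
        v -> (label +: nbrLabels).min
      }
      changed = updated != labels
      labels = updated
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Toy graph: 1-2-3 form one component, 4-5 another, 6 is isolated.
    val vertices = Seq(1L, 2L, 3L, 4L, 5L, 6L)
    val edges = Seq((2L, 1L), (3L, 2L), (5L, 4L))
    val labels = components(vertices, edges)
    println(labels.toSeq.sortBy(_._1))
    // Counting distinct labels mirrors vertices.map(_._2).distinct().count
    println(labels.values.toSet.size)
  }
}
```

On this toy graph, vertices 1, 2 and 3 all end up with label 1, vertices 4 and 5 with label 4, and the isolated vertex 6 keeps its own ID, so counting distinct labels yields three components.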
