O'Reilly logo

Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Clustering

Given a graph, a natural question to ask is if there are any subgraphs to it that naturally belong together, that is, that cluster the graph in some way. This question can be addressed in many ways, one of which we have already implemented ourselves, namely by studying connected components. Instead of using our own implementation, let's use GraphX's built-in version this time. To do so, we can simply call connectedComponents directly on the graph itself:

val actorComponents = actorGraph.connectedComponents().cache actorComponents.vertices.map(_._2).distinct().count

As in our own implementation, the vertex data of the graph contains cluster IDs, which correspond to the minimum available vertex ID within the cluster. This allows ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required