Community formation is common in all types of networks, and identifying them is essential for evaluating group behavior and emergent phenomena. The general principle in finding communities is that its members will have more relationships within the group than with nodes outside their group. Identifying these related sets reveals clusters of nodes, isolated groups, and network structure. This information helps infer similar behavior or preferences of peer groups, estimate resiliency, find nested relationships, and prepare data for other analyses. Community detection algorithms are also commonly used to produce network visualization for general inspection.
We’ll provide details on the most representative community detection algorithms:
Triangle Count and Clustering Coefficient for overall relationship density
Strongly Connected Components and Connected Components for finding connected clusters
Label Propagation for quickly inferring groups based on node labels
Louvain Modularity for looking at grouping quality and hierarchies
We’ll explain how the algorithms work and show examples in Apache Spark and Neo4j. In cases where an algorithm is only available in one platform, we’ll provide just one example. We use weighted relationships for these algorithms because they’re typically used to capture the significance of different relationships.
Figure 6-1 gives an overview of the differences between the community detection algorithms covered ...