The approach we take to graph analysis evolves as we become more familiar with the behavior of different algorithms on specific datasets. In this chapter, we’ll run through several examples to give you a better feeling for how to tackle large-scale graph data analysis using datasets from Yelp and the US Department of Transportation. We’ll walk through Yelp data analysis in Neo4j that includes a general overview of the data, combining algorithms to make trip recommendations, and mining user and business data for consulting. In Spark, we’ll look into US airline data to understand traffic patterns and delays as well as how airports are connected by different airlines.
Because pathfinding algorithms are straightforward, our examples will use these centrality and community detection algorithms:
PageRank to find influential Yelp reviewers and then correlate their ratings for specific hotels
Betweenness Centrality to uncover reviewers connected to multiple groups and then extract their preferences
Label Propagation with a projection to create supercategories of similar Yelp businesses
Degree Centrality to quickly identify airport hubs in the US transport dataset
Strongly Connected Components to look at clusters of airport routes in the US
Yelp helps people find local businesses based on reviews, preferences, and recommendations. Over 180 million reviews had been written on the platform as of the end of ...