Graph Algorithms

by Mark Needham, Amy E. Hodler

May 2019

Intermediate to advanced

265 pages

5h 58m

English

O'Reilly Media, Inc.

What’s in This Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
What Are Graphs?; What Are Graph Analytics and Algorithms?; Graph Processing, Databases, Queries, and Algorithms; OLTP and OLAP; Why Should We Care About Graph Algorithms?; Graph Analytics Use Cases; Conclusion
Terminology; Graph Types and Structures; Random, Small-World, Scale-Free Structures; Flavors of Graphs; Connected Versus Disconnected Graphs; Unweighted Graphs Versus Weighted Graphs; Undirected Graphs Versus Directed Graphs; Acyclic Graphs Versus Cyclic Graphs; Sparse Graphs Versus Dense Graphs; Monopartite, Bipartite, and k-Partite Graphs; Types of Graph Algorithms; Pathfinding; Centrality; Community Detection; Summary
Graph Platform and Processing Considerations; Platform Considerations; Processing Considerations; Representative Platforms; Selecting Our Platform; Apache Spark; Neo4j Graph Platform; Summary
Example Data: The Transport Graph; Importing the Data into Apache Spark; Importing the Data into Neo4j; Breadth First Search; Breadth First Search with Apache Spark; Depth First Search; Shortest Path; When Should I Use Shortest Path?; Shortest Path with Neo4j; Shortest Path (Weighted) with Neo4j; Shortest Path (Weighted) with Apache Spark; Shortest Path Variation: A*; Shortest Path Variation: Yen’s k-Shortest Paths; All Pairs Shortest Path; A Closer Look at All Pairs Shortest Path; When Should I Use All Pairs Shortest Path?; All Pairs Shortest Path with Apache Spark; All Pairs Shortest Path with Neo4j; Single Source Shortest Path; When Should I Use Single Source Shortest Path?; Single Source Shortest Path with Apache Spark; Single Source Shortest Path with Neo4j; Minimum Spanning Tree; When Should I Use Minimum Spanning Tree?; Minimum Spanning Tree with Neo4j; Random Walk; When Should I Use Random Walk?; Random Walk with Neo4j; Summary
Example Graph Data: The Social Graph; Importing the Data into Apache Spark; Importing the Data into Neo4j; Degree Centrality; Reach; When Should I Use Degree Centrality?; Degree Centrality with Apache Spark; Closeness Centrality; When Should I Use Closeness Centrality?; Closeness Centrality with Apache Spark; Closeness Centrality with Neo4j; Closeness Centrality Variation: Wasserman and Faust; Closeness Centrality Variation: Harmonic Centrality; Betweenness Centrality; When Should I Use Betweenness Centrality?; Betweenness Centrality with Neo4j; Betweenness Centrality Variation: Randomized-Approximate Brandes; PageRank; Influence; The PageRank Formula; Iteration, Random Surfers, and Rank Sinks; When Should I Use PageRank?; PageRank with Apache Spark; PageRank with Neo4j; PageRank Variation: Personalized PageRank; Summary
Example Graph Data: The Software Dependency Graph; Importing the Data into Apache Spark; Importing the Data into Neo4j; Triangle Count and Clustering Coefficient; Local Clustering Coefficient; Global Clustering Coefficient; When Should I Use Triangle Count and Clustering Coefficient?; Triangle Count with Apache Spark; Triangles with Neo4j; Local Clustering Coefficient with Neo4j; Strongly Connected Components; When Should I Use Strongly Connected Components?; Strongly Connected Components with Apache Spark; Strongly Connected Components with Neo4j; Connected Components; When Should I Use Connected Components?; Connected Components with Apache Spark; Connected Components with Neo4j; Label Propagation; Semi-Supervised Learning and Seed Labels; When Should I Use Label Propagation?; Label Propagation with Apache Spark; Label Propagation with Neo4j; Louvain Modularity; When Should I Use Louvain?; Louvain with Neo4j; Validating Communities; Summary
Analyzing Yelp Data with Neo4j; Yelp Social Network; Data Import; Graph Model; A Quick Overview of the Yelp Data; Trip Planning App; Travel Business Consulting; Finding Similar Categories; Analyzing Airline Flight Data with Apache Spark; Exploratory Analysis; Popular Airports; Delays from ORD; Bad Day at SFO; Interconnected Airports by Airline; Summary
Machine Learning and the Importance of Context; Graphs, Context, and Accuracy; Connected Feature Engineering; Graphy Features; Graph Algorithm Features; Graphs and Machine Learning in Practice: Link Prediction; Tools and Data; Importing the Data into Neo4j; The Coauthorship Graph; Creating Balanced Training and Testing Datasets; How We Predict Missing Links; Creating a Machine Learning Pipeline; Predicting Links: Basic Graph Features; Predicting Links: Triangles and the Clustering Coefficient; Predicting Links: Community Detection; Summary; Wrapping Things Up

Other Algorithms; Neo4j Bulk Data Import and Yelp; APOC and Other Neo4j Tools; Finding Datasets; Assistance with the Apache Spark and Neo4j Platforms; Training

Content preview from Graph Algorithms

Foreword

What do the following things all have in common: marketing attribution analysis, anti-money laundering (AML) analysis, customer journey modeling, safety incident causal factor analysis, literature-based discovery, fraud network detection, internet search node analysis, map application creation, disease cluster analysis, and analyzing the performance of a William Shakespeare play? As you might have guessed, what these all have in common is the use of graphs, proving that Shakespeare was right when he declared, “All the world’s a graph!”

Okay, the Bard of Avon did not actually write “graph” in that sentence; he wrote “stage.” However, notice that the examples listed above all involve entities and the relationships between them, including both direct and indirect (transitive) relationships. Entities are the nodes in the graph; these can be people, events, objects, concepts, or places. The relationships between the nodes are the edges in the graph. Therefore, isn’t the very essence of a Shakespearean play the active portrayal of entities (the nodes) and their relationships (the edges)? Consequently, maybe Shakespeare could have written “graph” in his famous declaration.
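To make the node and edge vocabulary concrete, here is a minimal sketch in plain Python. The cast of characters and their relationships are hypothetical, chosen only for illustration and not taken from the book. Direct relationships are nodes one edge away; indirect (transitive) relationships are nodes reached through two or more edges, found here with a breadth-first search.

```python
from collections import deque

# Hypothetical example data: each pair is an edge (a direct relationship)
# between two nodes (entities). These pairs are illustrative only.
edges = [
    ("Romeo", "Juliet"),
    ("Juliet", "Nurse"),
    ("Romeo", "Mercutio"),
    ("Mercutio", "Benvolio"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hops_from(adjacency, start):
    """Breadth-first search: edge count of the shortest path to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


adjacency = build_adjacency(edges)
hops = hops_from(adjacency, "Juliet")

# One hop away: direct relationships. More than one hop: indirect (transitive).
direct = sorted(name for name, d in hops.items() if d == 1)
indirect = sorted(name for name, d in hops.items() if d > 1)

print("direct:", direct)
print("indirect:", indirect)
```

In this sketch the edges are treated as undirected and unweighted; a directed or weighted graph would need a different adjacency structure.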

What makes graph algorithms and graph databases so interesting and powerful isn’t the simple relationship between two entities, with A being related to B. After all, the standard relational model of databases instantiated these types of relationships in its foundation decades ago, in the entity relationship diagram ...