Chapter 3. Graph Platforms and Processing

In this chapter, we’ll quickly cover different methods for graph processing and the most common platform approaches. We’ll look more closely at the two platforms used in this book, Apache Spark and Neo4j, and when they may be appropriate for different requirements. Platform installation guidelines are included to prepare you for the next several chapters.

Graph Platform and Processing Considerations

Graph analytical processing has distinctive qualities: its computation is structure-driven, globally focused, and difficult to parse. In this section we’ll look at the general considerations for graph platforms and processing.

Platform Considerations

There’s debate as to whether it’s better to scale up or scale out graph processing. Should you use powerful multicore, large-memory machines and focus on efficient data structures and multithreaded algorithms? Or are investments in distributed processing frameworks and related algorithms worthwhile?

A useful evaluation approach is the Configuration that Outperforms a Single Thread (COST), as described in the research paper “Scalability! But at What COST?” by F. McSherry, M. Isard, and D. Murray. COST provides us with a way to compare a system’s scalability with the overhead the system introduces. The core concept is that a well-configured system using an optimized algorithm and data structure can outperform current general-purpose scale-out solutions. It’s a method for measuring performance ...
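
To make the COST idea concrete, here is a minimal sketch of how one might compute it from benchmark timings. It assumes COST is reported as the smallest core count at which the scale-out system runs faster than a competent single-threaded baseline. The function name `cost` and all timing values are hypothetical, invented for illustration only, and are not measurements from the paper or from any real system.

```python
from typing import Mapping, Optional


def cost(single_thread_seconds: float,
         scaled_seconds_by_cores: Mapping[int, float]) -> Optional[int]:
    """Return the smallest measured core count at which the scalable
    system beats the single-threaded baseline, or None if no measured
    configuration does (the COST is then unbounded for these data)."""
    for cores in sorted(scaled_seconds_by_cores):
        if scaled_seconds_by_cores[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical timings (seconds) for the same graph job; not real data.
baseline_seconds = 300.0
measured = {1: 1400.0, 4: 520.0, 16: 310.0, 64: 180.0, 128: 150.0}

print(cost(baseline_seconds, measured))  # prints 64
```

In this made-up example the system scales well (each added batch of cores reduces runtime), yet it needs 64 cores before it beats one well-tuned thread. That gap between scalability and overhead is what COST is meant to expose.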
