In this chapter, we’ll quickly cover different methods for graph processing and the most common platform approaches. We’ll look more closely at the two platforms used in this book, Apache Spark and Neo4j, and when they may be appropriate for different requirements. Platform installation guidelines are included to prepare you for the next several chapters.
Graph analytical processing has unique qualities: its computation is structure-driven, globally focused, and difficult to parse. In this section we’ll look at the general considerations for graph platforms and processing.
There’s debate as to whether it’s better to scale up or scale out graph processing. Should you use powerful multicore, large-memory machines and focus on efficient data structures and multithreaded algorithms? Or are investments in distributed processing frameworks and related algorithms worthwhile?
A useful evaluation approach is the Configuration that Outperforms a Single Thread (COST), as described in the research paper “Scalability! But at What COST?” by F. McSherry, M. Isard, and D. Murray. COST provides us with a way to compare a system’s scalability with the overhead the system introduces. The core concept is that a well-configured system using an optimized algorithm and data structure can outperform current general-purpose scale-out solutions. It’s a method for measuring performance ...
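To make the idea concrete, here is a minimal Python sketch. It assumes the paper’s definition of COST as the hardware configuration a platform needs before it outperforms a competent single-threaded implementation. The function name `cost`, the use of core count as the unit of configuration, and all timing figures are hypothetical and chosen only for illustration; they are not measurements from the paper or from any real system.

```python
from typing import Mapping, Optional


def cost(single_thread_seconds: float,
         scaled_seconds: Mapping[int, float]) -> Optional[int]:
    """Return the smallest core count at which the scalable system
    runs faster than the single-threaded baseline.

    Returns None when no measured configuration beats the baseline.
    """
    for cores in sorted(scaled_seconds):
        if scaled_seconds[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical runtimes (seconds) for the same graph computation.
baseline = 300.0                      # optimized single-threaded program
measurements = {1: 1200.0, 2: 700.0,  # scale-out system, keyed by core count
                4: 420.0, 8: 260.0, 16: 150.0}

print(cost(baseline, measurements))   # 8
```

With these made-up numbers the scale-out system scales well, halving its runtime several times over, yet it needs 8 cores before it beats one well-tuned thread. When no configuration ever beats the baseline, the function returns `None`, which corresponds to what the paper calls unbounded COST.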