Chapter 5. Graph Processing on Hadoop

In Chapter 3, we talked about tools for processing data with Hadoop, but there’s another class of data processing being done on Hadoop that we did not cover there: graph processing. We’ll discuss graph processing with Hadoop separately in this chapter, since it’s at the heart of many of the applications we use every day, and provides significant opportunities for implementing specific types of applications. Some common uses of graph processing are ranking pages in search engines, finding new friends on social networks, determining the underlying equities in our investment funds, planning routes, and much more.

What Is a Graph?

Before we go into the tools for graph processing, let’s step back and first define what a graph is, in case you are new to this topic. Then we will go over what graph processing is and distinguish it from graph querying.

Needless to say, a graph can mean multiple things. It can mean a line, pie, or bar graph in tools like Excel. One author’s second grader has to chart different-colored M&Ms on a bar graph, but that isn’t the type of graph we’re talking about in this chapter. The graphs we’re talking about here only contain two types of objects called vertices and edges.

As you might have been taught in grade school, the points of a triangle are called vertices, and the lines that connect those points are edges, as shown in Figure 5-1. To provide some context, this is a good place to start.

Figure 5-1. Edges and vertices ...

Get Hadoop Application Architectures now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.