## Analysis of Graph Algorithms

The order of growth for a graph algorithm is usually expressed as a function of , the number of vertices, and , the number of edges.

The order of growth for BFS is , which is a convenient way to say that the runtime grows in proportion to either or , whichever is “bigger.”

To see why, think about these four operations:

Adding a vertex to the queue

This happens once for each vertex, so the total cost is in .

Removing a vertex from the queue

This happens once for each vertex, so the total cost is in .

Marking a vertex “visited”

This happens once for each vertex, so the total cost is in .

Checking whether a vertex is marked

This happens once each edge, so the total cost is in .

Adding them up, we get . If we know the relationship between and , we can simplify this expression. For example, in a regular graph with degree k, the number of edges is in , so BFS is linear in . In a complete graph, the number of edges is in , so BFS is quadratic in .

Of course, this analysis is based on the assumption that all four operations—adding and removing vertices, marking and checking marks—are constant time.

Marking vertices is easy. You can add an attribute to the `Vertex` objects or put the marked ones in a set and use the `in` operator.

But making a first-in-first-out (FIFO) queue that can add and remove vertices in constant time is not as easy as it sounds.

## FIFO Implementation

A FIFO is a data structure that provides the following operations:

append

Add a new item to the end of the queue.

pop

Remove and return the item at the front of the queue.

There are several good implementations of this data structure. One is the doubly-linked list, which you can read about at http://en.wikipedia.org/wiki/Doubly-linked_list. Another is a circular buffer, which you can read about at http://en.wikipedia.org/wiki/Circular_buffer.

Example 4-1.

Write an implementation of a FIFO using either a doubly-linked list or a circular buffer.

Yet another possibility is to use a Python dictionary and two indices: `nextin` keeps track of the back of the queue, and `nextout` keeps track of the front. The dictionary maps from integer indices to values.

Here is an implementation based on Raymond Hettinger’s recipe at http://code.activestate.com/recipes/68436/:

```class DictFifo(object):

def __init__(self):
self.nextin = 0
self.nextout = 0
self.data = {}

def append(self, value):
self.data[self.nextin] = value
self.nextin += 1

def pop(self, n=-1):
value = self.data.pop(self.nextout)
self.nextout += 1
return value```

`append` stores the new item and increments `nextin`; both operations are constant time.

`pop` removes the last item and increments `nextout`. Again, both operations are constant time.

As yet another alternative, the Python `collections` module provides an object called a `deque`, which stands for “double-ended queue.” It is supposed to be pronounced “deck,” but many people say “deek.” A Python deque can be adapted to implement a FIFO.

You can read about deques at http://en.wikipedia.org/wiki/Deque and get the details of the Python implementation at http://docs.python.org/lib/deque-objects.html.

Example 4-2.

The following implementation of a BFS contains two performance errors. What are they? What is the actual order of growth for this algorithm?

```def bfs(top_node, visit):
"""Breadth-first search on a graph, starting at top_node."""
visited = set()
queue = [top_node]
while len(queue):
curr_node = queue.pop(0)    # Dequeue
visit(curr_node)            # Visit the node
visited.add(curr_node)

# Enqueue non-visited and non-enqueued children
queue.extend(c for c in curr_node.children
if c not in visited and c not in queue)```

Test this code with a range of graph sizes and check your analysis. Then use a FIFO implementation to fix the errors and confirm that your algorithm is linear.

## Stanley Milgram

Stanley Milgram was an American social psychologist who conducted two of the most famous experiments in social science: the Milgram experiment, which studied people’s obedience to authority (http://en.wikipedia.org/wiki/Milgram_experiment), and the small world experiment, which studied the structure of social networks (http://en.wikipedia.org/wiki/Small_world_phenomenon).

In the small world experiment, Milgram sent a package to several randomly chosen people in Wichita, Kansas with instructions asking them to forward an enclosed letter to a target person, identified by name and occupation, in Sharon, Massachusetts (which is the town near Boston where I grew up). The subjects were told that they could mail the letter directly to the target person only if they knew him personally; otherwise, they were instructed to send it, and the same instructions, to a relative or friend who they thought would be more likely to know the target person.

Many of the letters were never delivered, but for the ones that were, the average path length—the number of times the letters were forwarded—was about six. This result was taken to confirm previous observations (and speculations) that the typical distance between any two people in a social network is about “six degrees of separation.”

This conclusion is surprising because most people expect social networks to be localized—people tend to live near their friends—and in a graph with local connections, path lengths tend to increase in proportion to geographical distance. For example, most of my friends live nearby, so I would guess that the average distance between nodes in a social network is about 50 miles. Wichita is about 1600 miles from Boston, so if Milgram’s letters traversed typical links in the social network, they should have taken 32 hops, not 6.

## Watts and Strogatz

In 1998, Duncan Watts and Steven Strogatz published a paper in Nature, “Collective dynamics of ‘small-world’ networks,” that proposed an explanation for the small world phenomenon. You can download it from http://www.nature.com/nature/journal/v393/n6684/abs/393440a0.html.

Watts and Strogatz started with two kinds of graph that were well understood: random graphs and regular graphs. They looked at two properties of these graphs: clustering and path length.

Clustering

This is a measure of the “cliquishness” of the graph. In a graph, a clique is a subset of nodes that are all connected to each other; in a social network, a clique is a set of friends who all know each other. Watts and Strogatz defined a clustering coefficient that quantifies the likelihood that two nodes that are connected to the same node are also connected to each other.

Path length

This is a measure of the average distance between two nodes, which corresponds to the degrees of separation in a social network.

Their initial result was what you might expect: regular graphs have high clustering and high path lengths, and random graphs with the same size tend to have low clustering and low path lengths. So neither of these is a good model of social networks, which seem to combine high clustering with short path lengths.

Their goal was to create a generative model of a social network. A generative model tries to explain a phenomenon by modeling the process that builds or leads to the phenomenon. In this case, Watts and Strogatz proposed a process for building small-world graphs:

1. Start with a regular graph with n nodes and degree k. Watts and Strogatz start with a ring lattice, which is a kind of regular graph. You could replicate their experiment or try instead a graph that is regular but not a ring lattice.

2. Choose a subset of the edges in the graph and “rewire” them by replacing them with random edges. Again, you could replicate the procedure described in the paper or experiment with alternatives.

The proportion of edges that are rewired is a parameter, p, that controls how random the graph is. With , the graph is regular; with , it is random.

Watts and Strogatz found that small values of p yield graphs with high clustering, like a regular graph, and low path lengths, like a random graph.

Example 4-3.

Read the Watts and Strogatz paper, and answer the following questions:

1. What process do Watts and Strogatz use to rewire their graphs?

2. What is the definition of the clustering coefficient ?

3. What is the definition of the average path length ?

4. What real-world graphs did Watts and Strogatz look at? What evidence do they present that these graphs have the same structure as the graphs generated by their model?

Example 4-4.

Create a file named `SmallWorldGraph.py`, and define a class named `SmallWorldGraph` that inherits from `RandomGraph`.

If you did Example 2-4, you can use your own `RandomGraph.py`; otherwise, you can download mine from http://thinkcomplex.com/RandomGraph.py.

1. Write a method called `rewire` that takes a probability, `p`, as a parameter, and starting with a regular graph, rewires the graph using Watts and Strogatz’s algorithm.

2. Write a method called `clustering_coefficient` that computes and returns the clustering coefficient as defined in the paper.

3. Make a graph that replicates the line marked in Figure 2 of the Watts and Strogatz paper. In other words, confirm that the clustering coefficient drops off slowly for small values of p.

Before we can replicate the other line, we have to learn about shortest path algorithms.

## Dijkstra

Edsger W. Dijkstra was a Dutch computer scientist who invented an efficient shortest path algorithm (see http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm). He also invented the semaphore, which is a data structure used to coordinate programs that communicate with each other (see http://en.wikipedia.org/wiki/Semaphore_(programming) and my book, The Little Book of Semaphores).

Dijkstra is famous (and notorious) as the author of a series of essays on computer science. Some, like “A Case Against the GO TO Statement,” have had a profound effect on programming practice. Others, like “On the Cruelty of Really Teaching Computing Science,” are entertaining in their cantankerousness, but less effective.

Dijkstra’s algorithm solves the “single-source shortest path problem,” which means that it finds the minimum distance from a given source node to every other node in the graph (or at least every connected node).

We start with a simplified version of the algorithm that considers all edges the same length. The more general version works with any non-negative edge lengths.

The simplified version is similar to the breadth-first search in Connected Graphs, except that instead of marking visited nodes, we label them with their distance from the source. Initially, all nodes are labeled with an infinite distance. Like a breadth-first search, Dijkstra’s algorithm uses a queue of discovered unvisited nodes:

1. Give the source node distance 0, and add it to the queue. Give the other nodes infinite distance.

2. Remove a vertex from the queue, and assign its distance to d. Find the vertices it is connected to. For each connected vertex with infinite distance, replace the distance with and add it to the queue.

3. If the queue is not empty, go back to step 2.

The first time you execute step 2, the only node in the queue has distance 0. The second time, the queue contains all nodes with distance 1. Once those nodes are processed, the queue contains all nodes with distance 2, and so on.

So when a node is discovered for the first time, it is labeled with the distance , which is the shortest path to that node. It is not possible that you will discover a shorter path later, because if there were a shorter path, you would have discovered it sooner. That is not a proof of the correctness of the algorithm, but it sketches the structure of the proof by contradiction.

In the more general case where the edges have different lengths, it is possible to discover a shorter path after you have discovered a longer path, so a little more work is needed.

Example 4-5.

Write an implementation of Dijkstra’s algorithm and use it to compute the average path length of a SmallWorldGraph.

Make a graph that replicates the line marked in Figure 2 of the Watts and Strogatz paper. Confirm that the average path length drops off quickly for small values of p. What is the range of values for p that yield graphs with high clustering and low path lengths?

Example 4-6.

A natural question about the Watts and Strogatz paper is whether the small world phenomenon is specific to their generative model or whether other similar models yield the same qualitative result (high clustering and low path lengths).

To answer this question, choose a variation of the Watts and Strogatz model and replicate their Figure 2. There are two kinds of variation you might consider:

• Instead of starting with a regular graph, start with another graph with high clustering. One option is a locally connected graph where vertices are placed at random locations in the plane and each vertex is connected to its nearest k neighbors.

• Experiment with different kinds of rewiring.

If a range of similar models yield similar behavior, we say that the results of the paper are robust.

Example 4-7.

To compute the average path length in a SmallWorldGraph, you probably ran Dijkstra’s single-source shortest path algorithm for each node in the graph. In effect, you solved the “all-pairs shortest path” problem, which finds the shortest path between all pairs of nodes.

1. Find an algorithm for the all-pairs shortest path problem and implement it. Compare the runtime with your “all-source Dijkstra” algorithm.

2. Which algorithm gives better order-of-growth runtime as a function of the number of vertices and edges? Why do you think Dijkstra’s algorithm does better than the order-of-growth analysis suggests?

## What Kind of Explanation Is That?

If you ask me why planetary orbits are elliptical, I might start by modeling a planet and a star as point masses. I would look up the law of universal gravitation at http://en.wikipedia.org/wiki/Newton’s_law_of_universal_gravitation and use it to write a differential equation for the motion of the planet. Then I would either derive the orbit equation or, more likely, look it up at http://en.wikipedia.org/wiki/Orbit_equation. With a little algebra, I could derive the conditions that yield an elliptical orbit. Then I would argue that the objects we consider planets satisfy these conditions.

People, or at least scientists, are generally satisfied with this kind of explanation. One of the reasons for its appeal is that the assumptions and approximations in the model seem reasonable. Planets and stars are not really point masses, but the distances between them are so big that their actual sizes are negligible. Planets in the same solar system can affect each others’ orbits, but the effect is usually small, and we ignore relativistic effects, again on the assumption that they are small.

This explanation is also appealing because it is equation-based. We can express the orbit equation in a closed form, which means that we can compute orbits efficiently. It also means that we can derive general expressions for the orbital velocity, orbital period, and other quantities.

Finally, I think this kind of explanation is appealing because it has the form of a mathematical proof. It starts from a set of axioms and derives the result by logic and analysis. But it is important to remember that the proof pertains to the model and not the real world. That is, we can prove that an idealized model of a planet yields an elliptical orbit, but we can’t prove that the model pertains to actual planets (in fact, it does not).

By comparison, Watts and Strogatz’s explanation of the small world phenomenon may seem less satisfying. First, the model is more abstract, which is to say less realistic. Second, the results are generated by simulation, not by mathematical analysis. Finally, the results seem less like a proof and more like an example.

Many of the models in this book are like the Watts and Strogatz model: abstract, simulation-based, and (at least superficially) less formal than conventional mathematical models. One of the goals of this book is to consider the questions these models raise:

• What kind of work can these models do: are they predictive, explanatory, or both?

• Are the explanations these models offer less satisfying than explanations based on more traditional models? Why?

• How should we characterize the differences between these and more conventional models? Are they different in kind or only in degree?

Over the course of the book, I will offer my answers to these questions, but they are tentative and sometimes speculative. I encourage you to consider them skeptically and reach your own conclusions.

Get Think Complexity now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.