O'Reilly logo

Think Data Structures by Allen B. Downey

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 7. Getting to Philosophy

The goal of this chapter is to develop a web crawler that tests the “Getting to Philosophy” conjecture, which I presented in “Search Engines”.

Getting Started

In the repository for this book, you’ll find some code to help you get started:

  1. WikiNodeExample.java contains the code from the previous chapter, demonstrating recursive and iterative implementations of depth-first search (DFS) in a DOM tree.

  2. WikiNodeIterable.java contains an Iterable class for traversing a DOM tree. I’ll explain this code in the next section.

  3. WikiFetcher.java contains a utility class that uses jsoup to download pages from Wikipedia. To help you comply with Wikipedia’s terms of service, this class limits how fast you can download pages; if you request more than one page per second, it sleeps before downloading the next page.

  4. WikiPhilosophy.java contains an outline of the code you will write for this exercise.

You’ll also find the Ant build file build.xml. If you run ant WikiPhilosophy, it will run a simple bit of starter code.

Iterables and Iterators

In the previous chapter, I presented an iterative depth-first search (DFS), and suggested that an advantage of the iterative version, compared to the recursive version, is that it is easier to wrap in an Iterator object. In this section we’ll see how to do that.

If you are not familiar with the Iterator and Iterable interfaces, you can read about them at http://thinkdast.com/iterator and http://thinkdast.com/iterable.

Take a look ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required