O'Reilly logo

Think Data Structures by Allen B. Downey

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 16. Boolean Search

In this chapter I present a solution to the previous exercise. Then you will write code to combine multiple search results and sort them by their relevance to the search terms.

Crawler Solution

First, let’s go over our solution to the previous exercise. I provided an outline of WikiCrawler; your job was to fill in crawl. As a reminder, here are the fields in the WikiCrawler class:

public class WikiCrawler {
    // keeps track of where we started
    private final String source;

    // the index where the results go
    private JedisIndex index;

    // queue of URLs to be indexed
    private Queue<String> queue = new LinkedList<String>();

    // fetcher used to get pages from Wikipedia
    final static WikiFetcher wf = new WikiFetcher();
}

When we create a WikiCrawler, we provide source and index. Initially, queue contains only one element, source.

Notice that the implementation of queue is a LinkedList, so we can add elements at the end—and remove them from the beginning—in constant time. By assigning a LinkedList object to a Queue variable, we limit ourselves to using methods in the Queue interface; specifically, we’ll use offer to add elements and poll to remove them.

Here’s my implementation of WikiCrawler.crawl:

 public String crawl(boolean testing) throws IOException { if (queue.isEmpty()) { return null; } String url = queue.poll(); System.out.println("Crawling " + url); if (testing==false && index.isIndexed(url)) { System.out.println("Already indexed."); return null; } Elements paragraphs; ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required