Chapter 16. Boolean Search
In this chapter I present a solution to the previous exercise. Then you will write code to combine multiple search results and sort them by their relevance to the search terms.
Crawler Solution
First, let’s go over our solution to the previous exercise. I provided an outline of WikiCrawler
; your job was to fill in crawl
. As a reminder, here are the fields in the WikiCrawler
class:
public class WikiCrawler { // keeps track of where we started private final String source; // the index where the results go private JedisIndex index; // queue of URLs to be indexed private Queue<String> queue = new LinkedList<String>(); // fetcher used to get pages from Wikipedia final static WikiFetcher wf = new WikiFetcher(); }
When we create a WikiCrawler
, we provide source
and index
. Initially, queue
contains only one element, source
.
Notice that the implementation of queue
is a LinkedList
, so we can add elements at the end—and remove them from the beginning—in constant time. By assigning a LinkedList
object to a Queue
variable, we limit ourselves to using methods in the Queue
interface; specifically, we’ll use offer
to add elements and poll
to remove them.
Here’s my implementation of WikiCrawler.crawl
:
public String crawl(boolean testing) throws IOException { if (queue.isEmpty()) { return null; } String url = queue.poll(); System.out.println("Crawling " + url); if (testing==false && index.isIndexed(url)) { System.out.println("Already indexed."); return null; } Elements paragraphs; ...
Get Think Data Structures now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.