July 2017
Beginner to intermediate
715 pages
17h 3m
English
Now that we have a basic understanding of web crawlers, we are ready to create our own. This simple web crawler keeps track of the pages it has visited using ArrayList instances and limits the number of pages it visits. It parses each web page with jsoup (https://jsoup.org/), an open source HTML parser. The example demonstrates the basic structure of a web crawler and highlights some of the issues involved in creating one.
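Before looking at the actual class, the core idea (a visited list plus a page limit) can be sketched on its own. The following is only an illustrative sketch, not the SimpleWebCrawler implementation: the CrawlSketch class, the crawl and visit methods, and the in-memory fakeWeb map are all hypothetical names, and the map stands in for the link extraction that jsoup would perform on real pages, so the sketch runs without any external library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative and are not taken
// from the SimpleWebCrawler class.
class CrawlSketch {

    // Visits pages depth-first from startUrl, records each page once,
    // and stops adding pages once pageLimit pages have been visited.
    static List<String> crawl(String startUrl,
                              Function<String, List<String>> linkExtractor,
                              int pageLimit) {
        List<String> visited = new ArrayList<>();
        visit(startUrl, linkExtractor, pageLimit, visited);
        return visited;
    }

    private static void visit(String url,
                              Function<String, List<String>> linkExtractor,
                              int pageLimit,
                              List<String> visited) {
        // Skip the page if the limit is reached or it was already seen.
        if (visited.size() >= pageLimit || visited.contains(url)) {
            return;
        }
        visited.add(url);
        // Assumption: linkExtractor plays the role of an HTML parser
        // that returns the links found on the page.
        for (String link : linkExtractor.apply(url)) {
            visit(link, linkExtractor, pageLimit, visited);
        }
    }

    public static void main(String[] args) {
        // A tiny in-memory "web": each page maps to the links it contains.
        Map<String, List<String>> fakeWeb = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of("d"),
                "d", List.<String>of());

        List<String> pages = crawl(
                "a",
                url -> fakeWeb.getOrDefault(url, List.<String>of()),
                3);
        System.out.println(pages); // prints [a, b, d]
    }
}
```

Page "c" is never visited here because the limit of three pages is reached first, and the link from "b" back to "a" is ignored because "a" is already in the visited list. Without the visited check, the cycle between "a" and "b" would recurse indefinitely, which is one of the issues a real crawler has to handle.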
We will use the SimpleWebCrawler class, as declared here:
public class SimpleWebCrawler {
    private String topic;
    private String startingURL;
    private String urlLimiter;
    private final int pageLimit = 20;
    private ArrayList<String> visitedList ...