Special spider functions are found in the LIB_simple_spider library. This library provides functions that parse links from a web page when given a URL, archive harvested links in an array, identify the root domain for a URL, and identify links that should be excluded from the archive.
This library, as well as the other scripts featured in this chapter, is available for download at this book's website.
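To make the four responsibilities listed above concrete, here is a minimal Python sketch of how such helpers might fit together. It is not the code in LIB_simple_spider: every function name below (parse_links, root_domain, is_excluded, archive_links) is hypothetical, the exclusion rules (non-HTTP schemes, off-domain links, duplicates) are assumptions chosen for illustration, and the root-domain logic is deliberately naive. For simplicity the parser takes HTML that has already been downloaded rather than fetching the URL itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_links(html, page_url):
    """Return every link in html, resolved to an absolute URL against page_url."""
    parser = _AnchorParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def root_domain(url):
    """Naive root domain: the last two labels of the host name.

    Assumption: this is wrong for suffixes such as .co.uk; it is only a sketch.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_excluded(link, seed_url, archive):
    """Assumed exclusion rules: non-HTTP links, off-domain links, duplicates."""
    if urlparse(link).scheme not in ("http", "https"):
        return True
    if root_domain(link) != root_domain(seed_url):
        return True
    return link in archive


def archive_links(archive, links, seed_url):
    """Append each link that passes the exclusion test, then return the archive."""
    for link in links:
        if not is_excluded(link, seed_url, archive):
            archive.append(link)
    return archive
```

Given a page at http://www.example.com/index.html that links to /about.html twice, to a mailto: address, and to a page on another domain, archive_links() would keep only the single on-domain address under these assumed rules.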
Figure 18-2. Running the simple spider from Listings 18-1 and 18-2
The harvest_links() function downloads the specified web page and returns all the links in an array. This function, shown in Listing 18-3, uses the