
Webbots, Spiders, and Screen Scrapers by Michael Schrenk


Adding the Payload

The payload used by this spider is an extension of the library used in Chapter 8 to download all the images found on a web page. This time, however, we'll download all the images referenced by the entire website. The code that adds the payload to the spider is shown in Listing 18-7. You can tack this code directly onto the end of the script for the earlier spider.

    # Add the payload to the simple spider

    // Include download and directory creation lib
    include("LIB_download_images.php");

    // Download images from pages referenced in $spider_array
    for($penetration_level=1; $penetration_level<=$MAX_PENETRATION; $penetration_level++)
        {
        for($xx=0; $xx<count($spider_array[$previous_level]); $xx++)
            {
            download_images_for_page($spider_array[$previous_level][$xx]);
            ...
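Because the listing above is cut off, the following is only a minimal sketch of the same pattern: walk every penetration level of the spider's link array and hand each page URL to a payload. The helper `apply_payload()`, the sample URLs, and the choice to index the array by the current level are assumptions made for illustration, not part of the book's library. In the real spider, the callback body would be a call to `download_images_for_page()` from `LIB_download_images.php`.

```php
<?php
// Hypothetical helper (not from the book's library): applies a payload
// callback to every URL stored in a spider array indexed by penetration level.
function apply_payload(array $spider_array, int $max_penetration, callable $payload): int
{
    $pages_processed = 0;
    for ($level = 1; $level <= $max_penetration; $level++) {
        // Skip levels the spider never populated
        if (empty($spider_array[$level])) {
            continue;
        }
        foreach ($spider_array[$level] as $page_url) {
            $payload($page_url);
            $pages_processed++;
        }
    }
    return $pages_processed;
}

// Sample data, made up for illustration: level => list of harvested links
$spider_array = [
    1 => ["http://www.example.com/", "http://www.example.com/about"],
    2 => ["http://www.example.com/about/team"],
];

$visited = [];
$count = apply_payload($spider_array, 2, function (string $url) use (&$visited) {
    // The real spider would call download_images_for_page($url) here
    $visited[] = $url;
});
```

Passing the payload as a callback keeps the traversal separate from the work done on each page, so the same loop could run a different payload without being rewritten.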
