Spidering Hacks
by Kevin Hemenway, Tara Calishain

This errata page lists errors outstanding in the most recent printing. If you have technical questions or error reports, you can send them to booktech@oreilly.com. Please specify the printing date of your copy.

This page was updated February 13, 2006.

Here's a key to the markup:

[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

Confirmed errors:

(21) last paragraph;
    create something a little more advanced then "Hello, World"
should read:
    create something a little more advanced than "Hello, World"

[27] 3rd;
install libwww-perl returns with from ppm (version 3.0.1)
perl -v gives:
    "This is perl, v5.8.0 built for MSWin32-x86-multi-thread
    (with 1 registered patch, see perl -V for more detail)
    Copyright 1987-2002, Larry Wall
    Binary build 802 provided by ActiveState Corp. http://www.ActiveState.com
    Built 00:54:02 Nov 8 2002"
Solution: go to http://www.cpan.org/modules/INSTALL.html and follow the instructions to install manually.
AUTHOR: This is technically a mistake, but only because the user is using ppm3 and not ppm (which is version 2, as indicated in the output of that hack).

(28) last command example;
    %perl -MLWP::Simple -e 'print join "\n", head "http://cpan.org/RECENT"';
should be:
    %perl -MLWP::Simple -e "print join '\n', head 'http://cpan.org/RECENT'";

(32) 2nd last paragraph, first word;
    h1=en
should read:
    hl=en
i.e. the letter "l", not the number "1". Ditto in the last paragraph, where h1 is also used.

{45} Bold code at bottom of page;
    if (my $encoding = $response->content_encoding) ) {
should be:
    if (my $encoding = $response->content_encoding) {
There is an extra ")".

[50] foreach loop in Progress Bar script;
The value of $final_data is not re-initialized upon each iteration of the foreach loop over the @ARGV array. Consequently, the value of length($final_data) is inflated for the second and subsequent URLs specified on the command line, and the progress reported is incorrect.
AUTHOR: This is correct. The script and hack were tested with only one file at a time, even though the script supported more than one URL on the command line.

[53] 2nd code fragment;
In the example code, 2nd fragment, an extraneous character is indicated in the search string. It currently reads as follows:
    my @links = $p->look_down(
        _tag => 'a',
        href => qr{^ \Ohttp://www.oreilly.com/catalog/\E \w+ $}x
    );
and should read:
    my @links = $p->look_down(
        _tag => 'a',
        href => qr{^ \http://www.oreilly.com/catalog/\E \w+ $}x
    );

(53) Hack #19 Scraping with HTML::TreeBuilder;
The last paragraph on this page refers to "O'Reilly's subscription-based Safari online Library" at
    http://safari.online.com
should be:
    http://safari.oreilly.com

(63) 2nd paragraph;
    Andy Lester's WWW::Mechanize [Hack #22] allows you to go to a URL and explore the sit...
should say:
    Andy Lester's WWW::Mechanize [Hack #21] allows you to go to a URL and explore the sit...

[169] google search code block;
The entire doGoogleSearch call appears to be missing.

[273] 3rd paragraph;
Kurt Hindenburg's tvlisting no longer works, and he closed development of it as per his website.

(277) hack 74 introduction;
    ...idea what you're visitor's weather is like.
should be:
    ...idea what your visitor's weather is like.

[326] function getBlock (lower half of page);
The variable $pElement is referenced in the getBlock function, but is never declared or assigned in this function. This appears to cause the function to fail under some conditions.
    if( $_start > strlen($pElement) && $_stop > $start ) }
should read:
    if( $_start < strlen($pSource) && $_stop > $start ) }

(364) 6 Hack #94 Using XML::RSS to Repurpose Data;
    http://www.newsmonster.com
should be:
    http://www.newsmonster.org
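
The page 50 entry comes down to where the accumulator is declared. The following is a minimal sketch of that point, not the book's Progress Bar script: it makes no network requests, and the %fake_chunks data, the example.com URLs, and the progress_for helper are all hypothetical stand-ins for the chunks a download callback would receive. Only the variable name $final_data is taken from the erratum.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for network responses: each URL maps to the
# chunks a download callback would receive. Not taken from the book.
my %fake_chunks = (
    'http://example.com/a' => [ 'x' x 40, 'x' x 60 ],
    'http://example.com/b' => [ 'y' x 25, 'y' x 25 ],
);

# Returns a hash ref mapping each URL to the list of percentages
# that would be reported while its chunks arrive.
sub progress_for {
    my @urls = @_;
    my %reported;

    foreach my $url (@urls) {
        # Declared inside the loop, so it starts empty for every URL.
        # Declaring it once outside the loop reproduces the erratum:
        # the second URL's length would include the first URL's bytes.
        my $final_data = '';

        # Total size of this URL's content (assumed known in advance).
        my $total = 0;
        $total += length($_) for @{ $fake_chunks{$url} };

        foreach my $chunk ( @{ $fake_chunks{$url} } ) {
            $final_data .= $chunk;
            my $percent = int( 100 * length($final_data) / $total );
            push @{ $reported{$url} }, $percent;
        }
    }
    return \%reported;
}

my $r = progress_for( 'http://example.com/a', 'http://example.com/b' );
for my $url ( sort keys %$r ) {
    print "$url: @{ $r->{$url} }\n";
}
```

With the declaration inside the loop, the second URL reports 50 and then 100 percent. If $final_data were declared once before the loop, the same run would report 250 and 300 percent for the second URL, which is the inflation the erratum describes.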