Perl & LWP
by Sean M. Burke

This errata page lists errors outstanding in the most recent printing. If you have technical questions or error reports, you can send them to booktech@oreilly.com. Please specify the printing date of your copy.

This page was updated March 28, 2008.

Here's a key to the markup:

[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

NOTE FROM THE AUTHOR

Thanks for buying my book! I've gotten really enthusiastic responses from readers, and that has made all the work of writing absolutely worth it.

If you're having trouble getting any code from the book to work, the absolute first thing you should do is make sure you've got a recent version of LWP installed! Here's my favorite incantation for seeing what version you have:

    perl -e "use LWP 1000000000"

It will say something like:

    LWP version 1000000000 required--this is only version 5.68 at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.

If the version number you see in "this is only version 5.68" is lower than 5.68, upgrade! /Perl & LWP/ is not about old versions of LWP, but just about modern versions -- the more modern the better, since we're constantly improving its performance and interface. If you're using an old version, you're missing out on years of improvements that Gisle, I, and many others have added for you.

Just to pick two little examples: in older versions, you would load the class HTTP::Cookies::Netscape not with the expected "use HTTP::Cookies::Netscape" line, but with "use HTTP::Cookies". Moreover, old versions didn't understand cookies files from recent Mozilla versions.
A more compelling example is that in old LWP versions, LWP::UserAgent had no $browser->get or $browser->post methods -- and this book uses those "new" methods heavily, because the alternative is a much less friendly syntax:

    use HTTP::Request::Common;
    $browser->request(GET(...),...);

and the like.

Besides the issue of LWP versions, there is also the question of brittleness.

SECOND NOTE FROM THE AUTHOR:

I said in Chapter 1, in the section on "Brittleness":

"As somewhat of a lesson in brittleness, in this book I show you data on various web sites (Amazon.com, the BBC News web site, and many others) and show how to write programs to extract data from them. However, that code is fragile. Some sites get redesigned only every few years; Amazon.com seems to change something every few weeks. So while I've made every effort to have the code be accurate for the web sites as they exist at the time of this writing, I hope you will consider the programs in this book valuable as learning tools even after the sites they communicate with will have changed beyond recognition."

Well, even though it's been only a few weeks since the book went to press, already many of the sites have changed enough to break some of the extractor programs that are examples in the book. With some sites (like Amazon), that was expected -- it was just a matter of happening sooner rather than later. With others (like the California DMV server, or the Weather Underground), I'm a bit surprised that the changes happened so soon.

In some of the program files at http://examples.oreilly.com/perllwp/ I have added a few comments noting where some of the screen-scraper programs have already broken because of changes in the site that they pull from. I leave it as an exercise to readers to try on their own to get some of those extractors working. It'll be good practice in retuning brittle programs! After all, when you write your extractors from scratch, they'll eventually break too.

-- Sean M. Burke
August 1, 2002

Confirmed errors:

(xi) Under the heading "Foreword", add a subheading "by Gisle Aas"

(xiv) Second line; Correct
    http://www.w3.org/TR/html401/interact/forms/
to
    http://www.w3.org/TR/html401/interact/forms
(just removing the final "/")

{7}, Table 1-1
Several things wrong with the table contents. Here it is, all fixed:

    Distribution     CPAN directory                Minimal Version Needed
    libwww-perl      modules/by-module/Net         5.68
    URI              modules/by-module/URI         1.23
    libnet           modules/by-module/Net         1.12
    HTML-Tagset      modules/by-module/HTML        3.03
    HTML-Parser      modules/by-module/HTML        3.27
    HTML-Tree        modules/by-module/HTML        3.17
    MIME-Base64      modules/by-module/MIME        2.16
    Compress-Zlib    modules/by-module/Compress    1.19
    Digest-MD5       modules/by-module/Digest      2.23

(7) end of first paragraph after the table
After "get the latest." add a new sentence: "This book is about the latest version of LWP! Upgrade now!"

{7} in the paragraph before heading "Unpack and configure"
Change both instances of
    authors/id/G/GA/GAAS
to
    modules/by-module/MIME

(11) example 1-1
change "use LWP::Simple;" to "use LWP::Simple 1.36;"

(11) example 1-2
change "use LWP;" to "use LWP 5.58;"

(12) example 1-3
change "use LWP::UserAgent;" to "use LWP::UserAgent 2.003;"

(13) example 1-5
change "use HTML::TreeBuilder;" to "use HTML::TreeBuilder 3.13;"

(14) end of the chapter text:
Add a new section with a heading "Upgrade Now!" with this as the text:

If you're having trouble getting any code from the book to work, the absolute first thing you should do is make sure you've got a recent version of LWP installed! Here's my favorite incantation for seeing what version you have:

    perl -e "use LWP 1000000000"

It will say something like:

    LWP version 1000000000 required--this is only version 5.68 at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.

If the version number you see in "this is only version 5.68" is lower than 5.68, upgrade!
This book is not about old versions of LWP, but just about modern versions -- the more modern the better, since we're constantly improving its performance and interface. If you're using an old version, you're missing out on years of improvements that Gisle, I, and many others have added for you.

Just to pick two examples: in older LWP versions, you would load the class HTTP::Cookies::Netscape not with the expected "use HTTP::Cookies::Netscape" line, but with "use HTTP::Cookies". Moreover, old versions didn't understand cookies files from recent Mozilla versions. A more compelling example is that in old LWP versions, LWP::UserAgent had no $browser->get or $browser->post methods -- and this book uses those methods heavily, because the alternative is a much less friendly syntax:

    use HTTP::Request::Common;
    $browser->request(GET(...),...);

and the like.

{16} The examples at the bottom of page 16 and the top of 17 mistakenly show "+" separating form=value pairs. It should be "&"! So: Take this:
    name=Hiram%20Veeblefeetzer+age=35+country=Madagascar
and correct to:
    name=Hiram%20Veeblefeetzer&age=35&country=Madagascar
And later, take this:
    $query = "name=$n+age=$a+country=$c";
    print $query;
    name=Hiram%20Veeblefeetzer+age=35+country=Madagascar
and correct it to:
    $query = "name=$n&age=$a&country=$c";
    print $query;
    name=Hiram%20Veeblefeetzer&age=35&country=Madagascar

(17) the second line from the bottom; Correct to

(19) Last sentence;
The code font is used for the word "print" in the last sentence on page 19: "Some functions return the document, others save or print the document." Since this is _not_ a reference to a function, the word "print" should not use the code font.

(24) example 2-5
change "use LWP;" to "use LWP 5.58;"

{28}, 1st new paragraph, 7th line down on the page.
Take this:
    (doc, status, success, resp) = do_GET(URL, [form_ref, [headers_ref]]);
And correct it to:
    (doc, status, success, resp) = do_POST(URL, [form_ref, [headers_ref]]);

(31) third code line
change "use LWP;" to "use LWP 5.58;"

(32) 2nd paragraph, second line:
There's slightly too much space after the comma in "a firemwall, or"

{37} Fifth (non-code) paragraph, second sentence: Correct to:
If this $browser object has a protocols_allowed list (and most don't), then is_protocol_supported returns true only for protocols that are in that list, and which LWP supports. But if $browser object is normal in not having a protocols_allowed list, then is_protocol_supported returns true for any protocol that LWP supports and which isn't in protocols_forbidden.

{40} second codeblock, fifth and sixth lines
Correct both instances of "$response" to "$resp".

{40} six lines from the bottom; Correct
    my $resp = $browser->get('http://www.perl.com'
to
    my $resp = $browser->get('http://www.perl.com',
(just adding a comma to the end)

{41} first line of first new codeblock; Correct
    my $resp = $browser->get('http://www.perl.com/'
to
    my $resp = $browser->get('http://www.perl.com/',
(just adding a comma to the end)

{41} first line of second-to-last codeblock; Correct
    my $resp = $browser->get('http://www.perl.com/'
to
    my $resp = $browser->get('http://www.perl.com/',
(just adding a comma to the end)

{43} The first line of the 2nd and 5th code examples under "Status Line"
Correct "$resp = " to "$response = ".

{45} second-to-last codeblock; Correct the line:
    $mins = int($age/60); $age -= $minutes * 60;
to
    $mins = int($age/60); $age -= $mins * 60;

(47) Third line of the last paragraph:
Correct "LWP::Cookies" to "HTTP::Cookies"

(48) first code line
change "use URI;" to "use URI 1.23;"

{51} last line, second from last paragraph; reads:
    userinfo, server, or port components).
should be:
    userinfo, host, or port components).
Also, on the NEXT line, "server()" should be "host()"

(53) The third line of each of the first two code sections:
Correct "$uri->" to "$url->". (Look close!)

(59) fifth line, paragraph after code sample; reads:
    city, state, and a submit button called zip.
should be:
    city, state, zip, and a submit button called Search

(59) second to last paragraph, next to last line; reads:
    call to call
should be:
    call

(61) example 5-1
change "use URI;" to "use URI 1.23;"

{75} Second-from-last line of the codeblock;
    "ds" => "30",
should be:
    "ds" => "100",

(85) first code line
change "use LWP::Simple;" to "use LWP::Simple 1.36;"

{97} about a dozen lines down; Correct this line:
    die "Couldn't get $doc_url: ", $resp->status_line
to:
    die "Couldn't get $doc_url: ", $response->status_line

{97} 14th non-blank codeline, right under "{ # Get..." Take
    $doc_url = $response->request->base;
and correct it to:
    $doc_url = $response->base;

{98, 99} All the code in this section:
Sorry, this code doesn't work anymore -- Weather Underground has changed their HTML at least twice since the book went to press. You're on your own with getting it to work -- and (the hard part) KEEPING it working.

{105} second "#" line in the codeblock in the middle of the page. Correct this comment line:
    # process the text in $text->[1]
to this:
    # process the text in $token->[1]

(105) third code line
change "use HTML::TokeParser;" to "use HTML::TokeParser 2.24;"

{106} Next-to-last paragraph; Book reads: Should be (with alt attribute) BC1998!  WHOOO!
(111) Start of new paragraph in the middle of the page; Clarify
    If you though the contents of $url could be very large,
to
    If you thought the content in $resp could be very large,

(120) 3rd paragraph, 2nd sentence;
"actual" should be "actually"

(122) three lines under the new heading "First Code"
change "use HTML::TokeParser;" to "use HTML::TokeParser 2.24;"

(126) first line under the heading "Debuggability"
correct "all the links" to "all the lines"

(134) example 9-2
change "use HTML::TreeBuilder 3;" to "use HTML::TreeBuilder 3.13;"

(135) Parse Options Section; Two instances of the same mistake.
A) In the example beginning with
    $comments = $root->strict_comment();
the last statement incorrectly reads
    $comments = $root->strict_comments();
-- Omit 's'
B) Two paragraphs later, beginning with
    $root->strict_comments([boolean]);
Incorrect with 's' again, should read
    $root->strict_comment([boolean]);

(140) Second paragraph under "Traversing" heading; Correct
    The traverse() method lets you both:
to
    The traverse() method lets you do both:

{144} first codeblock; Add before the first line:
    use URI;

{144} about eight lines into the first codeblock; Correct
    if(@children == 1 and ref $children[0] and $children[0]->tag eq 'a')
to
    if(@children == 1 and ref $children[0] and $children[0]->tag eq 'a') {
(Just adding a "{" at the end)

(149) example 10-1
change "use HTML::TreeBuilder;" to "use HTML::TreeBuilder 3.13;"

{149} Example 10-1 (lines 11 and 16 of code);
The attribute should be 'class' not 'style'. The value should be 'mutter' not 'mumble' (cf. p. 148). Take
    $h3c->attr('style', 'scream');
and correct to
    $h3c->attr('class', 'scream');
And take
    $h3r->attr('style', 'mumble');
and correct to:
    $h3r->attr('class', 'mutter');
And in the dump on page 150, the bolded line, correct the second and third bolded lines to:

@0.1.0 and to

@0.1.1

(151) Second line of the last paragraph;
Correct "There's no point is" to "There's no point in"

(152) HTML example, sixth-from-last line; Take and correct to:

(166) last codeblock
change "use HTTP::Cookies;" to "use HTTP::Cookies 1.30;"

(174) 2nd line of first codeblock: Take
    $browser->name('ReportsBot/1.01');
and correct to
    $browser->agent('ReportsBot/1.01');

(175) the first paragraph's fifth line; Correct
    is password protected
to
    is password-protected

(178) 1st paragraph, 4th line;
"writing" should be "write"

(181) second code line
change "use LWP::RobotUA;" to "use LWP::RobotUA 1.19;"

(187) the last codeblock's first line; Correct
    my %notable_url_error; # URL => error messageS
to
    my %notable_url_error; # URL => error message
(Just removing the "S")

{195} Take
    my $hit_limit = $option{'h'} || 500;
and correct it to:
    my $hit_limit = $option{'n'} || 500;
[i.e. 'n' for 'network hits', not 'h' for 'help']

(196) about 3/5ths of the way thru the codeblock; Correct
    sub report { # This that gets run at the end.
to
    sub report { # This gets run at the end.

{220} entry for character 982;
The character should not be an uppercase pi, but instead should be a lowercase pi that looks like an omega with a crossbar -- just like TeX \varpi -- as seen at http://interglacial.com/~sburke/varpi.gif

(228) four lines from the bottom; Correct
    it's term (how long
to
    its term (how long
(Just deleting the apostrophe)
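
A closing practical note, since so many of the corrections above add minimum version numbers to "use" lines: the following is a rough sketch, not code from the book, of one way to check an installation against the minimum versions in the corrected Table 1-1. The helper name check_module and the choice of one module to stand for each distribution (for example Net::FTP for libnet, HTML::TreeBuilder for HTML-Tree) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Minimum versions taken from the corrected Table 1-1.
# ASSUMPTION: the module picked to represent each distribution is
# this sketch's own choice, not something the errata specifies.
my @wanted = (
    [ 'LWP'               => '5.68' ],   # libwww-perl
    [ 'URI'               => '1.23' ],
    [ 'Net::FTP'          => '1.12' ],   # libnet
    [ 'HTML::Tagset'      => '3.03' ],
    [ 'HTML::Parser'      => '3.27' ],
    [ 'HTML::TreeBuilder' => '3.17' ],   # HTML-Tree
    [ 'MIME::Base64'      => '2.16' ],
    [ 'Compress::Zlib'    => '1.19' ],
    [ 'Digest::MD5'       => '2.23' ],
);

# Hypothetical helper: returns 'missing', 'too old', or 'ok'.
# A module that loads but declares no $VERSION is also reported as 'too old'.
sub check_module {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'missing' unless eval { require $file; 1 };
    # The built-in VERSION method dies if the loaded version is below $min.
    return eval { $module->VERSION($min); 1 } ? 'ok' : 'too old';
}

for my $pair (@wanted) {
    my ($module, $min) = @$pair;
    printf "%-20s need %-6s %s\n", $module, $min, check_module($module, $min);
}
```

This does the same job as the "use LWP 1000000000" incantation, but reports on every distribution in one pass instead of aborting at the first failure.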