Web Scraping with Python

Errata for Web Scraping with Python


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
Safari Books Online

There are no page numbers in Safari!!

It's chapter 10 in the section titled "Handling Logins and Cookies"

This link gives a 404 error

It appears in several places, starting in paragraph 4

Your form handler for ERRATA will not let me enter the URL directly.
This is amazingly awful in 2020.

So maybe if I put a space in, it will go through?

http: //

Note from the Author or Editor:
Fixed all links. Looks like the printed/displayed URL text was correct, but they didn't actually link to anything on click.

BRIAN WILSON  Jun 11, 2020  Aug 28, 2020
Page 17
Middle of page where variable `html` is defined

The URL should be the one for instead of

Anonymous  Feb 23, 2019  Aug 28, 2020
Page 35
2nd paragraph

In the regular expression `^(/wiki/)((?!:).)*$")`, the `")` at the end is not needed. It was probably copied over from the code.

Anonymous  Apr 18, 2020  Aug 28, 2020
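The corrected pattern from the Page 35 erratum can be checked quickly. This sketch uses Python's standard `re` module, as the book's crawler does. The helper name `is_article_link` and the sample hrefs are illustrative only. The sketch also shows that the stray `")` is more than untidy: its extra `)` leaves the parentheses unbalanced, so `re.compile` raises `re.error`.

```python
import re

# Corrected pattern: article links start with /wiki/ and contain no colon
# (colons mark namespace pages such as Category:, Talk:, or File:).
ARTICLE_LINK = re.compile(r'^(/wiki/)((?!:).)*$')

def is_article_link(href):
    """Hypothetical helper: True if href looks like a Wikipedia article path."""
    return ARTICLE_LINK.match(href) is not None

hrefs = ['/wiki/Kevin_Bacon', '/wiki/Category:Actors', '/wiki/Talk:Kevin_Bacon']
print([h for h in hrefs if is_article_link(h)])

# The version printed in the text, with the trailing '")', does not compile.
try:
    re.compile(r'^(/wiki/)((?!:).)*$")')
except re.error as exc:
    print('invalid pattern:', exc)
```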
Page 54
def scrapeNYTimes(url) function

NYTimes appears to have changed the class: "story-content" no longer works. Replacing it with "css-1cy1v93" (the class on the `p` elements) does work, but I'm not sure this is the most robust method.

Anonymous  May 19, 2018  Nov 21, 2018
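For the Page 54 erratum, one way to hedge against changing class names is to try several candidate classes in order and use the first one that matches. The sketch below is hypothetical. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages, and `ParagraphCollector`, `extract_body`, and the sample markup are illustrative names, not the book's code. The two class names come from the erratum. Nothing here assumes either class still matches the live site.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collects the text of <p> elements whose class list includes css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_match = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            classes = (dict(attrs).get('class') or '').split()
            if self.css_class in classes:
                self.in_match = True
                self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_match = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) is kept, since we stay in the <p>.
        if self.in_match:
            self.paragraphs[-1] += data

def extract_body(html, candidate_classes):
    """Return the joined paragraph text for the first class that matches, else None."""
    for css_class in candidate_classes:
        parser = ParagraphCollector(css_class)
        parser.feed(html)
        parser.close()
        if parser.paragraphs:
            return '\n'.join(p.strip() for p in parser.paragraphs)
    return None

sample = ('<p class="css-1cy1v93">First.</p><p class="other">Skip</p>'
          '<p class="css-1cy1v93">Second <a href="#">link</a>.</p>')
print(extract_body(sample, ['story-content', 'css-1cy1v93']))
```

A hashed name such as "css-1cy1v93" is likely generated by the site's build tooling and may change again. Returning `None` when no candidate matches, as here, makes that kind of breakage easy to detect.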