Errata for Data Wrangling with Python

The errata list records errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version | Location | Description | Submitted By | Date Submitted | Date Corrected
PDF
Page 49
last paragraph

"However, if our code was in a subfolder called data,"
Replace the word 'code' with 'data'.

Note from the Author or Editor:
should read

However, if our code was in a subfolder called code

instead of

"However, if our code was in a subfolder called data,"

Ron B  Feb 17, 2016  Jan 27, 2017
Printed
Page 76
2nd paragraph

Paragraph states:
"From this folder, type the following command in your terminal to run the script from the command line:

python parse_script.py"


I think it is meant to say:
"python parse_excel.py"

since that is what you called the new python file in the prior paragraph (step # 2):

"2. Create a new Python file called parse_excel.py and put in the folder you created. "

Note from the Author or Editor:
Yes, it should say:

python parse_excel.py

Bryan P  Mar 09, 2016  Jan 27, 2017
PDF
Page 94
2nd paragraph

"This code prints the first two lines of the file"

Replace 'lines' with 'pages'

Note from the Author or Editor:
This code prints the first two lines of the file.

Please update to:

This code prints the first two pages of the file.

Ron B  Feb 17, 2016  Jan 27, 2017
Printed, PDF
Page 94
Last paragraph

Missing sudo command in code:

pip install --upgrade -- ignoreinstalled slate==0.3 pdfminer==20110515

should read (at least for my Mac):

sudo pip install --upgrade -- ignoreinstalled slate==0.3 pdfminer==20110515

Note from the Author or Editor:
If you are using a virtual environment, you can simply type:

pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515

Otherwise, use:

sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515

zenzontle  Mar 12, 2016 
PDF
Page 94
Warning Text Box

In Warning box text, the pip install option "--ignoreinstalled" should be "--ignore-installed".

Windows 7
Python version 2.7.11
pip 8.1.1

Note from the Author or Editor:
Yes, it should be

pip install .... --ignore-installed

Anonymous  Mar 29, 2016  Jan 27, 2017
Printed
Page 94
Code line no. 3

After installing slate and pdfminer with the method recommended for the ImportError, it is still not possible to process the PDF in the sample zip.

The error message is: PDFSyntaxError: No /Root object! - Is this really a PDF?

This was solved in the last solution on this page: https://stackoverflow.com/questions/11384591/parsing-a-pdf-with-no-root-object-using-pdfminer/11438571 (the open statement should use the mode 'rb').

So code line 3 should read:

with open(pdf, 'rb') as f:

Windows 7
python 2.7.11
slate 0.3
pdfminer 20110515

PDF Book
February 2016: First Edition
Revision History for the First Edition
2016-02-02 First Release

Note from the Author or Editor:
Unsure if this is a Windows-only issue, but regardless, opening as 'rb' should be standard practice, so let's change it to:

with open(pdf, 'rb') as f:

Anonymous  Mar 29, 2016  Jan 27, 2017
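
For readers following the 'rb' correction above, here is a minimal sketch of why binary mode matters. It is an illustration under stated assumptions, not the book's code: the helper name read_pdf_bytes and the sample bytes are hypothetical stand-ins for a real PDF, and no slate or pdfminer calls are shown.

```python
import os
import tempfile


def read_pdf_bytes(path):
    """Read a file in binary mode so its bytes come back unmodified.

    Hypothetical helper for illustration; the book passes the open
    file object on to its PDF library instead of returning bytes.
    """
    with open(path, 'rb') as f:
        return f.read()


# Hypothetical stand-in for a PDF: a header line plus some non-text bytes.
# Text mode may translate '\r\n' or try to decode these bytes, which is
# the kind of alteration that binary mode avoids.
sample = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n"

fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, 'wb') as f:
    f.write(sample)

data = read_pdf_bytes(tmp_path)
os.remove(tmp_path)
```

Because the file is opened with 'rb', data holds exactly the bytes that were written, on any operating system.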
ePub
Page 116
last paragraph

Part of the paragraph reads:

"...can be done simply by running pip install pdftables and pip requests install."

Should read

"can be done simply by running pip install pdftables and pip install requests."

Note from the Author or Editor:
As noted, please change: pip requests install

to

pip install requests

Deb R.H.  May 08, 2016  Jan 27, 2017
Printed
Page 301
2nd paragraph

The URL https://enoughproject.org/take-action now brings up https://enoughproject.org/get-involved/take-action, which appears to have a different HTML structure, so some of the code on the following pages returns errors. For instance, on p. 301 the code in paragraphs 15 and 16 throws AttributeError: object has no attribute 'descendants'

Note from the Author or Editor:
Hi there,

This is indeed the case! You can find old versions of the website pages and code in the code repository (https://github.com/jackiekazil/data-wrangling). That should allow you to follow along with the book using old copies of the page. Unfortunately, as you probably know from reading the chapter, the web is constantly changing, and this means scraping content is a never-ending job. Hope this helps!

-katharine

John Roby  Sep 06, 2017 
Printed
Page 399
second code snippet, second line of code, footnoted as '1'

The line

`from emojispider.items import EmojispiderItem`

should read

`from scrapyspider.items import EmojiSpiderItem`,

as per the Github example: https://github.com/jackiekazil/data-wrangling/blob/master/code/chp12-scraping/scrapyspider/scrapyspider/spiders/emo_spider.py

Note from the Author or Editor:
This is correct. We should change

`from emojispider.items import EmojispiderItem`

to

`from scrapyspider.items import EmojiSpiderItem`

Anonymous  Jun 30, 2016  Jan 27, 2017
PDF
Page 439
5th paragraph

The output of the GCC compilers is machine code, NOT byte code. GCC does
not need to be installed to use the CPython interpreter or PyPy JIT to turn
Python code into bytecode or machine code, respectively.

GCC would be needed to compile Cython code to machine code.
Cython is not used anywhere in this book.

Note from the Author or Editor:
Please update this sentence:

"The purpose of GCC (the GNU Compiler Collection) is to take code written in
Python and turn it into something your machine can understand—byte code."

to the following

The purpose of GCC (the GNU Compiler Collection) is to take Python libraries with C extensions and turn them into something your machine can understand and execute.

Ron B  Mar 02, 2016  Jan 27, 2017
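
As a small illustration of the point above that CPython compiles Python source to bytecode on its own, without GCC, the following sketch uses only the standard library dis module. It assumes Python 3.4 or later (the book targets Python 2.7, where dis.get_instructions is not available), and the function add is a made-up example.

```python
import dis


def add(a, b):
    # Trivial function; CPython compiled it to bytecode when it was defined.
    return a + b


# Raw bytecode produced by the CPython compiler (no C compiler involved).
bytecode = add.__code__.co_code

# Human-readable names of the bytecode instructions; the exact list
# varies between Python versions.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
```

Calling dis.dis(add) prints the same instructions as a listing. A C compiler such as GCC only comes into play when a library ships C extension code that has to be built on installation.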
PDF
Page 445
6th paragraph

Based on the instructions given above, ~/Projects has a single sub-directory called data_wrangling. It contains the 'code' subfolder, while 'envs' is in /home/_user's_name_.

Note from the Author or Editor:
Please update the sentence:

At this point, if we look at the contents of our Projects folder, we should have two
empty subfolders called code and envs.

To read:

At this point, we have our code folder set up in a special file inside our Projects folder and our virtual environment folder properly set up in our home directory.

Ron B  Mar 02, 2016  Jan 27, 2017