Applied Text Analysis with Python

Errata for Applied Text Analysis with Python


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
chapter 4
First code listing after heading "Pipeline Basics".

Here is the error when the code runs in Python:

TypeError: Last step of Pipeline should implement fit. 'MultinomialNB()' (type <class 'str'>) doesn't

Remove the two quote marks surrounding the call to MultinomialNB.
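
The fix described above can be sketched as follows. This is a minimal illustration, not the book's listing: CountVectorizer and the toy documents and labels are stand-ins chosen only so that the example runs on its own. The point is that the last step is a MultinomialNB instance rather than the string 'MultinomialNB()'.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data, assumed for illustration only (not from the book).
docs = ["good great fun", "great good", "bad awful boring", "awful bad"]
labels = ["pos", "pos", "neg", "neg"]

model = Pipeline([
    # Stand-in vectorizer; the book's listing uses its own transformers.
    ("vectorizer", CountVectorizer()),
    # An estimator instance, NOT the string 'MultinomialNB()'.
    ("bayes", MultinomialNB()),
])

model.fit(docs, labels)
prediction = model.predict(["good fun"])
```

With the quote marks in place, the final step is a plain str, which has no fit method, and that is what the TypeError reports.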

Anonymous  Mar 16, 2018 
PDF Page 13
4th paragraph

"Co-occurrences show which words are likely to proceed and succeed each other..."

'proceed' should be 'precede':

"Co-occurrences show which words are likely to precede and succeed each other..."

Amar  Dec 24, 2018 
Printed, PDF Page 31
chapter 2

The errata pages here refer to completely different chapter names, and a different BOOK as far as I can tell. For example, in the real book chapter 2 is called "Building a Custom Corpus", not "Text Acquisition and Ingestion".

I came here trying to understand why the extension in the DOC_PATTERN regex is json and not html in the "Reading an HTML Corpus" section.

Anyway, the sample doesn't work.
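
One possible reason for the failure can be sketched with the standard re module: a document pattern that ends in .json will never match file ids that end in .html. The two patterns and the file ids below are hypothetical, written only to illustrate the mismatch; the book's actual DOC_PATTERN and corpus layout may differ.

```python
import re

# Hypothetical patterns for illustration; the book's DOC_PATTERN may differ.
JSON_PATTERN = r'(?!\.)[a-z_\s]+/[a-f0-9]+\.json'
HTML_PATTERN = r'(?!\.)[a-z_\s]+/[a-f0-9]+\.html'

# Hypothetical file ids in category/name.extension form.
fileids = ["news/0a1b2c.html", "sports/3d4e5f.html", "news/6a7b8c.json"]


def matching(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]


json_matches = matching(JSON_PATTERN, fileids)  # only the .json file id
html_matches = matching(HTML_PATTERN, fileids)  # only the .html file ids
```

If the corpus on disk holds .html files, a reader built with the .json pattern would see no documents at all, which would be consistent with the sample not working.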

Anonymous  Nov 06, 2018 
Printed Page 106
Model evaluation (Chinese version)

In the loop "for X_train, X_test, y_train, y_test in loader:", what types are expected for (X_train, y_train)? The code does not run correctly as given at this URL.

zeng sir  Jan 21, 2021 
Printed Page 143
4th paragraph

Calls to trigram_counts.ngrams[3] and trigram_counts.ngrams[3].conditions() seem to be made on an instance of FreqDist when they should be made on the ConditionalFreqDist instance, trigram_counts.allgrams. According to the NLTK documentation, FreqDist has no conditions() method, which makes me think the book is incorrect.

Also, the text on the page says it is retrieving conditional frequency information. The code on page 142 seems to record that in trigram_counts.allgrams, not trigram_counts.ngrams, but the calls on page 143 retrieve it from trigram_counts.ngrams.

thank you
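
The difference between the two NLTK classes mentioned above can be checked with a short sketch. The token list and variable names are assumptions made for illustration and are not the book's trigram_counts object; only FreqDist, ConditionalFreqDist and ngrams come from NLTK itself.

```python
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.util import ngrams

# Toy token stream, assumed for illustration only.
tokens = "the cat sat on the mat the cat ran".split()

# Plain frequency distribution over whole trigrams.
trigram_freq = FreqDist(ngrams(tokens, 3))

# Conditional distribution: the first two words are the condition,
# the third word is the observed sample.
trigram_cfd = ConditionalFreqDist(
    ((w1, w2), w3) for w1, w2, w3 in ngrams(tokens, 3)
)

# FreqDist has no conditions() method; ConditionalFreqDist does.
fd_has_conditions = hasattr(trigram_freq, "conditions")
cfd_has_conditions = hasattr(trigram_cfd, "conditions")

# Words observed after the context ("the", "cat").
following = trigram_cfd[("the", "cat")]
```

Here conditions() is available only on the conditional distribution, so a call such as .conditions() on a FreqDist would raise an AttributeError.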

Anonymous  Jun 23, 2019