Errata

Building Machine Learning Powered Applications

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version | Location | Description | Submitted by | Date submitted | Date corrected
PDF
Page X
Penultimate paragraph

“If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool.” This appears to be a typo; "tool" should probably be "toolkit".

Note from the Author or Editor:
This is a typo indeed, good catch!

"If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool."

should be

"If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development toolkit."

Richard Morton  Feb 06, 2020  Feb 14, 2020
Printed, PDF, ePub, Mobi, Other Digital Version
Page ***
Code Example

In Chapter 4, "Acquire an Initial Dataset," there is an error in the code example.

********************************************************************************************
questions_with_accepted_answers = df[
    df["is_question"] & ~(df["AcceptedAnswerId"].isna())
]
q_and_a = questions_with_accepted_answers.join(
    df[["Text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer"
)
*******************************************************************************************

df[['Text']] should have been df[['body_text']]. There is no 'Text' column in the output of df.info().

Note from the Author or Editor:
df[["Text"]] should be changed to df[["body_text"]]

Chris Chen  Mar 21, 2020 
Page 63
In the code block

Second release:

On the 5th line of the code block, df[["Text"]] should be df[["body_text"]].
On the 8th line of the code block, q_and_a[["Text", "Text_answer"]] should be q_and_a[["body_text", "body_text_answer"]].

Thank you.

Note from the Author or Editor:
Erratum confirmed. This should be changed to match the code that is provided with the book.

Haesun Park  Jul 13, 2021 
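The two page 63 corrections above can be checked with a minimal sketch. The toy DataFrame below is hypothetical (three made-up posts indexed by post Id); only the column names `body_text`, `is_question`, and `AcceptedAnswerId` and the join call itself come from the errata. It assumes, as the join requires, that the index of `df` holds the post Id.

```python
import pandas as pd

# Hypothetical toy data: the index plays the role of the post Id.
df = pd.DataFrame(
    {
        "body_text": [
            "How do I outline a novel?",
            "Start with a one-page synopsis.",
            "Is passive voice bad?",
        ],
        "is_question": [True, False, True],
        # Only the first question has an accepted answer (post 2).
        "AcceptedAnswerId": [2.0, float("nan"), float("nan")],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# Keep questions that have an accepted answer.
questions_with_accepted_answers = df[
    df["is_question"] & ~(df["AcceptedAnswerId"].isna())
]

# Look up each accepted answer by Id; the answer's text column
# receives the "_answer" suffix because "body_text" exists on both sides.
q_and_a = questions_with_accepted_answers.join(
    df[["body_text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer"
)

pairs = q_and_a[["body_text", "body_text_answer"]]
print(pairs)
```

With the original `"Text"` column name, the `df[["Text"]]` selection would raise a `KeyError` on a DataFrame like this one, which is consistent with the report that `df.info()` lists no such column.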
Printed, PDF, ePub, Mobi, Other Digital Version
Page 71
Second to last paragraph

"five" in "five most popular ones" should be "seven". The sentence currently reads:

"Because we have more than three hundred tags in our dataset, here we chose to only create a column for the five most popular ones"

Emmanuel Ameisen  Sep 30, 2020
Printed, PDF, ePub, Mobi, Other Digital Version
Page 87
Table 4-5

The last paragraph of page 86 describes crossing features as multiplying them, so in the table on page 87 I would expect DoW x DoM = Cross. However, the DoW column in the table reads 7, 7, ..., 1 down the rows. It should read 6, 6, ..., 7. Only then does multiplying by the DoM values 29, 29, ..., 30 give 174, 174, ..., 210.

Note from the Author or Editor:
This is correct; there is a slight inaccuracy in the table. It currently reads

7 | 29 | 174
7 | 29 | 174
...
1 | 30 | 210

It should be

6 | 29 | 174
6 | 29 | 174
...
7 | 30 | 210

Han Qi  Mar 19, 2021 
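The arithmetic behind the Table 4-5 correction can be verified with a short sketch. It assumes, as the report states, that the cross feature is the product of the two columns, and it uses only the two distinct rows shown in the corrected table; the elided rows are left out.

```python
# The two distinct rows from the corrected Table 4-5 (DoW, DoM).
rows = [
    {"DoW": 6, "DoM": 29},
    {"DoW": 7, "DoM": 30},
]

# Cross the two features by multiplying them.
for row in rows:
    row["Cross"] = row["DoW"] * row["DoM"]

for row in rows:
    print(row["DoW"], "|", row["DoM"], "|", row["Cross"])
```

The originally printed DoW values (7 and 1) would give 203 and 30 instead, which do not match the Cross column.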
Page 88
list items

Second release:

In the 2nd bullet, has_question should be question_mark.
In the 3rd bullet, is_language_question should be language_question.

Thank you

Note from the Author or Editor:
Erratum confirmed. The field names should be changed.

Haesun Park  Jul 14, 2021 
Page 101
2nd paragraph from the bottom

Second release:

On the 9th line from the bottom, writers.stackoverflow.com should be writers.stackexchange.com

Thank you

Note from the Author or Editor:
Erratum confirmed. The site URL should be changed.

Haesun Park  Jul 14, 2021 
Page 105
12th line from the top

Second release:

On the 12th line from the top, "Stack Overflow" should be "Stack Exchange"

Thank you

Note from the Author or Editor:
Correct, we should make the suggested change

Haesun Park  Jul 14, 2021 
Printed, PDF, ePub, Mobi, Other Digital Version
Page 114
2nd paragraph

The text explains the calibration curve using the phrase "gives a probability of being classified as positive that is higher than 80%".

This suggests that the upper bound is 100%. In practice, the intuitive reading is a bucket of width 10%: the y-axis value is computed from observations whose predicted probability falls between 80% and 90%, not between 80% and 100% as the wording "higher than 80%" implies.

The error would be easier to spot with a 10% example: it would be obviously wrong to say that only 10% of the observations with a predicted probability of 10-100% are actually correct. That is how I reasoned that something was off before researching a better definition. I am not sure what the upper bound or bucket size should be; 80-90% may still not be narrow enough for a bucket.

Note from the Author or Editor:
The language here is imprecise. The paragraph currently reads:

"For example, out of all the data points our classifier gives a probability of being classified as positive that is higher than 80%, how many of those data points are actually positive?"

It should read:

"For example, out of all the data points our classifier gives a probability of being classified as positive that is close to 80%, how many of those data points are actually positive?"

Han Qi  Mar 18, 2021 
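The difference between "higher than 80%" and "close to 80%" in the page 114 correction can be illustrated with a small sketch. The scores and labels below are made up, the helper name `fraction_positive_in_bucket` is hypothetical, and the 80-90% bucket is only an assumption taken from the report; the appropriate bucket width is left open there.

```python
def fraction_positive_in_bucket(probs, labels, low, high):
    """Share of actual positives among points with low <= prob < high."""
    in_bucket = [y for p, y in zip(probs, labels) if low <= p < high]
    if not in_bucket:
        return None  # No observations fall in this bucket.
    return sum(in_bucket) / len(in_bucket)


# Hypothetical predicted probabilities and true labels (1 = positive).
probs = [0.05, 0.12, 0.81, 0.83, 0.85, 0.88, 0.95, 0.99]
labels = [0, 0, 1, 1, 0, 1, 1, 1]

# "Close to 80%": one point on a calibration curve, using an 80-90% bucket.
near_80 = fraction_positive_in_bucket(probs, labels, 0.8, 0.9)

# "Higher than 80%": pools every score from 80% up to 100%.
above_80 = fraction_positive_in_bucket(probs, labels, 0.8, 1.01)

print(near_80, above_80)
```

On this toy data the two readings give different values, because the pooled version mixes in the highly confident predictions near 100%.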
Page 129
Figure 6-2

Second release:

In the 3rd plot of Figure 6-2, 'Model can fit unseen data' should be 'Model can predict unseen data'

Thank you

Note from the Author or Editor:
The Errata is correct, this should be changed as suggested

Haesun Park  Jul 14, 2021 
Page 132
Figure 6-3

Second release:

In Figure 6-3, 'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged as 'Cleaning'-->'Feature Generation'-->'Format to model specification'

Thank you

Note from the Author or Editor:
This report is correct, and the figure should be corrected (I am happy to help).

The italic labels above the arrows in Figure 6-3 should be reorganized. More specifically, the 2nd, 3rd and 4th are in the wrong order. Copying from the erratum:
'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged to
'Cleaning'-->'Feature Generation'-->'Format to model specification'

Haesun Park  Jul 14, 2021 
Page 151
8th line from the bottom

Second release:

On the 8th line from the bottom, 'perform well on a training test' should be 'perform well on a training set'.

Thank you

Note from the Author or Editor:
Errata is correct, we should change as requested

Haesun Park  Jul 14, 2021 
Page 157
First code block

Second release:

On the last line of the first code block, 'positive_probs = clf[:, 1]' should be 'positive_probs = probabilities[:, 1]'

Thank you

Note from the Author or Editor:
This is a nice catch. While the code in notebooks is correct, this reproduction is wrong.

We should change:

# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)
# Positive probas contains only the score of the positive class
positive_probs = clf[:,1]

to:

# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)
# Positive probas contains only the score of the positive class
positive_probs = probabilities[:,1]

Haesun Park  Jul 14, 2021 
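A minimal runnable sketch of the corrected page 157 snippet follows. The tiny dataset and the choice of `LogisticRegression` are assumptions made only so the example runs; the book's own classifier and features are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset with two classes.
features = np.array([[0.0], [1.0], [2.0], [3.0]])
labels = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(features, labels)

# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)
# Positive probas contains only the score of the positive class
positive_probs = probabilities[:, 1]

print(probabilities.shape, positive_probs.shape)
```

Indexing the estimator itself, as in `clf[:, 1]`, fails because a fitted `LogisticRegression` is not subscriptable; the column slice has to be taken from the array returned by `predict_proba`.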
Page 164
2nd paragraph from the bottom

Second release:

On the first line of the 2nd paragraph, 'the body of the question' should be 'the body of the function'

Thank you

Note from the Author or Editor:
This erratum is correct.

Haesun Park  Jul 14, 2021 
Printed, PDF, ePub, Mobi, Other Digital Version
Page 197
Top of the page

"outputs" should be "inputs"

To prevent a model from running on incorrect outputs, we need to detect that these

Emmanuel Ameisen  Sep 30, 2020
Page 223
2nd line from the bottom

Second release:

On the 2nd line from the bottom, 'evaluating a mode' should be 'evaluating a model'

Thank you

Note from the Author or Editor:
The errata is correct

Haesun Park  Jul 14, 2021 
Printed, PDF, ePub, Mobi, Other Digital Version
Page 232
Left side of index

"continuous improvement" should be "continuous integration". The entry currently reads:

CI/CD (continuous improvement/continuous delivery)

Emmanuel Ameisen  Sep 30, 2020