Errata

Data Science for Business



The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
Printed Page 18
example 2-8 handling rows with a result proxy

first_row[cookies.c.cookie_name]

generates the error
AttributeError: Could not locate column in row for column 'cookies'

To make it work, it was necessary to change the line to:
print(
    f"access column by column object: {getattr(first_row, cookies.c.cookie_name.key)}"
)

Jim Keith  Feb 23, 2024 
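A minimal sketch of an alternative, assuming SQLAlchemy 2.0 (the Row API used here also exists in 1.4) and a simplified, hypothetical cookies table in an in-memory SQLite database. In 2.0 a Row behaves like a named tuple, so lookups keyed by a Column object go through Row._mapping; plain attribute access such as first_row.cookie_name is equivalent to the getattr workaround above.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select)

metadata = MetaData()

# Hypothetical, simplified stand-in for the book's cookies table
cookies = Table(
    "cookies",
    metadata,
    Column("cookie_id", Integer, primary_key=True),
    Column("cookie_name", String(50)),
    Column("quantity", Integer),
)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)

with engine.begin() as conn:  # commits when the block exits
    conn.execute(insert(cookies), [
        {"cookie_name": "chocolate chip", "quantity": 12},
        {"cookie_name": "dark chocolate chip", "quantity": 1},
    ])

with engine.connect() as conn:
    stmt = select(cookies).order_by(cookies.c.cookie_id)
    first_row = conn.execute(stmt).first()

# Key-based access (Column object or string name) goes through Row._mapping
by_column = first_row._mapping[cookies.c.cookie_name]
by_name = first_row._mapping["cookie_name"]
# Named-tuple style attribute access
by_attribute = first_row.cookie_name

print(f"access column by column object: {by_column}")
```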
Printed Page 20
example 2-10


s = cookies.select([cookies.c.cookie_name, cookies.c.quantity])

This statement generates an error

TypeError: FromClause.select() takes 1 positional argument but 2 were given

I cannot figure out how to get around this error

Jim Keith  Feb 23, 2024 
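A sketch of the usual workaround, assuming SQLAlchemy 2.0 and the same kind of simplified, hypothetical cookies table: in 2.0, FromClause.select() no longer accepts a column list (hence the TypeError), so the columns are passed positionally to the standalone select() construct instead. The order_by is added only to make the row order deterministic.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select)

metadata = MetaData()

# Hypothetical, simplified stand-in for the book's cookies table
cookies = Table(
    "cookies",
    metadata,
    Column("cookie_id", Integer, primary_key=True),
    Column("cookie_name", String(50)),
    Column("quantity", Integer),
)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(cookies), [
        {"cookie_name": "chocolate chip", "quantity": 12},
        {"cookie_name": "dark chocolate chip", "quantity": 1},
    ])

# 2.0 style: columns go positionally to select(), not as a list to Table.select()
s = select(cookies.c.cookie_name, cookies.c.quantity).order_by(cookies.c.cookie_id)

with engine.connect() as conn:
    rows = conn.execute(s).all()

for row in rows:
    print(f"{row.cookie_name} - {row.quantity}")
```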
PDF Page 54
1st paragraph, 2nd line of the equation

This submission is to correct/clarify the errata submission by Adler Santos on Oct 25, 2015. The reader proposed a correction to the information gain equation, which was adopted by the author, but the original equation is correct.

When calculating information gain, each "probability" is the proportion of the parent's instances that fall into each child. There are 13 instances in the child "Balance < 50K" and 30 instances in the parent, so p(Balance < 50K) = 13/30 ≈ 0.43. There are 17 instances in the other child, so p(Balance >= 50K) = 17/30 ≈ 0.57.

YJ  Sep 11, 2020 
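The point can be checked numerically. The sketch below weights each child's entropy by the fraction of the parent's instances it contains; only the child sizes (13 and 17 of 30) come from this erratum, and the per-class counts are hypothetical values chosen for illustration.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def child_weights(children):
    """p(child): fraction of the parent's instances that fall into each child."""
    n = sum(sum(child) for child in children)
    return [sum(child) / n for child in children]


def information_gain(parent, children):
    """IG = entropy(parent) - sum over children of p(child) * entropy(child)."""
    weights = child_weights(children)
    return entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))


# Hypothetical class counts [class A, class B]; child sizes match the erratum
parent = [16, 14]
balance_lt_50k = [12, 1]   # 13 instances
balance_ge_50k = [4, 13]   # 17 instances

weights = child_weights([balance_lt_50k, balance_ge_50k])
ig = information_gain(parent, [balance_lt_50k, balance_ge_50k])
print([round(w, 2) for w in weights])  # [0.43, 0.57]
print(round(ig, 2))
```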
Printed Page 77
Figure 3-18

On the left side of the tree (after the OVERAGE attribute test), the attribute LEFTOVER appears twice as an interior node. On the right side of the tree (after the INCOME attribute test), the attribute OVERAGE appears twice as an interior node. Are these repeated attributes supposed to be other attributes?

On a different note: In Figure 3-10 and Figure 3-15, the interior nodes are rectangular and the terminal nodes are oval, while in Figure 3-18 and Figure 4-12, it is the opposite. I don't know if there is a standard in the industry, but it would be nice to see consistency in the tree representations to help with learning the basic concepts.

YJ  Sep 11, 2020 
Printed Page 120
Figure 5-4 description

The second line in the description for Figure 5-4 says, "In this case, both linear regression and a support vector machine learn the same model."

It should say "logistic regression" instead of "linear regression."

YJ  Sep 14, 2020 
Printed Page 121
3rd paragraph, last sentence

Hi,

"Overall, the (b) tree will have a total expected error rate of 30%..."

The accuracy of the tree in (b) appears to be 5/8, so the error rate should be 3/8 = 37.5%. To see this, consider the possible correct classifications and their probabilities:

(C1,p,r) = 1/2 x 3/4 x 1/2 = 3/16 (correct C1 classification)
(C2,p,s) = 1/2 x 1/4 x 1/2 = 1/16 (correct C2 classification)
(C2,q, r or s) = 1/2 x 3/4 = 6/16 (correct C2 classification)

So the accuracy is 10/16, i.e., the error rate is 6/16=3/8=37.5%.

The previous statement about the spurious branch causing "one in eight errors made by the tree" also appears to be incorrect. The spurious branch causes an error when (C1,p,s) occurs, which happens with probability 3/16. Since the total error is 6/16, half of the errors made by the tree are caused by the spurious branch.

David Brown  Feb 09, 2023 
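The arithmetic above can be verified by enumerating every (class, x, y) combination. This sketch assumes the distribution implied by the submitter's figures: P(C1) = P(C2) = 1/2, P(x = p | C1) = 3/4, P(x = p | C2) = 1/4, and y independent of the class with P(r) = P(s) = 1/2. Tree (b) is assumed to predict C1 only for (p, r), which is what the listed correct classifications imply.

```python
from fractions import Fraction
from itertools import product

# Assumed distribution, reconstructed from the submitter's probabilities
p_class = {"C1": Fraction(1, 2), "C2": Fraction(1, 2)}
p_x_given_class = {
    "C1": {"p": Fraction(3, 4), "q": Fraction(1, 4)},
    "C2": {"p": Fraction(1, 4), "q": Fraction(3, 4)},
}
p_y = {"r": Fraction(1, 2), "s": Fraction(1, 2)}  # y carries no class information


def tree_b(x, y):
    """Tree (b): x = p is split again on the spurious y test; x = q predicts C2."""
    if x == "p":
        return "C1" if y == "r" else "C2"
    return "C2"


# Probability mass of every misclassified (class, x, y) case
errors = {
    (c, x, y): p_class[c] * p_x_given_class[c][x] * p_y[y]
    for c, x, y in product(p_class, "pq", "rs")
    if tree_b(x, y) != c
}

total_error = sum(errors.values())
spurious = errors[("C1", "p", "s")]  # error introduced by the spurious branch
print(total_error, 1 - total_error, spurious / total_error)
```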
Printed Page 124
2nd paragraph

The sentence reads "...and indeed we see that in the data SAMPLE both of y's values occur in both classes equally" (emphasis mine). This is not correct, and two paragraphs later it is stated correctly. I assume the authors meant to say that in the _population_ both of y's values occur in both classes equally.

Bjorn Commers  May 10, 2022 
PDF Page 202
calculation of expected profit

The expected profit calculation assumes that we do not send the offer if the prediction is "No" (Figure 7-4). However, the 99-to-1 payoff ratio is so large that it is optimal to send the offer to everybody, for an expected value of 54.4. I recommend using a decision tree to explain this example.

Panos Markopoulos  Dec 29, 2021 
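The submitter's argument can be sketched as follows, assuming (as the 99-to-1 payoff suggests) a benefit of 99 for each responder who receives the offer, a cost of 1 for each non-responder who receives it, and zero for anyone not contacted. Extending the offer to customers predicted "No" adds p(N,p) × 99 − p(N,n) × 1, which is positive whenever more than 1 in 100 of them would respond. The confusion-matrix rates below are hypothetical, not the book's figures.

```python
def expected_profit(rates, benefit=99.0, cost=-1.0, send_to_all=False):
    """Expected profit per customer.

    rates maps (prediction, actual) -> joint probability, e.g. ("Y", "p").
    Offers go only to predicted "Y" customers unless send_to_all is True.
    Customers who receive no offer contribute 0.
    """
    value = {"p": benefit, "n": cost}
    return sum(
        prob * value[actual]
        for (pred, actual), prob in rates.items()
        if send_to_all or pred == "Y"
    )


# Hypothetical joint rates (prediction, actual); they sum to 1
rates = {
    ("Y", "p"): 0.40, ("Y", "n"): 0.10,
    ("N", "p"): 0.05, ("N", "n"): 0.45,
}

targeted = expected_profit(rates)                   # offer only to predicted "Y"
everyone = expected_profit(rates, send_to_all=True)  # offer to all customers
print(targeted, everyone)
```

Here the predicted-"No" group has a 10% response rate (0.05 / 0.50), far above the 1% break-even, so mailing everyone comes out ahead.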
Printed Page 212
3rd paragraph

The cost matrix has an error in column (N,p).
In the case of a false negative, the model predicts "No" for a consumer who would order if they received an offer. In this case, the cost is -$4, not $0 as stated.

Olus KAYACAN  May 02, 2020 
Printed Page 257
2nd paragraph

"to the documents is does occur in" should be "to the documents it does occur in."

YJ  Oct 02, 2020 
Printed Page 284
3rd paragraph

"data mininig process" should be "data mining process"

YJ  Oct 13, 2020 
Printed Page 295
Last paragraph

"agreableness" should be "agreeableness"

YJ  Oct 20, 2020 
Printed Page 308
2nd paragraph

"(subject to regularization, as discussed in Chapter 4)" — regularization is discussed in Chapter 5, not Chapter 4

YJ  Oct 20, 2020 
Printed Page 328
1st paragraph in "Flaws in the Big Red Proposal" section

"Big Data's proposal" should be "Big Red's proposal"

YJ  Oct 27, 2020 
Printed Page 330
2nd paragraph not counting the inset note

"propose an investments" should be "propose investments" or "propose an investment" (probably the former given the rest of the sentence)

YJ  Oct 27, 2020