Errata

Errata for Data Science for Business

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
Printed	Page 18 example 2-8 handling rows with a result proxy	first_row[cookies.c.cookie_name] raises the error "AttributeError: Could not locate column in row for column 'cookies'". To make it work, it was necessary to change the line to: print(f"access column by column object: {getattr(first_row, cookies.c.cookie_name.key)}")	Jim Keith	Feb 23, 2024
Printed	Page 20 example 2-10	The statement s = cookies.select([cookies.c.cookie_name, cookies.c.quantity]) raises the error "TypeError: FromClause.select() takes 1 positional argument but 2 were given". I cannot figure out how to get around this error.	Jim Keith	Feb 23, 2024
PDF	Page 54 1st paragraph, 2nd line of the equation	This submission is to correct/clarify the errata submission by Adler Santos on Oct 25, 2015. The reader proposed a correction to the information gain equation, which was adopted by the author, but the original equation is correct. When calculating information gain, each "probability" is the proportion of instances belonging to each child. There are 13 instances in the child "Balance < 50K" and 30 instances in the parent, so p(Balance < 50K) = 13/30 = 0.43. There are 17 instances in the other child, so p(Balance >= 50K) = 17/30 = 0.57.	YJ	Sep 11, 2020
Printed	Page 77 Figure 3-18	On the left side of the tree (after the OVERAGE attribute test), the attribute LEFTOVER appears twice as an interior node. On the right side of the tree (after the INCOME attribute test), the attribute OVERAGE appears twice as an interior node. Are these repeated attributes supposed to be other attributes? On a different note: In Figure 3-10 and Figure 3-15, the interior nodes are rectangular and the terminal nodes are oval, while in Figure 3-18 and Figure 4-12, it is the opposite. I don't know if there is a standard in the industry, but it would be nice to see consistency in the tree representations to help with learning the basic concepts.	YJ	Sep 11, 2020
Printed	Page 120 Figure 5-4 description	The second line in the description for Figure 5-4 says, "In this case, both linear regression and a support vector machine learn the same model." It should say "logistic regression" instead of "linear regression."	YJ	Sep 14, 2020
Printed	Page 121 3rd paragraph, last sentence	"Overall, the (b) tree will have a total expected error rate of 30%..." The accuracy of the tree in (b) appears to be 5/8, so the error rate should be 3/8 = 37.5%. To see this, the possible correct classifications and their probabilities are: (C1, p, r) = 1/2 x 3/4 x 1/2 = 3/16 (correct C1 classification); (C2, p, s) = 1/2 x 1/4 x 1/2 = 1/16 (correct C2 classification); (C2, q, r or s) = 1/2 x 3/4 = 6/16 (correct C2 classification). So the accuracy is 10/16, i.e., the error rate is 6/16 = 3/8 = 37.5%. The previous statement about the spurious branch causing "one in eight errors made by the tree" also appears to be incorrect. The spurious branch causes an error when (C1, p, s) occurs, which happens with probability 3/16, so one out of every two errors made by the tree is caused by the spurious branch.	David Brown	Feb 09, 2023
Printed	Page 124 2nd paragraph	The sentence reads "...and indeed we see that in the data SAMPLE both of y's values occur in both classes equally" (emphasis mine). This is not correct and two paragraphs later it's stated correctly. I assume the authors meant to say that in the _population_ both of y's values occur in both classes equally.	Bjorn Commers	May 10, 2022
PDF	Page 202 calculation of expected profit	The expected profit calculation assumes that we do not send the offer if the prediction is "No" (Figure 7-4). However, the 99-1 payoff ratio is so great that it is optimal to send the offer to everybody, for an expected value of 54.4. I recommend using a decision tree to explain this example.	Panos Markopoulos	Dec 29, 2021
Printed	Page 212 3rd paragraph	The cost matrix has an error in column (N, p). In the case of a false negative, the model classifies as negative a consumer who would order if he received an offer. In this case, the cost is -$4 and not $0 as stated.	Olus KAYACAN	May 02, 2020
Printed	Page 257 2nd paragraph	"to the documents is does occur in" should be "to the documents it does occur in."	YJ	Oct 02, 2020
Printed	Page 284 3rd paragraph	"data mininig process" should be "data mining process"	YJ	Oct 13, 2020
Printed	Page 295 Last paragraph	"agreableness" should be "agreeableness"	YJ	Oct 20, 2020
Printed	Page 308 2nd paragraph	"(subject to regularization, as discussed in Chapter 4)": regularization is discussed in Chapter 5, not Chapter 4	YJ	Oct 20, 2020
Printed	Page 328 1st paragraph in "Flaws in the Big Red Proposal" section	"Big Data's proposal" should be "Big Red's proposal"	YJ	Oct 27, 2020
Printed	Page 330 2nd paragraph not counting the inset note	"propose an investments" should be "propose investments" or "propose an investment" (probably the former given the rest of the sentence)	YJ	Oct 27, 2020
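
The arithmetic in the Page 121 entry can be checked with exact fractions. The sketch below only reproduces the probabilities quoted in that submission; the variable names are illustrative, and it does not independently confirm the tree structure in the book.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Joint probabilities of the correctly classified cases,
# copied from the Page 121 submission.
correct_cases = {
    "(C1, p, r)": half * Fraction(3, 4) * half,
    "(C2, p, s)": half * Fraction(1, 4) * half,
    "(C2, q, r or s)": half * Fraction(3, 4),
}

accuracy = sum(correct_cases.values(), Fraction(0))
error_rate = 1 - accuracy

# Probability of the case the submission attributes to the spurious branch: (C1, p, s).
spurious_error = Fraction(3, 16)
spurious_share = spurious_error / error_rate

print(f"accuracy = {accuracy}, error rate = {error_rate} ({float(error_rate):.1%})")
print(f"share of errors from the spurious branch = {spurious_share}")
```

Under the submission's numbers, this gives an accuracy of 5/8, an error rate of 3/8 (37.5%), and a spurious-branch share of 1/2, matching the figures the reader reports.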
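
The Page 54 entry turns on how the children are weighted in information gain: each child's entropy is weighted by its share of the parent's instances (13/30 and 17/30 in that submission). A minimal sketch of that weighting follows. The child sizes come from the submission, but the per-class counts inside each node are illustrative assumptions chosen only to sum to 13, 17, and 30; they are not stated in the errata entry.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node given its per-class instance counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def child_weights(children_counts):
    """Proportion of the parent's instances that falls into each child."""
    sizes = [sum(child) for child in children_counts]
    total = sum(sizes)
    return [size / total for size in sizes]

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average of the child entropies."""
    weights = child_weights(children_counts)
    weighted = sum(w * entropy(c) for w, c in zip(weights, children_counts))
    return entropy(parent_counts) - weighted

# ASSUMPTION: the class counts below are hypothetical, for illustration only.
# Only the child sizes (13 and 17 out of 30) come from the Page 54 submission.
children = [[12, 1], [4, 13]]   # "Balance < 50K", "Balance >= 50K"
parent = [16, 14]

weights = child_weights(children)
gain = information_gain(parent, children)
print([round(w, 2) for w in weights])   # child proportions 13/30 and 17/30
print(round(gain, 3))
```

The point of the sketch is the weighting step: the weights depend only on how many instances reach each child, not on the class mix within it.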