Doing Data Science

Errata for Doing Data Science



The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.




Version Location Description Submitted By Date Submitted Date Corrected
Printed
Page multiple
multiple

These errata were submitted by Philipp Marek via email. Errata for Doing Data Science. I mark /deletions/ and *changes*. This is in UTF-8 -- so e.g. a CRLF is shown as a down-left pointing arrow: ↵

xvii: move 3 words: there is more breath // than depth *in some cases*
xxi: Forgot to mention "Visual Display of Quantitative Information" ... although listed on p37
2: statis/i/tican
14: 1-4 use different shades of gray, or dashes or something like that
30: observed real-world phenomen*a* (or *a* phenomenon)
32: x in seconds? Don't integrate over minutes
38: http://stat.columbia.edu - everything else on github
43: hypo-thesis, not th-esis (?)
    2-3 Huma*n* behavi*or* (nouns)
    Trying to read associations fails; put Olympics beneath Olympic records?
44: an extension /of/ or variation of
48: an-swered?
49: Did Doug use ... (... "CPC") -- aren't used in text, no need to explain
50: plot(log(), log()): see http://spacecraft.ssl.umd.edu/akins_laws.html, twice. "6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker."
    bk.homes[which ...] -- indentation of 3rd line wrong
    log() <= 5 ... better use <= 1e5 or 100e3 and remove log()
68: 3-6 truth = d*e*gree 2 (top right)
69: x*_2* * x_3
71: x₁�, not x�₁
72: you'd have establish*ed* the bins (or have *to* establish)
73: 3-7 doesn't include the points listed above
74: 3-8 use "x" for new guy, this point is already in 3-7
76: Hamming: shoe +s-s => hose, distance is 2
    we start with a Google search ... *which to use*.
77: n.points = length(data) Why not simply use a boolean vector of some length on data?
    swap lines: train <- and #define
78: swap cl <- and #
    swap true.labels and #
79: # We're using ... comment not helpful
85: http://abt.cm -- why a different link shortener?
    we showed how *to* explore and clean
87: remove line setwd()
90: U of Edinb*o*rough?
101: parallel/-/ly
108: WWW::Mechanize, and generally Perl for text extraction
111,112: script could use a few functions
117: *An Empirical...* format different from other book references or titles
129: "non discrete)" is still a comment, wrong format used
    c[, 2] - space before "," missing
131: vlist <- use less space to avoid line break, twice
132: "use holdout group" join to previous line
    "vars" within for loop?
137: prop/o/agates
140: 6-3 no colors visible. use distinguishable grays?
141: 6-4 no counts visible
147: what does 6-7 show?
151: 6-8 label both axes with text
155: 6-12 factors not distinguishable
156: this_E is unused
176: "Director of Research..." in one line
177: the modeling part isn't *what* we want
183: AIC Info*r*mation
184: a college studen ... spend *her* time
191: "column which is our response" is still a comment, has wrong format
194: "Google's Hybrid Approach" title => italic
201: simple but comp*l*ete
215: vr = indentation wrong
236: to a*c*cept
241: the Predicted=False row should have FN, TN
246: 2nd mouse/keyboard is not needed, other person should read and think, not type simultaneously
247: discrep*a*ncy
251: partic-ipate ?
254: digital media at␣Columbia (space missing)
281: "Overlapping..." title => italic
287: people that take/s/ some drug -- people take, not the population
293: "Oral..." title => italic
304: (hers is shown ... *)*
341: line 44 is hard to read, code doesn't match other formatting
349: Map*R*educe
351ff: Index:
    "Amazon Mechanical Turk" in Amazon
    bunch together "causal ..."
    bunch together "chaos ..."
    "Protocol buffers" instead of "prtobuf"
    and probably some more.

Note from the Author or Editor:
32: x in seconds? Don't integrate over minutes
    Cathy: change the "measured in seconds" to "measured in minutes" in the above paragraph.
50: plot(log(), log()): see http://spacecraft.ssl.umd.edu/akins_laws.html, twice. "6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker." bk.homes[which ...] -- indentation of 3rd line wrong. log() <= 5 ... better use <= 1e5 or 100e3 and remove log()
    Cathy: please indent after the "bk.homes" line as the above lines are indented. Otherwise fine to ignore these suggestions.
73: 3-7 doesn't include the points listed above
    Cathy: That's true. We might wanna change it to be more reasonable. I don't have the original data that was plotted here.
74: 3-8 use "x" for new guy, this point is already in 3-7
    Cathy: Can erase the "?" point in 3-7 for clarity.
76: Hamming: shoe +s-s => hose, distance is 2
    Cathy: this is false. Ignore.
77: n.points = length(data) Why not simply use a boolean vector of some length on data?
    Cathy: ignore.
108: WWW::Mechanize, and generally Perl for text extraction
    Cathy: ignore.
111,112: script could use a few functions
    Cathy: please write your own book with a few functions.
141: 6-4 no counts visible
    Cathy: that's ok.
147: what does 6-7 show?
    Cathy: X-axis should be labeled "time in seconds"
246: 2nd mouse/keyboard is not needed, other person should read and think, not type simultaneously
    Cathy: ludicrous comment. Ignore. Also this is on page 245.

Rachel Schutt
O'Reilly Author 
Nov 20, 2013  Dec 13, 2013
PDF
Page multiple
multiple

Page: Error / Note
p.207: "star-up" should be "start-up"
p.359: "want achieve" should be "want to achieve"
p.162-163: section headers are different sizes; "Exercise: GetGlue and Timestamped Event Data" and "Exercise: Financial Data" should be the same font size
p.68: "dgree" in figure 3-6 should be "degree"
p.32-33: inconsistent capitalization of random variables: x vs X
p.21-22: indentation is odd and seems arbitrary
index: "curse of dimensionality" missing
p.282: "That experimental infrastructure" strange phrasing

Rachel Schutt
O'Reilly Author 
Nov 20, 2013  Dec 13, 2013
PDF
Page xxi
2nd bullet

Introduction to Machine Learning (Adaptive Computation and Machine Learning) by Ethem Alpaydim (MIT Press). It is not Alpaydım, it's Alpaydın.

Tolga Bakkaloglu  Jan 01, 2014 
PDF
Page 95
2nd paragraph 1st sentence

"Thinking back to the previous chapter, in order to use liner regression,..." should be 'linear'

donald f caldwell  Dec 01, 2013  Dec 13, 2013
PDF
Page 108
lines 17-21

On page 108 a part of paragraph is repeated (four lines from "Represent each image..." to "...between 0 and 255").

Zdzisław Płoski  May 06, 2014 
ePub
Page 119
United States

With respect to my just-submitted erratum, it appears the problem was my GitHub ignorance. Shift-clicking on the file doesn't have the obvious semantics, but the "download zipfile" button on the right side of the pane does. So my request would be for a slight change to the text to make this clear for us CVS, SCCS, SVN, and BitKeeper folks who didn't get with Git.

Note from the Author or Editor:
GitHub README adjusted to indicate the Download Zip button.

Keith Bierman  Oct 31, 2013  Dec 03, 2013
PDF
Page 119
Line 14 from the top

There is: "Recall that in Chapter 3". There should be "Recall that in Chapter 4".

Zdzislaw Ploski  Jul 31, 2014 
PDF
Page 160
Line 10 from the bottom

There is: "we solve for beta to get". Should be: "we solve for <Greek letter 'beta'> to get" as in several other places before.

Zdzislaw Ploski  May 25, 2014 
PDF
Page 161
Line 10 from the bottom (inside formula)

There is: ")/". Should be: ")".

Zdzislaw Ploski  May 25, 2014 
PDF
Page 162
line 16 from the bottom

In the sentence: "Here's some R code to look at the first 10 rows in R" the words "in R" are redundant.

Note from the Author or Editor:
change to "Here's some R code to look at the first 10 rows"

Zdzislaw Ploski  May 19, 2014 
PDF
Page 200
16g

There is: "They were questions". There should be, I think: "There were questions".

Zdzislaw Ploski  Jun 03, 2014 
PDF
Page 201
Lines 9-10 from the top

In the sentence: "there are lines from a user to an item if that user has expressed an opinion about that item" the words "are lines" should be replaced by "is a line" (cp. Fig. 8-1).

Zdzislaw Ploski  Jun 04, 2014 
PDF
Page 205
17 from the top

There is: "the coefficients on one can be 100,000". There should (?) be: "the coefficient on one can be 100,000".

Zdzislaw Ploski  Jun 03, 2014 
PDF
Page 222
Line 6 from the top

There is: "a cool example of how ideally, data science integrates". Should be: "a cool example of how ideally data science integrates".

Zdzislaw Ploski  Jun 14, 2014 
PDF
Page 229
Lines 5, 10 and 11 from the top

In line 5 there is "bit.ly". In lines 10 and 11: "bitly". Suggestion: use uniform notation everywhere.

Zdzislaw Ploski  Jun 14, 2014 
PDF
Page 245
line 8 from the bottom

There is: "using git. Learn about git". Better: "using Git. Learn about Git".

Zdzislaw Ploski  Jun 09, 2014 
PDF
Page 269
Lines 19-20 from the top

There is a typo in the surname "Kolazcyk". The correct spelling is "Kolaczyk".

Zdzislaw Ploski  Jun 21, 2014 
PDF
Page 277
Line 6 from the bottom

There is: "say on". Should be: "say in" (cp. appropriate site).

Zdzislaw Ploski  Jun 24, 2014 
PDF
Page 298
Lines 1-2 from the top

The sentence "The kinds of decisions they tweaked were of the following types" reads awkwardly because of the "kinds ... of the ... types" construction. Perhaps "The kinds of decisions they tweaked were as follows" would be better.

Zdzislaw Ploski  Jun 27, 2014 
PDF
Page 301
Line 4 from the bottom

There is a word "medicare" (starting from a lower case "m"). Is it about Medicare (cp. www.medicare.gov)?

Zdzislaw Ploski  Jun 28, 2014 
PDF
Page 315
Lines 6-7 from the top

In the sentence "if the vast majority is of binary outcomes are 1", is the word "is" needed?

Note from the Author or Editor:
delete "is" from sentence

Zdzislaw Ploski  Jun 30, 2014 
PDF
Page 319
Lines 15-16 from the top

In the sentence: "You'd like to save money and only send money to people who are likely to give" the second word "money" should be replaced with "letter".

Note from the Author or Editor:
change sentence to "...and only send a letter to people..."

Zdzislaw Ploski  Jun 30, 2014 
PDF
Page 330
Lines 2-4 from the top

The sentence is redundant: "a record with a person living in zip code 90210 who clicked on an ad would get emitted to (90210,{1,1}) if that person saw an ad and clicked, or (90210,{0,1}) if they saw an ad and didn't click." It states twice that the person clicked on an ad.

Note from the Author or Editor:
change sentence to: "You could run MapReduce keyed by zip code so that a record with a person living in zip code 90210 would get emitted to (90210,{1,1}) if that person saw an ad and clicked, or (90210,{0,1}) if they saw an ad and didn't click."

Zdzislaw Ploski  Jul 03, 2014 
PDF
Page 334
Lines 16-15 from the bottom

Something is missing from the sentence: "Writing MapReduce in the Java API not pleasant". Is the verb missing?

Note from the Author or Editor:
"Writing MapReduce in the Java API is not pleasant."

Zdzislaw Ploski  Jul 05, 2014 
PDF
Page 335
Lines 12-11 from the bottom

There is: "Github". Should be: "GitHub".

Zdzislaw Ploski  Jul 04, 2014 
PDF
Page 341
Line 10 from the top

There is: "git". Should be: "Git".

Zdzislaw Ploski  Jul 07, 2014 
PDF
Page 344
Line 19 from the top

There is: "In addition". Should be: ". In addition".

Note from the Author or Editor:
add period between equation and "In addition" as indicated

Zdzislaw Ploski  Jul 07, 2014