Doing Data Science

Errata for Doing Data Science


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.



Version Location Description Submitted By Date Submitted Date Corrected
Printed
Page multiple
multiple

This erratum was submitted by Philipp Marek via email.

Errata for Doing data science
I mark /deletions/, and *changes*.
This is in UTF8 -- so eg. a CRLF is shown as down-left pointing arrow: ↵

xvii: move 3 words: there is more breath // than depth *in some cases*
xxi: Forgot to mention "Visual Display of Quantitative Information" ... although listed on p37
2: statis/i/tican
14: 1-4 use different shades of gray, or dashes or something like that
30: observed real-world phenomen*a* (or *a* phenomenon)
32: x in seconds? Don't integrate over minutes
38: http://stat.columbia.edu - everything else on github
43: hypo-thesis, not th-esis (?)
    2-3 Huma*n* behavi*or* (nouns)
    Trying to read associations fails; put Olympics beneath Olympic records?
44: an extension /of/ or variation of
48: an-swered?
49: Did Doug use ... (... "CPC") -- aren't used in text, no need to explain
50: plot(log(), log()): see http://spacecraft.ssl.umd.edu/akins_laws.html, twice.
    "6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker."
    bk.homes[which ...] -- indentation of 3rd line wrong
    log() <= 5 ... better use <= 1e5 or 100e3 and remove log()
68: 3-6 truth = d*e*gree 2 (top right)
69: x*_2* * x_3
71: x₁�, not x�₁
72: you'd have establish*ed* the bins (or have *to* establish)
73: 3-7 doesn't include the points listed above
74: 3-8 use "x" for new guy, this point is already in 3-7
76: Hamming: shoe +s-s => hose, distance is 2
    we start with a Google search ... *which to use*.
77: n.points = length(data) Why not simply use a boolean vector of some length on data?
    swap lines: train <- and #define
78: swap cl <- and #
    swap true.labels and #
79: # We're using ... comment not helpful
85: http://abt.cm -- why a different link shortener?
    we showed how *to* explore and clean
87: remove line setwd()
90: U of Edinb*o*rough?
101: parallel/-/ly
108: WWW::Mechanize, and generally Perl for text extraction
111,112: script could use a few functions
117: *An Empirical...* format different from other book references or titles
129: "non discrete)" is still a comment, wrong format used
    c[, 2] - space before "," missing
131: vlist <- use less space to avoid line break, twice
132: "use holdout group" join to previous line
    "vars" within for loop?
137: prop/o/agates
140: 6-3 no colors visible. use distinguishable grays?
141: 6-4 no counts visible
147: what does 6-7 show?
151: 6-8 label both axes with text
155: 6-12 factors not distinguishable
156: this_E is unused
176: "Director of Research..." in one line
177: the modeling part isn't *what* we want
183: AIC Info*r*mation
184: a college studen ... spend *her* time
191: "column which is our response" is still a comment, has wrong format
194: "Google's Hybrid Approach" title => italic
201: simple but comp*l*ete
215: vr = indentation wrong
236: to a*c*cept
241: the Predicted=False row should have FN, TN
246: 2nd mouse/keyboard is not needed, other person should read and think, not type simultaneously
247: discrep*a*ncy
251: partic-ipate ?
254: digital media at␣Columbia (space missing)
281: "Overlapping..." title => italic
287: people that take/s/ some drug -- people take, not the population
293: "Oral..." title => italic
304: (hers is shown ... *)*
341: line 44 is hard to read, code doesn't match other formatting
349: Map*R*educe
351ff: Index:
    "Amazon Mechanical Turk" in Amazon
    bunch together "causal ..."
    bunch together "chaos ..."
    "Protocol buffers" instead of "prtobuf"
    and probably some more.

Note from the Author or Editor:
32: x in seconds? Don't integrate over minutes
    Cathy: change the "measured in seconds" to "measured in minutes" in the above paragraph.
50: plot(log(), log()): see http://spacecraft.ssl.umd.edu/akins_laws.html, twice. "6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker." bk.homes[which ...] -- indentation of 3rd line wrong log() <= 5 ... better use <= 1e5 or 100e3 and remove log()
    Cathy: please indent after the "bk.homes" line as the above lines are indented. Otherwise fine to ignore these suggestions.
73: 3-7 doesn't include the points listed above
    Cathy: That's true. We might wanna change it to be more reasonable. I don't have the original data that was plotted here.
74: 3-8 use "x" for new guy, this point is already in 3-7
    Cathy: Can erase the "?" point in 3-7 for clarity.
76: Hamming: shoe +s-s => hose, distance is 2
    Cathy: this is false. Ignore.
77: n.points = length(data) Why not simply use a boolean vector of some length on data?
    Cathy: ignore
108: WWW::Mechanize, and generally Perl for text extraction
    Cathy: ignore.
111,112: script could use a few functions
    Cathy: please write your own book with a few functions.
141: 6-4 no counts visible
    Cathy: that's ok.
147: what does 6-7 show?
    Cathy: X-axis should be labeled "time in seconds"
246: 2nd mouse/keyboard is not needed, other person should read and think, not type simultaneously
    Cathy: ludicrous comment. Ignore. Also this is on page 245.
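
The page 76 item above turns on which distance is being counted. The submitter's "+s-s" notation reads as one insertion plus one deletion, which is how an edit distance counts, whereas Hamming distance compares equal-length strings position by position. The following is a minimal Python sketch of both measures; it is not code from the book, and the function names `hamming` and `levenshtein` are illustrative choices.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(hamming("shoe", "hose"))      # 3: positions 1, 2 and 3 differ
print(levenshtein("shoe", "hose"))  # 2: delete the leading "s", insert "s" after "o"
```

Under these definitions the pair differs at three positions, while two edits are enough to turn one word into the other, so the number quoted depends on which measure is meant.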

Rachel Schutt
O'Reilly Author 
Nov 20, 2013  Dec 13, 2013
PDF
Page multiple
multiple

Page / Error / Note
p.207: "star-up" should be "start-up"
p.359: "want achieve" should be "want to achieve"
p.162-163: section headers are different sizes; "Exercise: GetGlue and Timestamped Event Data" and "Exercise: Financial Data" should be same size font
p.68: "dgree" in figure 3-6 should be "degree"
p.32-33: inconsistent capitalization of random variables: x vs X
p.21-22: indentation is odd and seems arbitrary
index: "curse of dimensionality" missing
p.282: "That experimental infrastructure" strange phrasing

Rachel Schutt
O'Reilly Author 
Nov 20, 2013  Dec 13, 2013
PDF
Page 95
2nd paragraph, 1st sentence

"Thinking back to the previous chapter, in order to use liner regression,..." -- "liner" should be "linear"

donald f caldwell  Dec 01, 2013  Dec 13, 2013
ePub
Page 119
United States

W.r.t. my just-submitted erratum, it appears that it's my GitHub ignorance. Shift-clicking on the file doesn't have the obvious semantics, but the button on the right side of the pane, "download zipfile", does. So my request would be for a slight change to the text to make this clear for us CVS, SCCS, SVN, BitKeeper folks who didn't get with Git.

Note from the Author or Editor:
GitHub README adjusted to point to the Download ZIP button.

Keith Bierman  Oct 31, 2013  Dec 03, 2013