Errata
The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Color Key: Serious technical mistake, Minor technical mistake, Language or formatting error, Typo, Question, Note, Update
Version | Location | Description | Submitted by | Date submitted |
---|---|---|---|---|
Printed | Page 82 middle |
|
huangqiyou | Nov 10, 2013 |
Printed | Page 6 1 |
Contrary to what is stated in the book, R does not appear to come pre-installed on Mac OS X. |
Anonymous | Mar 12, 2012 |
Page 11 table 1-2 |
The URLs for ggplot2 and glmnet are reversed. |
John Cook | Apr 14, 2012 | |
Mobi | Page 11 table 1-2 |
The location for the tm package is shown as http://www.spatstat.org/spatstat/, this however is the location for the SpatStat package. The location should be: http://tm.r-forge.r-project.org/index.html |
David Clark | Oct 28, 2012 |
Page 11 Table 1-2 |
Links in the location column for the ggplot2 and glmnet packages (rows 2 and 3) are reversed. |
Richard Smith | Dec 04, 2014 | |
Printed | Page 12 para 2 |
The package_installer.R script is not in "the code folder for this chapter". It is in the root directory of the example code. There is no "code" directory in chapter 1. |
Anonymous | Mar 14, 2012 |
Page 12-13 loading libraries and the data |
Readers working through this book should assume that libraries and packages may install and run differently from what is described. |
Anonymous | Dec 06, 2012 | |
PDF, Mobi | Page 12 1st paragraph |
Some packages installed by the script require gfortran. However, they require gfortran 4.2.3. gfortran is currently upwards of version 4.9, and the command line options have changed. |
wackyvorlon | Jul 24, 2013 |
Page 14 last paragraph |
YYY should be YYYY |
John Cook | Apr 16, 2012 | |
Printed | Page 14 1st paragraph |
The data file ufo_awesome.tsv is too big for quick processing. I'm trying out the statements in R while reading the book, and the statement "ufo<-read.delim(...)" practically froze my Mac because of this large data file. Maybe provide a smaller file so readers can follow along quickly. |
Anonymous | May 14, 2012 |
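One possible workaround for the report above, offered as a sketch rather than a fix from the authors: base R's read.delim passes an nrows argument through to read.table, so only the first rows of a large file are read. The temporary file and its three rows below are invented for illustration and merely stand in for ufo_awesome.tsv.

```r
# Hypothetical stand-in for a large tab-separated file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("19951009\tIowa City, IA",
             "19951010\tMilwaukee, WI",
             "19950101\tShelton, WA"), tmp)

# Read only the first two rows instead of the whole file.
ufo.sample <- read.delim(tmp, sep = "\t", header = FALSE,
                         stringsAsFactors = FALSE, nrows = 2)
print(ufo.sample)
```

Once the code works on the sample, the nrows argument can be dropped to read the full file.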
PDF | Page 15 Second to last paragraph |
Extraneous ">": |
Lorien Pratt | Apr 22, 2012 |
Printed | Page 15 Bottom Code sample |
The number of bad rows reads "371" instead of "731". Here is the printout when the code is run: |
Megan Squire | Sep 21, 2012 |
Page 15 2 |
From the data I downloaded from GitHub (https://github.com/johnmyleswhite/ML_for_Hackers/blob/master/01-Introduction/data/ufo/ufo_awesome.tsv), I don't think it is appropriate to use string length to filter out malformed data, because I found "19940000" in DateOccurred, and it is converted to NA by "ufo$DateOccurred<-as.Date(ufo$DateOccurred, format="%Y%m%d")" when the date strings are converted. |
kaiwang | Apr 03, 2016 | |
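The behavior described in the report above can be reproduced in base R. The sketch below compares a length-only check with a check on the parsed result. Apart from "19940000", the sample strings are hypothetical.

```r
# Hypothetical sample values; only "19940000" is taken from the report above.
date.strings <- c("19951009", "19940000", "0000")

# Length check alone: the first two strings both have 8 characters.
passes.length <- nchar(date.strings) == 8

# Parsing check: as.Date() returns NA when the string is not a real date.
parsed <- as.Date(date.strings, format = "%Y%m%d")
is.valid <- !is.na(parsed)

print(data.frame(date.strings, passes.length, is.valid))
```

Filtering on is.valid after conversion would also catch strings that have the right length but an impossible month or day.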
Page 16 second half (function and explanation) |
The strsplit function doesn't throw an error when the split character isn't matched -- it just returns the string, so the [[1]] reference apparently will always return something. |
Andrew Broman | Feb 16, 2012 | |
Page 16 middle |
The strsplit function doesn't give an error when the split character isn't matched, so the 'get.location' function should be changed as below. |
Jeong-ho park | Jan 03, 2014 | |
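The two strsplit reports above can be checked directly. The following is a minimal sketch, not the submitter's or the authors' correction: get.location.checked is a hypothetical name, and the function tests the number of pieces explicitly instead of relying on an error being raised.

```r
# strsplit() returns the whole string when the separator is absent; no error is raised.
no.comma <- strsplit("Iowa City", ",")[[1]]
print(no.comma)

# Hypothetical replacement that checks the number of pieces explicitly.
get.location.checked <- function(l) {
  pieces <- trimws(strsplit(l, ",")[[1]])
  if (length(pieces) != 2) {
    return(c(NA_character_, NA_character_))
  }
  pieces
}

print(get.location.checked("Iowa City, IA"))
print(get.location.checked("Iowa City"))
```

Returning a pair of NA values for every input that does not split into exactly two parts keeps the result the same length for all rows.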
Page 18 2nd Paragraph |
Text and example diverge. Text states "We then use the is.na function to find which entries are not US states and reset them to NA in the USState column." In the example above the paragraph however the USState values are set to NA using a reverse lookup of the state name in us.states (due to the return of NA by match I presume). The USCity on the other hand uses the is.na function in the example. |
Lukas | May 21, 2012 | |
Page 18 example code |
seems like the assignment to ufo$USState should be |
Brian Drye | Jul 04, 2014 | |
19 Second code snippet |
Creating the histogram using the code given does not work. |
Anonymous | Mar 17, 2012 | |
Printed, PDF | Page 19 2 |
The examples to plot the data using ggplot2 refers to an outdated version of ggplot2. |
Anonymous | Mar 24, 2012 |
Page 19 Second block of code |
I think you have a ggplot version issue. When trying to generate the histogram for UFO sightings, I get the following error: |
Trey Causey | Mar 25, 2012 | |
PDF, Mobi | Page 19 code after second paragraph |
Line on book, which doesn't work properly, is |
Paulo Nuin | Oct 22, 2013 |
Page 24 3rd paragraph from the bottom |
"To check this, run the first line of code from the preceding block,?" |
Anonymous | Feb 18, 2012 | |
Page 25 last paragraph |
Montana is listed as having a spike around mid-1997, but I believe you mean Missouri. (Missouri's abbreviation is MO, while Montana's is MT.) |
Anonymous | Feb 18, 2012 | |
Page 29 end of first paragraph |
Is "gization" supposed to be "visualization"? |
Anonymous | Feb 18, 2012 | |
Page 32 Figure 2-2 |
"MxN" at bottom of vector should read "Mx1", as written above the graph. |
John Sandall | Apr 25, 2012 | |
37 first paragraph |
"by liberally by" should be "by liberal" |
Lorien Pratt | Apr 24, 2012 | |
41 First paragraph under "Standard Deviations and Variances" |
"center of list" should be "center of a list" |
Lorien Pratt | Apr 25, 2012 | |
Printed | Page 45 last paragraph |
bindwidths |
Martin Schader | Feb 02, 2013 |
Printed | Page 45 First Sentence of Last Paragraph |
"Because setting bindwidths" has erroneously included an extra letter "d" and should state, "Because setting binwidths". |
Joe Nolan | Mar 27, 2016 |
Page 49 bottom part - in the text |
The example you write about is using the weight |
Marco Pashkov | Jul 11, 2012 | |
Printed | Page 49 last paragraph |
Here you discuss Fig. 2-11 and the weights of women and men. |
Martin Schader | Feb 02, 2013 |
54 Last paragraph |
"in word." should be "in words." |
Lorien Pratt | Apr 25, 2012 | |
ePub | Page 54 code |
For the scale_x_date function, the code in the book uses "major" as the parameter name, when it should be "breaks" e.g. scale_x_date(breaks = "5 years", ... |
Roy C | Feb 05, 2014 |
Printed | Page 71 United States |
When I create the plot on page 71 with the code on page 70, the "Height" and "Weight" axes are switched. |
Dan Williams | Apr 16, 2012 |
73 First two paragraphs |
The last line of the first paragraph references "Example 3-1", and the following paragraph refers to this example as containing black lines, blue dots, and red dots. Example 3-1 does not have these elements, rather it is an example of a candidate for spam email. Also, I cannot find a figure with black lines and blue and red dots. I believe that Example 3-1 should be changed to "Figure 3-1" and the text should be updated to refer to the horizontal dashed lines, and black triangles, rectangles, and circles. |
Lorien Pratt | Apr 25, 2012 | |
Printed | Page 73 second paragraph |
You discuss blue dots and red dots in Fig. 3-1. |
Martin Schader | Feb 02, 2013 |
Printed | Page 74 third paragraph |
the code/data folder... |
Martin Schader | Feb 02, 2013 |
75 Last paragraph and figure above it |
The last paragraph refers to Figure 3-1, but I think this is incorrect. If this is a correct figure reference, then it is not clear what "X" and "Y" refer to here, nor what triangles and circles refer to (spam or ham?), and these should be clarified. If this is not the intended figure, then I think that the correct reference is Figure 3-2. |
Lorien Pratt | Apr 25, 2012 | |
Printed | Page 75 Figure 3-1 and last paragraph |
It is not clear what Fig. 3-1 displays. |
Martin Schader | Feb 03, 2013 |
76 first paragraph |
This paragraph references Figure 3-2 as containing jittered data, but it does not. I do not believe that a jittered data picture is present. |
Lorien Pratt | Apr 25, 2012 | |
76 Last paragraph |
The code to generate the picture that is distributed in the book should be updated to correspond to the latest version of ggplot. Specifically, in the file email_classify.R, |
Lorien Pratt | Apr 26, 2012 | |
Printed | Page 76 Last paragraph |
The code to generate Figure 3-2 needs to be updated to correspond to the latest version of ggplot. In the file email_classify.R, the plot command |
Anonymous | Jun 22, 2012 |
Printed | Page 76 1st paragraph |
Instead of Figure 3-2 |
Martin Schader | Feb 03, 2013 |
Printed | Page 80 para 1 |
Two errors. |
Anonymous | Mar 20, 2012 |
80 Middle of page |
This function generates the indicated error message: |
Lorien Pratt | Apr 26, 2012 | |
Page 80 United States |
This line consistently throws error: |
Kingshuk Chatterjee | Oct 31, 2012 | |
Printed | Page 80 ff - |
Some of the files in the spam folder you provided (no. 263, 320, 323, and 324) contain characters like \202, \203, etc. |
Martin Schader | Feb 02, 2013 |
Printed | Page 81 1st paragraph |
When I run the following code: |
Paul Reiners | Feb 24, 2012 |
81 second paragraph |
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep=""))) |
Lorien Pratt | Apr 26, 2012 | |
Printed | Page 82 top of page |
To create the TDM, the options stopwords=TRUE and minDocFreq=2 are used. But the resulting TDM includes stopwords and terms with frequency of 1. The options for removePunctuation and removeNumbers appear to work properly. |
Anonymous | Dec 25, 2012 |
Printed | Page 82 Top of page, 3rd and 4th line of code |
The stopwords are still showing up in spam.df. Someone else also posted this in December. Any news? |
Dave Gilsdorf | Mar 21, 2013 |
Printed | Page 82 get.tdm |
In recent versions of the tm package minDocFreq=2 has been replaced by bounds = list(global = c(2,Inf)). See https://stackoverflow.com/questions/16287546/trying-to-remove-words-from-a-documenttermmatrix-in-order-to-use-topicmodels |
ifernando | Feb 21, 2018 |
Printed | Page 83 4th par |
When I use the data you provide on this website and compute spam.df, the result is |
Martin Schader | Feb 03, 2013 |
Printed | Page 84 4th par |
When I compute easyham.df with the data you provided, the result is |
Martin Schader | Feb 03, 2013 |
87 First paragraph of text |
"grey shaded area of Figure 3-3" should read "dark blue (center) shaded area of Figure 3-4". |
Lorien Pratt | Apr 28, 2012 | |
87 First paragraph of text |
"as depicted in Figure 3-3." should read "as depicted in Figure 3-4." |
Lorien Pratt | Apr 28, 2012 | |
Printed | Page 87 2nd par and 4th par |
par 2: |
Martin Schader | Feb 03, 2013 |
Printed | Page 87 First (only) code block |
The constant c is applied by exponentiation (^) in two places, when it should be applied by multiplication (*). The narrative below (the 3rd and 4th paragraphs) indicates that a product is being obtained, and multiplication seems more logical than exponentiation. |
Anonymous | Feb 28, 2013 |
Page 87 Code |
In classify.email, R rounds the product of probabilities to zero when the term list is long. |
ifernando | Feb 26, 2018 | |
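The underflow described in the report above is easy to reproduce, and a common remedy is to sum log-probabilities instead of multiplying probabilities. The sketch below uses invented per-term probabilities and does not reproduce the book's classify.email.

```r
# Hypothetical per-term probabilities for one long message under two classes.
spam.probs <- rep(1e-4, 100)
ham.probs  <- rep(1e-5, 100)

# Direct products underflow to exactly 0, so the two classes cannot be compared.
spam.product <- prod(spam.probs)
ham.product  <- prod(ham.probs)

# Sums of logs stay finite and preserve the ordering of the products.
spam.log <- sum(log(spam.probs))
ham.log  <- sum(log(ham.probs))

print(c(spam.product, ham.product))
print(spam.log > ham.log)
```

Because the logarithm is monotonic, comparing the two sums gives the same decision that comparing the exact products would.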
88 First code block |
replace sep="" in two places in this code with sep="/" |
Lorien Pratt | Apr 28, 2012 | |
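Whether sep="" or sep="/" is correct depends on whether the directory string already ends in a slash, and the path values are not shown in this list. As a sketch under that caveat, base R's file.path inserts the separator itself. The directory and file names below are hypothetical.

```r
# Hypothetical directory and file names, used only for illustration.
spam.dir <- "data/spam"
doc.name <- "00001.txt"

# With sep = "" the separator is missing when the directory has no trailing slash.
joined.empty <- paste(spam.dir, doc.name, sep = "")

# sep = "/" and file.path() both insert the separator.
joined.slash <- paste(spam.dir, doc.name, sep = "/")
joined.path  <- file.path(spam.dir, doc.name)

print(c(joined.empty, joined.slash, joined.path))
```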
Printed | Page 88 first code snippet |
hardham.res <- ifelse(hardham.spamtest > hardham.hamtest, TRUE, FALSE) |
Martin Schader | Feb 03, 2013 |
100 footnote |
"are not acting" should be "are not more likely to act" |
Lorien Pratt | Apr 30, 2012 | |
Page 105 code |
date <- msg.vec[date.grep[1]] |
Anonymous | Mar 01, 2012 | |
105 code block at bottom of page |
easyham.parse <- lapply(easyham.docs, function(p) parse.email(paste(easyham.path, p, sep=""))) |
Lorien Pratt | May 04, 2012 | |
Page 105 First code sample - get.date |
The get.date function fails because the second line of the function says (note the 'l' after 'grep'): |
Maymount | Jun 25, 2013 | |
Mobi | Page 106 3rd paragraph |
When defining the parameters for the strptime function, it would be helpful to point out that these return information like abbreviated Weekdays or Months in the current locale of the machine. This means if you are not a native speaker of English and you have configured your machine to talk in your native language, you are running the risk of getting lots of NAs when running that function on English emails. |
Thomas Prosser | Dec 17, 2013 |
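One way to avoid the locale problem described above, sketched with base R only: temporarily switch LC_TIME to the "C" locale, which uses English day and month abbreviations, then restore the previous setting. The date string and format below are hypothetical examples, not taken from the book.

```r
# Remember the current time locale so it can be restored afterwards.
old.locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical email-style date string with English abbreviations.
parsed <- strptime("Thu, 22 Aug 2002 18:26:25",
                   format = "%a, %d %b %Y %H:%M:%S", tz = "UTC")

# Restore the original locale.
invisible(Sys.setlocale("LC_TIME", old.locale))

print(parsed)
```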
108 First code block |
As with the error reported for Page 80, the encoding should be "native.enc", not "latin1", otherwise this generates an error message. |
Lorien Pratt | May 04, 2012 | |
Page 108 2nd paragraph |
>from.weight <- ddply(priority.train, .(From.EMail),summarise, Freq = length(Subject)) |
David | Mar 21, 2013 | |
Printed | Page 114 1st code block |
R 3.2.5 user here. In the first line of the `thread.counts` function, the call to the `paste` function uses the default argument `sep=" "` because the `sep` argument is not supplied, so an unwanted space is introduced between the string "re: " and the subject line during comparison. The result is that most threads will not be found. |
Anonymous | Apr 18, 2016 |
Printed | Page 114 1st code block |
Sorry, I submitted the erratum above but left out a space in the corrected line of code for the "re: " string. Correct code should be: |
Anonymous | Apr 18, 2016 |
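The effect of the default separator described in the two reports above can be seen with a short base R sketch. The subject line is hypothetical, and this is not the book's thread.counts code.

```r
subject <- "meeting notes"   # hypothetical subject line

# Default sep = " " adds a second space after "re: ".
default.sep <- paste("re: ", subject)

# sep = "" keeps exactly the one space that is part of the "re: " string.
no.sep <- paste("re: ", subject, sep = "")

print(c(default.sep, no.sep))
```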
Printed | Page 141 Code block half way down |
R^2 is calculated as 1 - (model.rmse / mean.rmse), but these values should be MSEs rather than RMSEs. |
Phil Hazelden | Dec 12, 2012 |
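The point in the report above can be checked numerically: R^2 equals 1 minus the ratio of mean squared errors, and the same ratio of RMSEs gives a different number. The sketch uses simulated data, not the book's data set, and the variable names are assumptions.

```r
set.seed(1)
x <- runif(100)
y <- 2 * x + rnorm(100, sd = 0.5)   # simulated data, for illustration only
fit <- lm(y ~ x)

# Mean squared error of the model and of the "predict the mean" baseline.
model.mse <- mean(residuals(fit)^2)
mean.mse  <- mean((y - mean(y))^2)

r2.from.mse  <- 1 - model.mse / mean.mse
r2.from.rmse <- 1 - sqrt(model.mse) / sqrt(mean.mse)

print(c(r2.from.mse, r2.from.rmse, summary(fit)$r.squared))
```

Only the MSE-based value agrees with summary(fit)$r.squared.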
ePub | Page 142 First paragraph of Chapter 3 |
The first paragraph of Chapter 3 refers to Example 3-1 as a dataset on health and ailments, but Example 3-1 is an email header for spam classification. The second paragraph also mentions blue and red dots, but there are no blue or red dots. Figure/Example 3-1 references are mismatched/missing. |
Roy C | Feb 05, 2014 |
Printed | Page 150 2nd paragraph |
The code in the 2nd sentence of the 2nd paragraph "sqrt(mean(residuals(lm.fit) ^ 2))" should be replaced by "sqrt((sum(residuals(lm.fit) ^ 2)) / 998)". The Residual Standard Error does not strictly use the mean of the squared residuals but rather the sum of the squared residuals divided by n - p (in this case 998), where p is the number of predictors in your model including intercept. |
Clay Ford | Nov 24, 2012 |
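The distinction drawn in the report above can be verified against summary(). The sketch uses simulated data rather than the book's, and df.residual(fit) supplies n - p, which is 998 in the book's example according to the report.

```r
set.seed(42)
x <- runif(50)
y <- 1 + 3 * x + rnorm(50)   # simulated data, for illustration only
fit <- lm(y ~ x)

# Root mean squared error divides by n.
rmse <- sqrt(mean(residuals(fit)^2))

# Residual standard error divides by the residual degrees of freedom, n - p.
rse <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

print(c(rmse, rse, summary(fit)$sigma))
```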
Printed | Page 152 1st code block |
For `summary(lm.fit)$r.squared` done on `lm.fit <- lm(log(PageViews) ~ InEnglish, data=top.1000.sites)`, I got 0.3043425 instead of 0.03122206 |
Anonymous | Apr 21, 2016 |
156 figure at top of the page |
figure should include labels (a), (b), (c), and (d) to match caption and text. |
Lorien Pratt | May 05, 2012 | |
Mobi | Page 169 code block |
This function: |
rocjoe | Sep 15, 2017 |
Printed | Page 170 Last code block |
I'm using R 3.2.5 with glmnet 2.0-5 |
Anonymous | Apr 24, 2016 |
Printed | Page 175 1st codeblock |
I'm using R 3.2.5 with tm 0.6-2 |
Anonymous | Apr 25, 2016 |
Printed | Page 183 Last paragraph and top of following page |
"In this example, the a parameter is the slope of the line and the b parameter is the intercept" disagrees with the preceding code snippet and following paragraphs. "a" and "b" need swapping for it to be correct. |
Jonathan Hammler | Sep 14, 2012 |
186 first paragraph |
"another a second" should be "a second" |
Lorien Pratt | May 07, 2012 | |
Printed | Page 200 Second paragraph |
"That value turns out to be not to be numerically unstable." should read "That value turns out not to be numerically unstable" or "That value turns out to be numerically stable." |
Jonathan Hammler | Sep 14, 2012 |
Printed | Page 207 . |
While reading ch.8, page 207, I wondered why the percentages of variance added up to more than 100%. |
Anonymous | Jul 26, 2012 |
Printed | Page 207 2nd code block |
R 3.2.5 user here. |
Anonymous | Apr 27, 2016 |
212 Last code section |
First two lines of code should read: |
Lorien Pratt | May 07, 2012 | |
216 Last paragraph |
"products 2 and 3" should read "products 2 and 4" |
Lorien Pratt | May 07, 2012 | |
218 bottom of figure |
The table at the bottom of figure 9-1 should have rows titled A, B, C, D, not P1, P2, P3, P4 |
Lorien Pratt | May 07, 2012 | |
Printed | Page 219 p. 219 ff. |
What's the reason for invoking dist() on ex.mult and not on ex.matrix? |
Martin Schader | Feb 21, 2013 |
Other Digital Version | 223 3rd line |
prices <- transform(prices, Date = ymd(Date) |
Anonymous | Jun 19, 2013 |
224 Code at top of page |
sep="" in the second line on the page should be sep="/" (at least on the Windows 7 machine on which I am testing) |
Lorien Pratt | May 11, 2012 | |
224 Final text paragraph |
"column are" should read "column names are" |
Lorien Pratt | May 11, 2012 | |
Printed | Page 242 p. 242 source code |
Interesting that you recommend ten packages that the user (no. 1) has already installed. |
Martin Schader | Mar 02, 2013 |
Printed | Page 250 Google SocialGraph API box |
"supplemental files of the book that were generated by this code before the SocialGraph API occurred." should be "supplemental files of the book that were generated by this code before the change to SocialGraph API occurred." |
Jonathan Hammler | Sep 14, 2012 |
Printed | Page 252 First paragraph (after code) |
URLs should be split by a slash, not a backslash as stated. The code listing is correct, but the text is not. |
Jonathan Hammler | Sep 14, 2012 |
Printed | Page 258 Bottom of page |
Closer nodes are described as having "less hops between them". They should, of course, have "fewer hops". |
Anonymous | Sep 14, 2012 |
Printed | Page 280 Graph |
The authors apparently thought they were developing graphs for a color medium. The printed books, however, are black and white. |
Anonymous | Mar 20, 2012 |