Errata

Errata for Machine Learning for Hackers

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
Printed	Page 82 middle	TermDocumentMatrix(doc.corpus,control) Error in tolower(txt) : invalid multibyte string 1	huangqiyou	Nov 10, 2013
Printed	Page 6 1	Contrary to what is stated in the book, R does not appear to come pre-installed on Mac OS X. At least, I can't find it, and the instructions as given do not work. (i.e. typing R in terminal gives 'command not found') Perhaps the author had already installed R? Checking a dozen or so of the top sites for "r mac os x" shows they all refer to installing R. I didn't find any references to R being preinstalled.	Anonymous	Mar 12, 2012
PDF	Page 11 table 1-2	The URLs for ggplot2 and glmnet are reversed.	John Cook	Apr 14, 2012
Mobi	Page 11 table 1-2	The location for the tm package is shown as http://www.spatstat.org/spatstat/, this however is the location for the SpatStat package. The location should be: http://tm.r-forge.r-project.org/index.html	David Clark	Oct 28, 2012
PDF	Page 11 Table 1-2	Links in the location column for ggplot2 and glmnet packages (rows 2 and 3) are reversed.	Richard Smith	Dec 04, 2014
Printed	Page 12 para 2	The package_installer.R script is not in "the code folder for this chapter". It is in the root directory of the example code. There is no "code" directory in chapter 1. Also, there does not appear to be a link to the example code in the book. It's easily found on the website, but it should be in the book (and in the index of the book). The script also has no error checking, so even if it fails on the first package, it keeps running, and failing, all the way through. And it fails because it requires the Mac developer tools to run (for make, etc.). These are mentioned as optional on page 7. They are apparently required. In addition, page 7 also says "requires both the C and Fortran compilers... you can install these compilers using the mac os x developers tools DVD". I don't have the DVD, but the downloaded version should be identical, and does not contain Fortran. This must be installed separately (http://cran.r-project.org/bin/macosx/tools/). This is fairly severe, as the install will continue to the end, but the error messages scroll past unless you are watching, leaving you with a broken install. This seems likely to confuse new R users. There are still some errors; I will report them later, after the build finishes. Most or all of this could be avoided if the install script used binary packages, not source.	Anonymous	Mar 14, 2012
PDF	Page 12-13 loading libraries and the data	Readers working through this book should assume that libraries and packages may now install and run differently than described. I found my error only after some frustration. When this book was published, ggplot2 may have loaded two other required packages: plyr and reshape. ggplot2 now uses a NAMESPACE, and only exports functions that should be user visible - this should make it play considerably more nicely with other packages in the R ecosystem. From version 0.9.0, the implementation was changed to avoid possible conflicts when multiple packages were loaded.	Anonymous	Dec 06, 2012
PDF, Mobi	Page 12 1st paragraph	Some packages installed by the script require gfortran. However, they require gfortran 4.2.3. gfortran is currently upwards of version 4.9, and the command line options have changed. Where the command line used by the packages says: -arch x86_64, it should say: -march=native. This allows gfortran to select the best architecture for the machine it's running on. Without 4.2.3, these packages will fail to build. This point should be clarified in the book. It took me some digging to figure out what was wrong, and someone with less experience will have a very hard time making it work. I am using a Macbook Pro, running the latest MacOS.	wackyvorlon	Jul 24, 2013
PDF	Page 14 last paragraph	YYY should be YYYY	John Cook	Apr 16, 2012
Printed	Page 14 1st paragraph	The data file ufo_awesome.tsv is way too big for quick processing. I'm trying out the statements in R while reading the book. The statement "ufo<-read.delim(...)" practically froze my Mac because of this large data file. Maybe provide a smaller file for quick programming along.	Anonymous	May 14, 2012
PDF	Page 15 Second to last paragraph	Extraneous ">": "good.rows<-ifelse(nchar(ufo$DateOccurred)>!=8 | nchar(ufo$DateReported)!=8,FALSE, TRUE)" should be: "good.rows<-ifelse(nchar(ufo$DateOccurred)!=8 | nchar(ufo$DateReported)!=8,FALSE, TRUE)"	Lorien Pratt	Apr 22, 2012
Printed	Page 15 Bottom Code sample	The number of bad rows reads "371" instead of "731". Here is the printout when the code is run: > good.rows<-ifelse(nchar(ufo$DateOccurred)!=8 | nchar(ufo$DateReported)!=8,FALSE,TRUE) > length(which(!good.rows)) [1] 731	Megan Squire	Sep 21, 2012
PDF	Page 15 2	Based on the data I downloaded from GitHub (https://github.com/johnmyleswhite/ML_for_Hackers/blob/master/01-Introduction/data/ufo/ufo_awesome.tsv), I think it is not appropriate to use string length to filter out malformed data. I found "19940000" in DateOccurred, which passes the length check but is turned into NA by "ufo$DateOccurred<-as.Date(ufo$DateOccurred, format="%Y%m%d")" when the date strings are converted.	kaiwang	Apr 03, 2016
PDF	Page 16 second half (function and explanation)	The strsplit function doesn't throw an error when the split character isn't matched -- it just returns the string, so the [[1]] reference apparently will always return something. Perhaps the solution here is to reference [[2]] rather than [[1]], to check if a split occurred?	Andrew Broman	Feb 16, 2012
PDF	Page 16 middle	The strsplit function doesn't give an error when the split character isn't matched. So the 'get.location' function should be changed as below. get.location <- function(l) { split.location <- tryCatch(strsplit(l, ",")[[1]], error = function(e) return(c(NA, NA))) clean.location <- gsub("^ ","",split.location) if (length(clean.location) > 2|length(clean.location)==1) { return(c(NA,NA)) } else { return(clean.location) } } By the way, thank you for this great book.	Jeong-ho park	Jan 03, 2014
PDF	Page 18 2nd Paragraph	Text and example diverge. Text states "We then use the is.na function to find which entries are not US states and reset them to NA in the USState column." In the example above the paragraph however the USState values are set to NA using a reverse lookup of the state name in us.states (due to the return of NA by match I presume). The USCity on the other hand uses the is.na function in the example.	Lukas	May 21, 2012
PDF	Page 18 example code	seems like the assignment to ufo$USState should be ufo$USState <- ufo$USState[...]	Brian Drye	Jul 04, 2014
	19 Second Code snippet	Creating the histogram using the code given does not work. Running: quick.hist<-ggplot(ufo.us, aes(x=DateOccurred))+geom_histogram()+ scale_x_date(major="50 years") Generates an error. > quick.hist<-ggplot(ufo.us, aes(x=DateOccurred))+geom_histogram()+ + scale_x_date(major="50 years") Error in continuous_scale(aesthetics, "date", identity, breaks = breaks, : unused argument(s) (major = "50 years") This error occurs when running the code from the book as well as the download code samples.	Anonymous	Mar 17, 2012
Printed, PDF	Page 19 2	The examples that plot the data using ggplot2 refer to an outdated version of ggplot2. When using the code in the book with the new version of ggplot2, the following errors are produced: "Error in continuous_scale" and "error in inherits"	Anonymous	Mar 24, 2012
PDF	Page 19 Second block of code	I think you have a ggplot version issue. When trying to generate the histogram for UFO sightings, I get the following error: "Error in continuous_scale(aesthetics, "date", identity, breaks = breaks, : unused argument(s) (major = "50 years")". I am using R 2.14.2 and ggplot 0.9.0. It would appear someone on Stack Exchange is having the same error: http://stackoverflow.com/questions/9857123/error-in-continuous-scale-and-error-in-inherits-ggplot2-r-2-14-2	Trey Causey	Mar 25, 2012
PDF, Mobi	Page 19 code after second paragraph	The line in the book, which doesn't work properly, is quick.hist <- ggplot(ufo.us, aes(x=DateOccured)+geom_histogram)+scale_x_date(major="50 years") while in the code provided by the authors major="50 years" is replaced by breaks="50 years"	Paulo Nuin	Oct 22, 2013
PDF	Page 24 3rd paragraph from the bottom	"To check this, run the first line of code from the preceding block,..." This should have "first two lines of code" instead. Otherwise, it wouldn't include the "geom_line" call.	Anonymous	Feb 18, 2012
PDF	Page 25 last paragraph	Montana is listed as having a spike around mid-1997, but I believe you mean Missouri. (Missouri's abbreviation is MO, while Montana's is MT.)	Anonymous	Feb 18, 2012
PDF	Page 29 end of first paragraph	Is "gization" supposed to be "visualization"?	Anonymous	Feb 18, 2012
PDF	Page 32 Figure 2-2	"MxN" at bottom of vector should read "Mx1", as written above the graph.	John Sandall	Apr 25, 2012
	37 first paragraph	"by liberally by" should be "by liberal"	Lorien Pratt	Apr 24, 2012
	41 First paragraph under "Standard Deviations and Variances"	"center of list" should be "center of a list"	Lorien Pratt	Apr 25, 2012
Printed	Page 45 last paragraph	bindwidths should be replaced by binwidths	Martin Schader	Feb 02, 2013
Printed	Page 45 First Sentence of Last Paragraph	"Because setting bindwidths" has erroneously included an extra letter "d" and should state, "Because setting binwidths".	Joe Nolan	Mar 27, 2016
PDF	Page 49 bottom part - in the text	The example you write about is using the weight ggplot(heights.weights, aes(x = Weight, fill = Gender)) + geom_density() + facet_grid(Gender ~ .) but in the text you are writing about the height: "Once we've done this, we clearly see one bell curve centered at 64" for women and another bell curve centered at 69" for men." You should change the code to: ggplot(heights.weights, aes(x = Height, fill = Gender)) + geom_density() + facet_grid(Gender ~ .)	Marco Pashkov	Jul 11, 2012
Printed	Page 49 last paragraph	Here you discuss Fig. 2-11 and the weights of women and men. Therefore, instead of the curve centers 64" and 69" (inches), the means of the weights in pounds should be given.	Martin Schader	Feb 02, 2013
	54 Last paragraph	"in word." should be "in words."	Lorien Pratt	Apr 25, 2012
ePub	Page 54 code	For the scale_x_date function, the code in the book uses "major" as the parameter name, when it should be "breaks" e.g. scale_x_date(breaks = "5 years", ... The same applies for the other uses of scale_x_date in this chapter. The example code is accurate.	Roy C	Feb 05, 2014
Printed	Page 71 United States	When I create the plot on page 71 with the code on page 70, the "Height" and "Weight" axes are switched.	Dan Williams	Apr 16, 2012
	73 First two paragraphs	The last line of the first paragraph references "Example 3-1", and the following paragraph refers to this example as containing black lines, blue dots, and red dots. Example 3-1 does not have these elements, rather it is an example of a candidate for spam email. Also, I cannot find a figure with black lines and blue and red dots. I believe that Example 3-1 should be changed to "Figure 3-1" and the text should be updated to refer to the horizontal dashed lines, and black triangles, rectangles, and circles.	Lorien Pratt	Apr 25, 2012
Printed	Page 73 second paragraph	You discuss blue dots and red dots in Fig. 3-1. This figure is monochrome and displays circles and triangles.	Martin Schader	Feb 02, 2013
Printed	Page 74 third paragraph	the code/data folder... should be replaced by the data folder...	Martin Schader	Feb 02, 2013
	75 Last paragraph and figure above it	The last paragraph refers to Figure 3-1, but I think this is incorrect. If this is a correct figure reference, then it is not clear what "X", and "Y" refer to here, nor what triangles and circles refer to (spam or ham?), and these should be clarified. If this is not the intended figure, then I think that the correct reference is Figure 3-2.	Lorien Pratt	Apr 25, 2012
Printed	Page 75 Figure 3-1 and last paragraph	It is not clear what Fig. 3-1 displays. What is the x axis, what the y axis? In the text, when you say Figure 3-1, you might mean Figure 3-2.	Martin Schader	Feb 03, 2013
	76 first paragraph	This paragraph references Figure 3-2 as containing jittered data, but it does not. I do not believe that a jittered data picture is present.	Lorien Pratt	Apr 25, 2012
	76 Last paragraph	The code to generate the picture that is distributed in the book should be updated to correspond to the latest version of ggplot. Specifically, in the file email_classify.R, ex1 <- ggplot(val, aes(x, V2)) + geom_jitter(aes(shape = as.factor(V3)), position = position_jitter(height = 2)) + scale_shape_discrete(legend = FALSE, solid = FALSE) + geom_hline(aes(yintercept = c(10,30), linetype = 2)) + theme_bw() + xlab("X") + ylab("Y") should be: ex1 <- ggplot(val, aes(x, V2)) + geom_jitter(aes(shape = as.factor(V3)), position = position_jitter(height = 2)) + scale_shape_discrete(guide = "none", solid = FALSE) + geom_hline(aes(yintercept = c(10,30) )) + theme_bw() + xlab("X") + ylab("Y") This repairs two errors: "legend" is deprecated, and a new error: "A continuous variable can not be mapped to linetype" that is generated by "linetype = 2". Note that removing linetype = 2 produces solid, not dashed lines. This is consistent with the book text, but no longer matches the corresponding figure in the book.	Lorien Pratt	Apr 26, 2012
Printed	Page 76 Last paragraph	The code to generate Figure 3-2 needs to be updated to correspond to the latest version of ggplot. In the file email_classify.R, the plot command ex1 <- ggplot(val, aes(x, V2)) + geom_jitter(aes(shape = as.factor(V3)), position = position_jitter(height = 2)) + scale_shape_discrete(legend = FALSE, solid = FALSE) + geom_hline(aes(yintercept = c(10,30), linetype = 2)) + theme_bw() + xlab("X") + ylab("Y") produces one error, which prevents it from producing a graph, and one warning. Warning message: In discrete_scale("shape", "shape_d", shape_pal(solid), ...) : "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. Error: A continuous variable can not be mapped to linetype The version below fixes the error by moving "linetype = 2" out of the aes() function (keeping the dashed lines that are removed in the alternative solution suggested by Lorien Pratt) and fixes the warning by using guide="none" instead of the deprecated legend=FALSE argument. ex1 <- ggplot(val, aes(x, V2)) + geom_jitter(aes(shape = as.factor(V3)), position = position_jitter(height = 2)) + scale_shape_discrete(guide = "none", solid = FALSE) + geom_hline(aes(yintercept = c(10,30)), linetype = 2) + theme_bw() + xlab("X") + ylab("Y")	Anonymous	Jun 22, 2012
Printed	Page 76 1st paragraph	Instead of Figure 3-2 you mean Figure 3-3	Martin Schader	Feb 03, 2013
Printed	Page 80 para 1	Two errors. The URL for RFC 822 is https://tools.ietf.org/id/rfc822; the book transposes the f and r. But RFC 822 was replaced in 2001, and its replacement was updated again in 2008. The correct URL for the current version is https://tools.ietf.org/html/rfc5322	Anonymous	Mar 20, 2012
	80 Middle of page	This function generates the indicated error message: all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep="/"))) Error in seq.default(which(text == "")[1] + 1, length(text), 1) : wrong sign in 'by' argument it also generates warning messages on several spam files, the first such file being: "data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1". The error comes from the line: ?We've received 8,000 in 1 day and we are doing and is generated because of the first character, which appears to be outside the usual ascii range. The culprit is in the following function: get.msg <- function(path) { con <- file(path, open="rt", encoding="latin1") text <- readLines(con) # The message always begins after the first full line break msg <- text[seq(which(text=="")[1]+1,length(text),1)] close(con) return(paste(msg, collapse="\n")) } If "latin1" is changed to "native.enc", then this error stops, and the all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep="/"))) works. Note that there is a second typo in the above line as well, reported in a separate errata. (sep = "/" not "")	Lorien Pratt	Apr 26, 2012
PDF	Page 80 United States	This line consistently throws error: > con <- file("datasets/spam/", open="rt", encoding="native.enc") Error in file("datasets/spam/", open = "rt", encoding = "native.enc") : cannot open the connection In addition: Warning message: In file("datasets/spam/", open = "rt", encoding = "native.enc") : cannot open file 'datasets/spam/': Permission denied But if I try to read the individual files from the same folder, it works: > con <- file("datasets/spam/spam.email.txt", open="rt", encoding="native.enc") The above command works indicating that there are no permission issues on the folder or the file. Please help me understand what is going on here.	Kingshuk Chatterjee	Oct 31, 2012
Printed	Page 80 ff -	Some of the files in the spam folder you provided (no. 263, 320, 323, and 324) contain characters like \202, \203, etc. If I don't remove these, get.msg will crash.	Martin Schader	Feb 02, 2013
Printed	Page 81 1st paragraph	When I run the following code: spam.docs <- dir(spam.path) spam.docs <- spam.docs[which(spam.docs != "cmds")] all.spam <- sapply(spam.docs, function(p) get.msg(file.path(spam.path, p))) I get the following error: Error in seq.default(which(text == "")[1] + 1, length(text), 1) : wrong sign in 'by' argument	Paul Reiners	Feb 24, 2012
	81 second paragraph	all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep=""))) should read: all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep="/")))	Lorien Pratt	Apr 26, 2012
Printed	Page 82 top of page	To create the TDM, the options stopwords=TRUE and minDocFreq=2 are used. But the resulting TDM includes stopwords and terms with frequency of 1. The options for removePunctuation and removeNumbers appear to work properly. It also happens with the code supplied with the book, not just the code printed in the book. Is this an error in the package or the book?	Anonymous	Dec 25, 2012
Printed	Page 82 Top of page, 3rd and 4th line of code	The stopwords are still showing up in spam.df. Someone else also posted this in December. Any news?	Dave Gilsdorf	Mar 21, 2013
Printed	Page 82 get.tdm	In recent versions of the tm package minDocFreq=2 has been replaced by bounds = list(global = c(2,Inf)). See https://stackoverflow.com/questions/16287546/trying-to-remove-words-from-a-documenttermmatrix-in-order-to-use-topicmodels	ifernando	Feb 21, 2018
Printed	Page 83 4th par	When I use the data you provide on this website and compute spam.df, the result is term frequency density occurrence 7135 email 741 0.006365378 0.530 17371 please 388 0.003333018 0.476 13596 list 392 0.003367379 0.424 2765 body 362 0.003109672 0.402 10623 html 392 0.003367379 0.380 8666 free 495 0.004252175 0.360	Martin Schader	Feb 03, 2013
Printed	Page 84 4th par	When I compute easyham.df with the data you provided, the result is term frequency density occurrence 12731 wrote 237 0.004275894 0.378 6835 list 246 0.004438270 0.364 4888 group 196 0.003536183 0.348 11092 subject 155 0.002796471 0.256 11603 time 175 0.003157306 0.252 3550 email 174 0.003139264 0.250	Martin Schader	Feb 03, 2013
	87 First paragraph of text	"grey shaded area of Figure 3-3" should read "dark blue (center) shaded area of Figure 3-4".	Lorien Pratt	Apr 28, 2012
	87 First paragraph of text	"as depicted in Figure 3-3." should read "as depicted in Figure 3-4."	Lorien Pratt	Apr 28, 2012
Printed	Page 87 2nd par and 4th par	par 2: Figure 3-3 should be replaced by Figure 3-4 (twice) and par 4: less than zero should be replaced by less than one	Martin Schader	Feb 03, 2013
Printed	Page 87 First (only) code block	The constant c is exponentiated (^) in two places, when it should be multiplied (*). The narrative below (the 3rd and 4th paragraphs) indicates that a product is being obtained, and multiplication seems more logical than exponentiation.	Anonymous	Feb 28, 2013
PDF	Page 87 Code	In classify.email, R rounds the product of probabilities to zero when the list of terms is long. I solved this issue using a log transformation: ClasifyEmail <- function(path, training.df, prior=0.5, c=1e-6) { msg <- GetMsg(path) msg.tdm <- GetTDM(msg) msg.matrix <- as.matrix(msg.tdm) msg.freq <- rowSums(msg.matrix) # Find intersection of words msg.match <- intersect(names(msg.freq), training.df$term) # Compute probabilities of the unseen terms unseen.probs <- prior*c^(length((msg.freq))-length(msg.match)) unseen.probs.log <- log(prior)+(length((msg.freq))-length(msg.match))*log(c) if (length(msg.match) < 1) { return(unseen.probs) } else { # Search matched terms probs match.probs <- training.df$occurrence[match(msg.match,training.df$term)] # Compute probability of occurrence of the terms # Add probabilities of the unseen terms prob <- unseen.probs*prod(match.probs) prob.log <- unseen.probs.log + sum(log(match.probs)) return (prob.log) } }	ifernando	Feb 26, 2018
	88 First code block	replace sep="" in two places in this code with sep="/"	Lorien Pratt	Apr 28, 2012
Printed	Page 88 first code snippet	hardham.res <- ifelse(hardham.spamtest > hardham.hamtest, TRUE, FALSE) should be replaced by hardham.res <- ifelse(hardham.spamtest > hardham.hamtest, FALSE, TRUE)	Martin Schader	Feb 03, 2013
	100 footnote	"are not acting" should be "are not more likely to act"	Lorien Pratt	Apr 30, 2012
PDF	Page 105 code	date <- msg.vec[date.grep[1]] should be grepl	Anonymous	Mar 01, 2012
	105 code block at bottom of page	easyham.parse <- lapply(easyham.docs, function(p) parse.email(paste(easyham.path, p, sep=""))) should read: easyham.parse <- lapply(easyham.docs, function(p) parse.email(paste(easyham.path, p, sep="/")))	Lorien Pratt	May 04, 2012
PDF	Page 105 First code sample - get.date	The get.date function fails because the second line of the function says (note the 'l' after 'grep'): date.grepl <- which(date.grep == TRUE) and the third line says (note the missing 'l'): date <- msg.vec[date.grep[1]] The sample code uses date.grep for both the second and third line.	Maymount	Jun 25, 2013
Mobi	Page 106 3rd paragraph	When defining the parameters for the strptime function, it would be helpful to point out that these return information like abbreviated weekdays or months in the current locale of the machine. This means that if you are not a native speaker of English and you have configured your machine to use your native language, you run the risk of getting lots of NAs when running that function on English emails. Sys.setlocale('LC_TIME', 'en_US') did it for me.	Thomas Prosser	Dec 17, 2013
	108 First code block	As with the error reported for Page 80, the encoding should be "native.enc", not "latin1", otherwise this generates an error message. It occurs to me that this errata, as with the page 80 one, may only apply on one type of machine, as other machines may not generate this error, and "latin1" may be correct there. I generated this error on a Windows 7 64 bit computer.	Lorien Pratt	May 04, 2012
PDF	Page 108 2nd paragraph	>from.weight <- ddply(priority.train, .(From.EMail),summarise, Freq = length(Subject)) The above code gives the error below: Error in attributes(out) <- attributes(col) : 'names' attribute [9] must be the same length as the vector [1] After checking online on stackoverflow, I found that converting the Date feature in the priority.train from a POSIXlt object to a POSIXct object before running the above code solves the problem. i.e. priority.train$Date <- as.POSIXct(priority.train$Date)	David	Mar 21, 2013
Printed	Page 114 1st code block	R 3.2.5 user here. In the first line of the `thread.counts` function, the call to the `paste` function uses the default argument `sep=" "` because the `sep` argument is not supplied, so an unwanted space is introduced between the string "re: " and the subject line during comparison. The result is that most threads will not be found. The solution is to supply `sep=""` to the `paste` call. So the corrected line of code should be: thread.times <- email.df$Date[which(email.df$Subject == thread | email.df$Subject == paste("re:", thread, sep=""))]	Anonymous	Apr 18, 2016
Printed	Page 114 1st code block	Sorry, I submitted the erratum above but left out a space in the "re: " string in the corrected line of code. The correct code should be: thread.times <- email.df$Date[which(email.df$Subject == thread | email.df$Subject == paste("re: ", thread, sep=""))]	Anonymous	Apr 18, 2016
Printed	Page 141 Code block half way down	R^2 is calculated as 1 - (model.rmse / mean.rmse), but these values should be MSEs rather than RMSEs. Source: http://en.wikipedia.org/wiki/Coefficient_of_determination defines R^2 using MSEs (it doesn't actually take means, but the divisions cancel); and the R^2 reported by `summary(fitted.regression)` gives the same value as calculating it using MSEs but not RMSEs.	Phil Hazelden	Dec 12, 2012
ePub	Page 142 First paragraph of Chapter 3	The first paragraph of Chapter 3 refers to Example 3-1 as a dataset on health and ailments, but Example 3-1 is an email header for spam classification. The second paragraph also mentions blue and red dots, but there are no blue or red dots. Figure/Example 3-1 references are mismatched/missing.	Roy C	Feb 05, 2014
Printed	Page 150 2nd paragraph	The code in the 2nd sentence of the 2nd paragraph "sqrt(mean(residuals(lm.fit) ^ 2))" should be replaced by "sqrt((sum(residuals(lm.fit) ^ 2)) / 998)". The Residual Standard Error does not strictly use the mean of the squared residuals but rather the sum of the squared residuals divided by n - p (in this case 998), where p is the number of predictors in your model including intercept.	Clay Ford	Nov 24, 2012
Printed	Page 152 1st code block	For `summary(lm.fit)$r.squared` done on `lm.fit <- lm(log(PageViews) ~ InEnglish, data=top.1000.sites)`, I got 0.3043425 instead of 0.03122206	Anonymous	Apr 21, 2016
	156 figure at top of the page	figure should include labels (a), (b), (c), and (d) to match caption and text.	Lorien Pratt	May 05, 2012
Mobi	Page 169 code block	This function: get.tdm <- function(doc.vec) { doc.corpus <- Corpus(VectorSource(doc.vec)) control <- list(stopwords=TRUE, removePunctuation=TRUE, removeNumbers=TRUE, minDocFreq=2) doc.dtm <- TermDocumentMatrix(doc.corpus, control) return(doc.dtm) } throws this error: "Error in x$nrow : $ operator is invalid for atomic vectors"; specifically when calling TermDocumentMatrix.	rocjoe	Sep 15, 2017
Printed	Page 170 Last code block	I'm using R 3.2.5 with glmnet 2.0-5 Doing: x <- matrix(x) library(glmnet) glmnet(x, y) gives the following error: Error in glmnet(x, y) : x should be a matrix with 2 or more columns In this case, `x` should be a matrix with 2 columns. A matrix with the first and second column both being the original `x` vector works: x <- as.matrix(cbind(x, x)) library(glmnet) glmnet(x, y)	Anonymous	Apr 24, 2016
Printed	Page 175 1st codeblock	I'm using R 3.2.5 with tm 0.6-2 This line of code: corpus <- tm_map(corpus, tolower) will cause the following error: Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character" In addition: Warning message: In mclapply(unname(content(x)), termFreq, control) : all scheduled cores encountered errors in user code when this line of code is run: dtm <- DocumentTermMatrix(corpus) It turns out that we have to use this instead: corpus <- tm_map(corpus, content_transformer(tolower)) More details can be found here: http://stackoverflow.com/a/24771621	Anonymous	Apr 25, 2016
Printed	Page 183 Last paragraph and top of following page	"In this example, the a parameter is the slope of the line and the b parameter is the intercept" disagrees with the preceding code snippet and following paragraphs. "a" and "b" need swapping for it to be correct.	Jonathan Hammler	Sep 14, 2012
	186 first paragraph	"another a second" should be "a second"	Lorien Pratt	May 07, 2012
Printed	Page 200 Second paragraph	"That value turns out to be not to be numerically unstable." should read "That value turns out not to be numerically unstable" or "That value turns out to be numerically stable."	Jonathan Hammler	Sep 14, 2012
Printed	Page 207 .	While reading ch.8, page 207, I wondered why the percentages of variance added up to more than 100%. I later found time to check with some experts: http://stats.stackexchange.com/q/32901/5503 and got confirmation that the text is incorrect. The quoted paragraph could be changed to: In this summary, the standard deviations tell us how much of the variance in the data set is accounted for by the different principal components. Use summary(pca) to see the proportions of variance. The first component, called Comp.1, accounts for 46% of the variance, while the next component accounts for another 22.7%. By the end, the last component, Comp.24, accounts for a mere 0.01% of the variance. This suggests that we can learn a lot about our data by just looking at the first principal component.	Anonymous	Jul 26, 2012
Printed	Page 207 2nd code block	R 3.2.5 user here. This line of code: opts(legend.position="none") results in the following error: Error in eval(expr, envir, enclos) : could not find function "opts" From what I read here: http://mfcovington.github.io/r_club/errata/2013/03/05/ch5-errata/ the `opts` function is deprecated. Changing that line of code to: theme(legend.position="none") works. Source: http://stackoverflow.com/a/19821839	Anonymous	Apr 27, 2016
	212 Last code section	First two lines of code should read: comparison <- transform(comparison, MarketIndex = scale(MarketIndex)) comparison <- transform(comparison, DJI = scale(DJI))	Lorien Pratt	May 07, 2012
	216 Last paragraph	"products 2 and 3" should read "products 2 and 4"	Lorien Pratt	May 07, 2012
	218 bottom of figure	The table at the bottom of figure 9-1 should have rows titled A, B, C, D, not P1, P2, P3, P4	Lorien Pratt	May 07, 2012
Printed	Page 219 p. 219 ff.	What's the reason for invoking dist() on ex.mult and not on ex.matrix? This will distort your scaling results. You do the same with the Roll Call data, p.227.	Martin Schader	Feb 21, 2013
Other Digital Version	223 3rd line	prices <- transform(prices, Date = ymd(Date)) Should be prices <- transform(prices, Date = ymd(as.character(Date)))	Anonymous	Jun 19, 2013
	224 Code at top of page	sep="" in the second line on the page should be sep="/" (at least on the Windows 7 machine on which I am testing)	Lorien Pratt	May 11, 2012
	224 Final text paragraph	"column are" should read "column names are"	Lorien Pratt	May 11, 2012
Printed	Page 242 p. 242 source code	Interesting that you recommend ten packages that the user (no. 1) has already installed. Perhaps you should first remove the installed packages from the vector "listing".	Martin Schader	Mar 02, 2013
Printed	Page 250 Google SocialGraph API box	"supplemental files of the book that were generated by this code before the SocialGraph API occurred." should be "supplemental files of the book that were generated by this code before the change to SocialGraph API occurred."	Jonathan Hammler	Sep 14, 2012
Printed	Page 252 First paragraph (after code)	URLs should be split by a slash, not a backslash as stated. The code listing is correct, but the text is not.	Jonathan Hammler	Sep 14, 2012
Printed	Page 258 Bottom of page	Closer nodes are described as having "less hops between them". They should, of course, have "fewer hops".	Anonymous	Sep 14, 2012
Printed	Page 280 Graph	The authors apparently thought they were developing graphs for a color medium. The printed books, however, are black and white. Many of the graphs, such as the ones on pages 276, 280, 281, etc., have two types of circles, one of which is rendered in grey, and the other in... grey. This makes the graphs useless. And it's not just one graph; it's all through the book. Some graphs use symbols, and are ok. Others are grey vs. grey, and of no use at all. Another example is on page 265. The graph is labelled "Drew Conway ego-network colored by local community structure." The graph is monochromatic. This is severe enough that I think I'm going to return the book, something I've never done with an O'Reilly title.	Anonymous	Mar 20, 2012
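
The two Page 16 reports above both rest on the fact that strsplit does not raise an error when the separator is missing. The sketch below illustrates that behavior in base R. The function name get.location.checked and the sample strings are hypothetical, chosen only for illustration, and this is one possible way to guard the split, not the authors' fix.

```r
# strsplit() returns the whole string as a single piece when the separator
# is absent; no error is raised, so a tryCatch() handler never runs.
no.comma <- strsplit("Iowa City", ",")[[1]]
one.comma <- strsplit("Iowa City, IA", ",")[[1]]

# Hypothetical variant of get.location that tests the number of pieces
# directly instead of relying on an error being thrown.
get.location.checked <- function(l) {
  split.location <- strsplit(l, ",")[[1]]
  clean.location <- gsub("^ ", "", split.location)  # drop one leading space
  if (length(clean.location) != 2) {
    return(c(NA, NA))
  }
  clean.location
}

print(length(no.comma))
print(get.location.checked("Iowa City, IA"))
print(get.location.checked("Iowa City"))
```

Checking the length of the result covers both the "no comma" and the "too many commas" cases in a single condition.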
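
The Page 87 report on classify.email describes products of many small probabilities rounding to zero. A minimal base R sketch of that effect follows. The probability values and term counts are made up for illustration and are not taken from the book's data; only prior = 0.5 and c = 1e-6 come from the submitted code.

```r
# Illustrative values only: 200 matched terms, each with probability 0.01.
match.probs <- rep(0.01, 200)

# The direct product is far below the smallest representable double,
# so it underflows to exactly zero.
direct <- prod(match.probs)

# Summing logs keeps the same information as a finite number.
log.score <- sum(log(match.probs))

# The prior and the unseen-term penalty fold in the same way;
# n.unseen is a hypothetical count of terms missing from the training set.
prior <- 0.5
c <- 1e-6
n.unseen <- 30
log.total <- log(prior) + n.unseen * log(c) + log.score

print(direct)
print(log.score)
print(log.total)
```

Because log is monotonic, comparing two log scores ranks a message as spam or ham the same way as comparing the raw products would, provided both classifiers return log values.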
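
The Page 141 report says R^2 should be computed from MSEs rather than RMSEs. The sketch below checks that claim against summary() using R's built-in cars data set, which is an assumption made here for a self-contained example and is not the book's data. The variable names are likewise illustrative.

```r
# Fit a simple linear regression on a data set that ships with R.
fit <- lm(dist ~ speed, data = cars)

# Error of predicting the mean everywhere, and error of the fitted model.
mean.mse <- mean((cars$dist - mean(cars$dist))^2)
model.mse <- mean(residuals(fit)^2)

# R^2 from MSEs, and the variant that uses RMSEs instead.
r2.from.mse <- 1 - model.mse / mean.mse
r2.from.rmse <- 1 - sqrt(model.mse) / sqrt(mean.mse)

print(r2.from.mse)
print(r2.from.rmse)
print(summary(fit)$r.squared)
```

The MSE version agrees with summary(fit)$r.squared, while the RMSE version comes out lower whenever the model explains some but not all of the variance.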