Errata

Errata for Doing Data Science

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
Printed	Page 38-39	Hello. In the data file for the first exercise, all of the records with SIGNED_IN==FALSE have the value 0 for both the Age and Gender fields. These would seem to be meaningless default values, and in the sample code there is a line to produce a summary of the data that ignores them: summaryBy(Gender+Signed_In+Impressions+Clicks~agecat, data=data1) But what the sample code doesn't seem to do is to replace these placeholder values with either nulls or with values that are less likely to be mistaken for good data (e.g., -99). It seems confusing to have a column where 0 means female, unless some other column is also 0, in which case it means unknown. My instinct would be to clean that up, but the fact that you don't seem to suggest doing so here makes me wonder if there is a reason not to take this approach. Thanks.	George Schneiderman	Jun 17, 2016
Printed	Page 39 in the code	On page 39 there is code that is supposed to cut users as "<18", "18-24", ... etc. The code given is: data1$agecat<- cut(data1$Age,c(-Inf,0,18,24,34,44,54,64,Inf)) The intervals created from this code are: (-Inf, 0], (0, 18], (18,24], ... etc. The problem is that 18 is included in the under 18 group using this code. Also, 0 is separated from the other users who are under 18.	Jason Scott	Aug 13, 2015
Printed	Page 39 Near top of code segment	On p. 38 the task is to separate users by age into <18, 18-24, 25-34, 35-44, 45-54, 55-64 and 65+. The sample code creates field agecat with this line: data1$agecat<-cut(data1$Age,c(-Inf,0,18,24,34,44,54,64,Inf)) But this sorts users into categories < 19, 19-24, 25-34, etc. The code should be data1$agecat<-cut(data1$Age,c(-Inf,0,19,24,34,44,54,64,Inf))	JD Baldwin	Sep 23, 2017
Printed	Page 50 4th statement from the bottom	Entered as printed, on two lines, the statement bk.homes[which(bk.homes$sale.price.n<100000),] [order(bk.homes[which(bk.homes$sale.price.n<100000),] $sale.price.n),] throws an error. R prints rows for the first line (ending 21769 1440 1250 21785 2740 1962 22760 2300 1283 23098 2080 2003 23117 2550 1800) and then fails on the second line: > [order(bk.homes[which(bk.homes$sale.price.n<100000),] Error: unexpected '[' in " ["	Mahboob Hussain	Sep 19, 2015
Printed	Page 86 Last line	> require(geoPlot) Loading required package: geoPlot Warning message: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘geoPlot’	Mahboob Hussain	Sep 23, 2015
Printed	Page 87 12th line from bottom	> mt$address.noapt <- gsub("[,][[:print:]]", "", + gsub(("[ ]+", " ", trim(mt$address))) Error: unexpected ',' in: "mt$address.noapt <- gsub("[,][[:print:]]", "", gsub(("[ ]+","	Mahboob Hussain	Sep 23, 2015
PDF	Page 111 Sample R Code for Dealing with the NYT API	The New York Times API has changed, so nearly everything in the code sample must be reworked; e.g., res1$results becomes res1$response$docs.	Steven	Dec 08, 2015
Printed	Page 126 3rd line	Broken link. The article is no longer available for free.	Buno Betoni Parodi	Jan 06, 2018
Printed	Page 162 Code Snippet	datapath <- "http://getglue-data.s3.amazonaws.com/getglue_sample.tar.gz" Does this path still exist? If not, can we have the dataset included at the following URL? https://github.com/oreillymedia/doing_data_science	Anonymous	Jun 22, 2017
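
The three reports for pages 38-39 all concern how the age buckets and the placeholder zeros interact. With cut()'s default right = TRUE, a break at 18 produces the interval (0,18], which puts 18-year-olds in the youngest group, and moving that break to 19 would produce (0,19]. The following is a minimal sketch of one possible cleanup, not the authors' code. It assumes Signed_In is coded 0/1, uses a small made-up data frame in place of the real file, and the names not_signed_in, age_breaks and age_labels are introduced here only for illustration.

```r
# Hypothetical stand-in for the exercise data; the real file has more rows and columns.
data1 <- data.frame(
  Age       = c(0, 0, 17, 18, 24, 25, 64, 65),
  Gender    = c(0, 0, 1, 0, 1, 0, 1, 0),
  Signed_In = c(0, 0, 1, 1, 1, 1, 1, 1)   # assumption: 0 means not signed in
)

# Mark Age and Gender as unknown (NA) for users who were not signed in,
# so a placeholder 0 in Gender is not mistaken for a real value.
not_signed_in <- data1$Signed_In == 0
data1$Age[not_signed_in] <- NA
data1$Gender[not_signed_in] <- NA

# right = FALSE closes each interval on the left: [18, 25) holds ages 18 to 24.
age_breaks <- c(-Inf, 18, 25, 35, 45, 55, 65, Inf)
age_labels <- c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
data1$agecat <- cut(data1$Age, breaks = age_breaks,
                    labels = age_labels, right = FALSE)

# Cross-tabulate ages against buckets to check the boundaries by eye.
print(table(data1$Age, data1$agecat, useNA = "ifany"))
```

Because the unknown ages become NA before bucketing, they drop out of the age categories instead of forming a separate (-Inf,0] group. Whether that is preferable to keeping them as their own category depends on what the exercise is meant to show.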