Errata for R for Data Science

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
PDF, ePub Page Section 4.4
Immediately under the heading, 4.4

Epub Section 4.4 Practice (or page 40 of PDF) reads as follows:

1. Why does this code not work?

my_variable <- 10
my_varıable

#> Error in eval(expr, envir, enclos): object 'my_varıable' not found

In fact, the code runs fine as shown in my output here:

> my_variable <- 10
> my_varıable
[1] 10

Question: What coding error do the authors intend to highlight here?

Kurt Schulzke  Jan 16, 2018 
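A possible answer, sketched in base R: in the exercise as printed, the second line appears to use a dotless "ı" (U+0131) in place of "i", so the two names only look alike. The names `typed` and `printed` below are illustrative, and the sketch assumes the submitter retyped the second line with an ordinary "i", which would explain why it ran without error.

```r
# Two names that look alike but differ in one character.
typed   <- "my_variable"        # ordinary "i" (U+0069)
printed <- "my_var\u0131able"   # dotless "i" (U+0131), as the exercise appears to use

identical(typed, printed)       # FALSE: the strings are not the same

# Locate the differing position by comparing Unicode code points.
diff_pos <- which(utf8ToInt(typed) != utf8ToInt(printed))
diff_pos                        # 7
utf8ToInt(printed)[diff_pos]    # 305, i.e. U+0131
```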
ePub, Page n/a
text

Current Copy
May Jun Jul Aug Sep Oct Nov Dec If you want a want an error, you can use readr::parse_factor(): y2 <- parse_factor(x2, levels = month_levels) #> Warning:

Suggested
"Dec If you want a want an error, you can " should be "Dec If you want an error, you can "

Anonymous  May 04, 2022 
ePub Page text
726

Notes from Amazon
"Your book has an external link that does not work "rstudio[dot]com/cheatsheets." at location 726, 11617. Please update a valid external URL. To ensure future access to reference material, Amazon strongly recommends submitting these types of links to an archive service, and including the archived link in the book. If the link is broken due to forces outside your control, it should be deactivated and "[URL inactive]" should be added following the link text."

Anonymous  Jul 14, 2022 
ePub Page text
12620, 3053, 12679

Notes from Amazon
"Your book has an external link that does not work "ggplot2 book" at location 12620, 3053, 12679. Please update a valid external URL. To ensure future access to reference material, Amazon strongly recommends submitting these types of links to an archive service, and including the archived link in the book. If the link is broken due to forces outside your control, it should be deactivated and "[URL inactive]" should be added following the link text."

Anonymous  Jul 14, 2022 
ePub Page text
4618

Notes from Amazon
"Your book has an external link that does not work "thoughtful blog post by Jeff Leek." at location 4618. Please update a valid external URL. To ensure future access to reference material, Amazon strongly recommends submitting these types of links to an archive service, and including the archived link in the book. If the link is broken due to forces outside your control, it should be deactivated and "[URL inactive]" should be added following the link text."

Anonymous  Jul 14, 2022 
ePub Page text
10887

Notes from Amazon
"Your book has an external link that does not work "Statistical Modeling: A Fresh Approach" at location 10887. Please update a valid external URL. To ensure future access to reference material, Amazon strongly recommends submitting these types of links to an archive service, and including the archived link in the book. If the link is broken due to forces outside your control, it should be deactivated and "[URL inactive]" should be added following the link text."

Anonymous  Jul 14, 2022 
ePub Page 1
Chapter 1

By default my python3
Did not have tidyverse
Nor was there an explanation
How to get for true beginners

James Patrick Cruse  May 20, 2019 
Printed Page 4
4th paragraph

This is not an error in the book, but the authors said technical questions could be asked by emailing bookquestions@oreilly.com, which I did, and I got a message back telling me to post the issue I'm having here.

As suggested in the book, I tried to install tidyverse with install.packages("tidyverse"). After running library(tidyverse), I get a message saying there is no package 'tidyverse'. Some of the warnings are:

1. ‘Permission denied’ at the end of a long string (it is too lengthy to include here), and
2. Installation of package 'gargle' had non-zero exit status.

If you can let me know how to resolve this issue, I'd very much appreciate it. Thanks.

Anonymous  Oct 04, 2022 
PDF Page 6
Second item in "Exercises"

"mtcars" should be replaced by "mpg"

Anonymous  Jul 26, 2017 
Printed Page 10
2nd paragraph, 1st sentence

Misplaced comma or missing verb.

David Emerson Feit  Mar 24, 2018 
Printed Page 11

Page 11, last paragraph, line 5.
It is: "the solid shapes (15-18) are filled with color."
It should be: "the solid shapes (15-20) are filled with color."

Vahid  Feb 10, 2018 
Printed Page 16
Figures

Plots on Page 16 are not the intended graphs.
Left plot should be a scatter plot
Right plot should be the one printed in the left hand position.

Mathew Ling  Jan 13, 2017 
Printed Page 16
graphs and r code

The R code at the bottom of the page does not generate the graphs. The code labeled "#right" should be "#left" and the code labeled #left generates a graph that was printed on an earlier page.

The graph on the right seems to be a duplicate of the graph on page 17.

Robert N. Bernard  Jan 31, 2017 
Printed Page 16
two figures

In the printed version the two charts below the title 'Geometric objects' are not the ones described in the text. It should be a scatter plot (left) and the smoothed plot (right, only one smoothing line).
The code (bottom of page 16) describing the generation of the two plots is correct.

The online-version is correct, see
http://r4ds.had.co.nz/data-visualisation.html#geometric-objects

Tinu Schneider  Feb 11, 2017 
Printed Page 16
Figure + last paragraph

Plots are not output from code in last paragraph:

... to make the preceding plots, you can use this code:

# left
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

# right
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))

2 possible corrections :

1) change plots: the online version at http://r4ds.had.co.nz/data-visualisation.html (section 3.6) shows the correct plots

2) change code :

# left
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))

# right
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

Robbie Heremans  Feb 11, 2017 
Printed Page 16

On p. 16, under the title of Geometric Objects, there are two plots. At the bottom of the page are two lines of R code to describe/create them. The R code marked # left does not generate the graph on the left. The R code (the second line) marked # right generates that first graph. The graph on the right looks like it's the same as the graph on the next page. The first graph should look like the graph on p. 5.

Venita Hagerty  Feb 22, 2017 
Printed Page 16
Graphics in Geometric Objects section

The graphic under Geometric Objects section is wrong in the printed version (1st edition, 2nd release)

Jordi Carbó  Sep 05, 2017 
Printed Page 16
the two plots

the two plots displayed are not those that the text refers to. The text refers to a point geom on the left and a smooth geom on the right. Both are smooth, one has three levels. Code at the bottom of the page generates simple point and smooth geom plots not the one on the right...

Anonymous  Dec 01, 2021 
Printed Page 16
bottom

In the fourth release, the code labeled "right" produces the graph on the left and the code labeled "left" produces the graph on page 5. The right figure on page 16 is produced by the code on page 17.

Gary Rosenberg  Feb 04, 2022 
Printed Page 22
Immediately below the graph

(Presumably) erroneous apostrophe before the word variable.

David  Sep 28, 2018 
Printed Page 24
last paragraph

"... haven't seen <- or tibble() before ..."

The code on this page references "tribble" not "tibble"

Robert N. Bernard  Jan 31, 2017 
Printed Page 24
first block of code

In the fourth release, the code at this point says "tribble" rather than "tibble".

Gary Rosenberg  Feb 04, 2022 
Printed Page 25
code at bottom

Using "fun.ymin", "fun.ymax" and "fun.y" succeeds, but returns an error message that they are deprecated.

Gary Rosenberg  Feb 04, 2022 
Printed Page 28
First bullet point

The two plots showing the use of position = "identity" are missing.

Sbusiso  Jan 31, 2017 
Printed Page 47
Last Paragraph

The 7 Venn diagram graphics in the "Logical Operators" paragraph are not shaded at all, so they all look identical. They are all supposed to have unique shadings to indicate the 7 different Boolean operations.

This book is great, you guys did a fantastic job from what i can see so far, keep up the good work!!!

John Swicegood  Jan 26, 2017 
Printed Page 48
1st paragraph

The code
filter(flights, month == 11 | 12)

doesn't find all flights in January, but all flights in the year!

The equality operator == has higher precedence than the or operator |,
so the expression must be read as (month == 11) | 12, not month == (11 | 12).

Federico Puglia  Aug 31, 2018 
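The precedence point can be checked in base R without the flights data. This is a minimal sketch on a made-up `month` vector; the variable names are illustrative only.

```r
# A small stand-in for the month column (values are made up).
month <- c(1, 6, 11, 12)

# Parsed as (month == 11) | 12. The number 12 is coerced to TRUE,
# so every element is TRUE and filter() would keep every row.
as_written <- month == 11 | 12
as_written                      # TRUE TRUE TRUE TRUE

# Two ways to express "November or December".
spelled_out <- month == 11 | month == 12
shorthand   <- month %in% c(11, 12)
spelled_out                     # FALSE FALSE TRUE TRUE
```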
PDF Page 60
1st glyph

With ggplot2 version 3.2.1, an additional legend is produced for geom_smooth() relating to the alpha value (= 1/3 in the text). This legend was not shown on the glyph, but was produced following the code fragment just above the glyph on the PDF version. To shut this extraneous legend off, add the line "+ scale_alpha(guide = FALSE)" within the ggplot call. This is more of a note than a mistake.

Gerry Ho  Feb 02, 2020 
Printed Page 100
First plot on the page

On page 100, the plot shown is not the correct one: the previous plot (using reorder(class, hwy, FUN = median)) is repeated, whereas the correct plot should show the geom_count graph.

Sbusiso Mkhondwane   Feb 14, 2017 
Printed Page 100
Graphics in Two Categorical Variables

The graphic under "Graphics in Two Categorical Variables" section is wrong in the printed version (1st edition, 2nd release).
It is repeating the previous graphic.

The online version is correct

Jordi Carbó  Sep 05, 2017 
Printed Page 100
upper part

On page 100, the plot shown is not the correct one: the plot created earlier (page 98)
is shown again.
1st edition, 2nd release. Recently bought online (April 2020)

Simone  Apr 30, 2020 
Printed Page 100
Top of page

Plot at top of the page is incorrect - it repeats previous graphic on page 98.
Error has been reported three times but remains unconfirmed: 1st edition, 2nd release (Sep 05 2017), Feb 14 2017, and Apr 30 2020 - this error persists in the 2nd edition 4th release.
Online version is correct.

Denise Howting  May 20, 2021 
Printed Page 100
Top

Incorrect chart at top of page. The chart from page 98 has been duplicated here. A different type of chart is required.

Amanda burke  Jan 20, 2023 
Printed Page 112
first bullet item

Pressing the Cmd/Ctrl-Shift-F10 key restarts the background R session, not the full RStudio GUI.

James Winget  Mar 17, 2017 
Other Digital Version 130
Paragraph on number of parsers.

In the Kindle version, the text reads, "There are eight particularly important parsers:"

I count nine.

Steven Slezak  Dec 09, 2017 
Printed Page 132
Beginning of page

The text uses the "parse_number()" function from the "readr" package. But the actual function name is (now) "parse_numeric".

James Winget  Mar 17, 2017 
Printed Page 132
Third example of "parse_number()"

The third example of "parse_number()" (which should be parse_numeric()) shows a final result of 123. The correct result is 123.45.

James Winget  Mar 17, 2017 
Printed Page 151
Figure on top of p. 151

The years are repeated on the x-axis, with 1999 appearing twice and 2000 appearing three times. This behavior no longer appears as of dplyr 0.7.2 and ggplot2 2.2.1, but even now the default x-axis labeling could be improved by treating year as a factor or by explicitly labeling 1999 and 2000 on the x-axis.

Eric Lawrence  Aug 17, 2017 
Printed Page 155
Figure 9-3

The heading of "table2" in Figure 9-3 shows "key" and "value." But these should really be "type" and "count".

James Winget  Mar 20, 2017 
Printed Page 156
Exercise #3

"How could you add a new column..." should really be "How can you add a new row..."

James Winget  Mar 20, 2017 
Printed Page 160
Figure 9-5

The result of the unite expression is shown with a heading "year" but the code snippets would yield a heading "new".

James Winget  Mar 20, 2017 
Printed Page 175
6th paragraph

"appears in the flights table" should be "appears in the planes table", in the sentence:

For example, the flights$tailnum is a foreign key because it appears in the flights table where it matches each flight to a unique plane.

Li Chao  Jun 24, 2019 
Printed Page 182
Venn Diagram

The shading is missing from the Venn diagram making it somewhat less informative :-)

James Winget  Mar 20, 2017 
Printed Page 184
Top

“y4” is not present in the 2nd table. “y4” should be “y3” in joined figure and text output.

Michelle White  Oct 11, 2021 
Printed Page 196
3rd paragraph

"Beware that the printed representation of a string is not the same as [the] string itself, because the printed representation shows the escapes. To see the raw contents of the string, use writeLines():"

This text is seriously confused. Leaving aside the missing 'the', the printed representation shows the effect of the escape sequences, but not the sequences themselves.

I've run the R code and it functions as reported. However, why is the representation produced after the escape sequences have taken effect called the 'raw' contents?

Francis King  Apr 01, 2019 
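For readers weighing this report, a small base R sketch may help separate the two representations. The string `x` is made up for illustration; it is not taken from the book.

```r
# A five-character string: a, backslash, b, double quote, c.
x <- "a\\b\"c"

nchar(x)        # 5: the escapes in the source code are not part of the string

# print() shows how the string would be typed in R code, escapes included.
print(x)        # [1] "a\\b\"c"

# writeLines() writes the characters the string actually contains.
writeLines(x)   # a\b"c
```

Under this reading, print() is the form that displays the escape sequences, and writeLines() is the form that displays the stored characters.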
Printed Page 200-221
throughout

The shading in the book is virtually non-existent, so it's impossible to see the results of str_view() on the printed pages.

Valerie Partridge  Mar 21, 2020 
Printed Page 212
Third block of text from the bottom

The example of str_extract uses the highlighted block of text from str_view_all not from str_view.

James Winget  Mar 20, 2017 
Other Digital Version 219
top of page (kindle version)

phone <- regex(

The line with the sixth comment (hash mark) matches only three digits,
but the last section of a phone number has four digits.

code should read:

(\\d{4}) #four more numbers

Steven Slezak  Dec 23, 2017 
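The effect of the suggested change can be sketched in base R. The patterns below are simplified stand-ins for the book's commented regex() call, and the sample number and variable names are illustrative.

```r
# Sample input with a four-digit final block (illustrative value).
number <- "219 733 8965"

# Simplified patterns: three digits, three digits, then three or four digits.
three_digits <- "(\\d{3})[ -]?(\\d{3})[ -]?(\\d{3})"
four_digits  <- "(\\d{3})[ -]?(\\d{3})[ -]?(\\d{4})"

# Extract the first match of each pattern.
m3 <- regmatches(number, regexpr(three_digits, number, perl = TRUE))
m4 <- regmatches(number, regexpr(four_digits,  number, perl = TRUE))

m3   # "219 733 896"  (last digit dropped)
m4   # "219 733 8965" (full number matched)
```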
Other Digital Version 224
top of page (kindle version)

Line reads: "If you want a want an error..."

Steven Slezak  Dec 24, 2017 
Printed Page 231
Just after first code block

The example plots are missing that (would) show the value of "fct_reorder2()".

James Winget  Mar 20, 2017 
Printed Page 231
1st block of code

This code ...
by_age <- gss_cat %>%
filter(!is.na(age)) %>%
group_by(age, marital) %>%
count() %>%
mutate(prop = n / sum(n))

results in each row having a prop value of 1. Perhaps this is a new bug in R 3.4.1 or dplyr

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.2 dplyr_0.7.2 purrr_0.2.2.2 readr_1.1.1 tidyr_0.6.3
[6] tibble_1.3.3 ggplot2_2.2.1 tidyverse_1.1.1 forcats_0.2.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 cellranger_1.1.0 compiler_3.4.1 plyr_1.8.4 bindr_0.1
[6] tools_3.4.1 digest_0.6.12 jsonlite_1.5 lubridate_1.6.0 nlme_3.1-131
[11] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.1 psych_1.7.5
[16] parallel_3.4.1 haven_1.1.0 xml2_1.1.1 stringr_1.2.0 httr_1.2.1
[21] hms_0.3 grid_3.4.1 glue_1.1.1 R6_2.2.2 readxl_1.0.0
[26] foreign_0.8-69 modelr_0.1.1 reshape2_1.4.2 magrittr_1.5 scales_0.4.1
[31] rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2 labeling_0.3
[36] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2
>

Anonymous  Jul 26, 2017 
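One plausible explanation, offered as an assumption rather than a confirmed diagnosis: in the dplyr version listed above, count() keeps the grouping of its input, so sum(n) is taken within each age/marital group (one row each) and every prop is 1. The sketch below uses a made-up data frame `toy` in place of gss_cat and shows one way to regroup by age alone.

```r
library(dplyr)

# Made-up stand-in for gss_cat with only the two columns involved.
toy <- tibble(
  age     = c(20, 20, 20, 30, 30),
  marital = c("Married", "Married", "Never married", "Married", "Divorced")
)

# Pattern from the report: the result stays grouped by age and marital,
# so each group holds a single row and n / sum(n) is always 1.
as_reported <- toy %>%
  group_by(age, marital) %>%
  count() %>%
  mutate(prop = n / sum(n))

# Count first, then group by age only, so proportions sum to 1 within each age.
regrouped <- toy %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

as_reported$prop   # 1 1 1 1
regrouped
```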
Printed Page 246
Paragraph after "Rounding"

The second sentence starts with "Each ceiling_date()function". In addition to the spacing problems, rather than ceiling_date() this perhaps should be "*_date()" to indicate a match to any of the three?

James Winget  Mar 20, 2017 
Printed Page 270
Prerequisites section

Technically, it should be "so you won't need to load any extra packages."

The very first code example uses the tibble package (without loading it).

James Winget  Mar 21, 2017 
Printed Page 288
Last sentence on page

Suggests that "{" might have unexpected functional behavior but fails to provide an example.

James Winget  Mar 21, 2017 
Other Digital Version 332
graphic (Kindle version)

for sigma<- and mu<- objects, text uses "n = 5" but the graphic illustrating the idea uses "n = 10".

Steven Slezak  Jan 01, 2018 
Other Digital Version 334
graphic (Kindle version)

the graphic refers to an object called "params" but the object is called "param" in the text. On page 335 there is a similar problem. "Params" is used but I think they mean "param." It may not matter in this case.

Steven Slezak  Jan 01, 2018 
Other Digital Version 380
bottom

The code at the bottom of the page for "grid" is just wrong. Specifically, the ".model = mod_diamonds2" argument. A correction can be found here: https://github.com/tidyverse/modelr/issues/58

Steven Slezak  Jan 03, 2018 
PDF Page 442
Code for the plot

In the book, the code for creating the plot is this one:

ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = paste(
"Fuel efficiency generally decreases with"
"engine size"
)

This code generates an error, the correct code is this:

ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = paste(
"Fuel efficiency generally decreases with",
"engine size"
)
)

Roberto Rezende de Assis  Oct 21, 2017 
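The role of the missing comma can be checked in base R without drawing the plot. This is a small sketch; the names `broken`, `parse_result`, and `title` are illustrative.

```r
# The call as printed, kept as text so the syntax error can be caught.
broken <- 'paste("Fuel efficiency generally decreases with" "engine size")'

# Two adjacent string constants without a comma do not parse.
parse_result <- tryCatch(
  parse(text = broken),
  error = function(e) "syntax error"
)
parse_result   # "syntax error"

# With the comma, paste() joins the two pieces with a space.
title <- paste("Fuel efficiency generally decreases with", "engine size")
title          # "Fuel efficiency generally decreases with engine size"
```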
PDF Page 443
Code for the plot

Minor error in the code for the plot:

ORIGINAL:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = paste(
"Fuel efficiency generally decreases with"
"engine size",
)
subtitle = paste(
"Two seaters (sports cars) are an exception"
"because of their light weight",
)
caption = "Data from fueleconomy.gov"
)

CORRECT:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = paste(
"Fuel efficiency generally decreases with",
"engine size"
),
subtitle = paste(
"Two seaters (sports cars) are an exception",
"because of their light weight"
),
caption = "Data from fueleconomy.gov"
)

Roberto Rezende de Assis  Oct 21, 2017 
Printed Page 460
After first full paragraph.

The two graphs in the viridis example are not rendered in the text.

Eric Lawrence  Aug 17, 2017 
Other Digital Version 469
middle (kindle version)

the command
?rmarkdown:html_document() should use double colons :: like

?rmarkdown::html_document()

Steven Slezak  Jan 06, 2018 
ePub Page 550
Geometric Objects. Wickham, Hadley; Grolemund, Garrett. R for Data Science. O'Reilly Media. Kindle-Version.

(550 is the position on my Kindle)


# left
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

should be changed to: geom_smooth


best regards,
rh

Reiner Hutwelker  Dec 21, 2021 
ePub Page 9999
Section 'Creating Factors' after 6th code example.

Existing text (duplicated 'want'):
If you want a want an error, you can use readr::parse_factor():

Suggest instead:
If you want an error, you can use readr::parse_factor():

N.B. I noticed this in the O'Reilly online version of the text which does not seem to have page numbers, so the page number I reported for this error (9999) is bogus.



Alex Copeland  Dec 01, 2021 
ePub Page 9999
Chapter 13 'Dates and Times with lubridate', section 'Periods', code example 6

Existing code does not appear to change the arr_time as described (using R-4.1.2):

flights_dt <- flights_dt %>%
mutate(
overnight = arr_time < dep_time,
arr_time = arr_time + days(overnight * 1),
sched_arr_time = sched_arr_time + days(overnight * 1)
) %>% select(origin,dest,arr_time,dep_time)

origin dest arr_time dep_time
<chr> <chr> <dttm> <dttm>
1 EWR IAH 2013-01-01 08:30:00 2013-01-01 05:17:00
2 LGA IAH 2013-01-01 08:50:00 2013-01-01 05:33:00
3 JFK MIA 2013-01-01 09:23:00 2013-01-01 05:42:00

If we change 'days(overnight*1)' to 'days(overnight+1)' , the output looks correct:

> flights_dt %>%
mutate(
overnight = arr_time < dep_time,
arr_time = arr_time + days(overnight + 1),
sched_arr_time = sched_arr_time + days(overnight + 1)
) %>% select(origin,dest,arr_time,dep_time) %>% head(3)

# A tibble: 3 × 4
origin dest arr_time dep_time
<chr> <chr> <dttm> <dttm>
1 EWR IAH 2013-01-02 08:30:00 2013-01-01 05:17:00
2 LGA IAH 2013-01-02 08:50:00 2013-01-01 05:33:00
3 JFK MIA 2013-01-02 09:23:00 2013-01-01 05:42:00

Alex Copeland  Dec 01, 2021 
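A note on the arithmetic, as a hedged sketch: `overnight * 1` is 0 for flights that arrive on the same day and 1 for overnight flights, so unchanged arrival times in rows that are not overnight would be expected. `overnight + 1` adds at least one day to every flight. The example uses base R difftime as a stand-in for lubridate's days(), and the times are made up.

```r
# Two made-up flights: one same-day, one arriving after midnight.
dep <- as.POSIXct(c("2013-01-01 05:17:00", "2013-01-01 23:30:00"), tz = "UTC")
arr <- as.POSIXct(c("2013-01-01 08:30:00", "2013-01-01 00:45:00"), tz = "UTC")

# An arrival "before" departure marks an overnight flight.
overnight <- arr < dep          # FALSE TRUE

overnight * 1                   # 0 1: add a day only to overnight flights
overnight + 1                   # 1 2: adds a day (or two) to every flight

# Base R equivalent of adding days(); only the second arrival moves.
arr_fixed   <- arr + as.difftime(overnight * 1, units = "days")
arr_shifted <- arr + as.difftime(overnight + 1, units = "days")

format(arr_fixed,   "%Y-%m-%d %H:%M")  # "2013-01-01 08:30" "2013-01-02 00:45"
format(arr_shifted, "%Y-%m-%d %H:%M")  # "2013-01-02 08:30" "2013-01-03 00:45"
```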
Mobi Page 12620
text

Current Copy
May Jun Jul Aug Sep Oct Nov Dec If you want a want an error, you can use readr::parse_factor(): y2 <- parse_factor(x2, levels = month_levels) #> Warning:


Should be:
...Dec If you want an error, you can...

Anonymous  Jun 01, 2020