R for Everyone: Advanced Analytics and Graphics, 2nd Edition

Book Description

Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals

Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone, Second Edition, is the solution.

Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks.

Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, manipulation, and visualization; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. After all this, you'll make your code reproducible with LaTeX, RMarkdown, and Shiny.

By the time you're done, you won't just know how to write R programs; you'll be ready to tackle the statistical problems you care about most.

Coverage includes

  • Explore R, RStudio, and R packages

  • Use R for math: variable types, vectors, calling functions, and more

  • Exploit data structures, including data.frames, matrices, and lists

  • Read many different types of data

  • Create attractive, intuitive statistical graphics

  • Write user-defined functions

  • Control program flow with if, ifelse, and complex checks

  • Improve program efficiency with group manipulations

  • Combine and reshape multiple datasets

  • Manipulate strings using R's facilities and regular expressions

  • Create normal, binomial, and Poisson probability distributions

  • Build linear, generalized linear, and nonlinear models

  • Program basic statistics: mean, standard deviation, and t-tests

  • Train machine learning models

  • Assess the quality of models and variable selection

  • Prevent overfitting and perform variable selection, using the Elastic Net and Bayesian methods

  • Analyze univariate and multivariate time series data

  • Group data via K-means and hierarchical clustering

  • Prepare reports, slideshows, and web pages with knitr

  • Display interactive data with RMarkdown and htmlwidgets

  • Implement dashboards with Shiny

  • Build reusable R packages with devtools and Rcpp

Table of Contents

  1. About This E-Book
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. Contents
  6. Foreword
  7. Preface
  8. Acknowledgments
    1. Acknowledgments for the Second Edition
    2. Acknowledgments for the First Edition
  9. About the Author
  10. 1. Getting R
    1. 1.1 Downloading R
    2. 1.2 R Version
    3. 1.3 32-bit vs. 64-bit
    4. 1.4 Installing
      1. 1.4.1 Installing on Windows
      2. 1.4.2 Installing on Mac OS X
      3. 1.4.3 Installing on Linux
    5. 1.5 Microsoft R Open
    6. 1.6 Conclusion
  11. 2. The R Environment
    1. 2.1 Command Line Interface
    2. 2.2 RStudio
      1. 2.2.1 RStudio Projects
      2. 2.2.2 RStudio Tools
      3. 2.2.3 Git Integration
    3. 2.3 Microsoft Visual Studio
    4. 2.4 Conclusion
  12. 3. R Packages
    1. 3.1 Installing Packages
      1. 3.1.1 Uninstalling Packages
    2. 3.2 Loading Packages
      1. 3.2.1 Unloading Packages
    3. 3.3 Building a Package
    4. 3.4 Conclusion
  13. 4. Basics of R
    1. 4.1 Basic Math
    2. 4.2 Variables
      1. 4.2.1 Variable Assignment
      2. 4.2.2 Removing Variables
    3. 4.3 Data Types
      1. 4.3.1 Numeric Data
      2. 4.3.2 Character Data
      3. 4.3.3 Dates
      4. 4.3.4 Logical
    4. 4.4 Vectors
      1. 4.4.1 Vector Operations
      2. 4.4.2 Factor Vectors
    5. 4.5 Calling Functions
    6. 4.6 Function Documentation
    7. 4.7 Missing Data
      1. 4.7.1 NA
      2. 4.7.2 NULL
    8. 4.8 Pipes
    9. 4.9 Conclusion
  14. 5. Advanced Data Structures
    1. 5.1 data.frames
    2. 5.2 Lists
    3. 5.3 Matrices
    4. 5.4 Arrays
    5. 5.5 Conclusion
  15. 6. Reading Data into R
    1. 6.1 Reading CSVs
      1. 6.1.1 read_delim
      2. 6.1.2 fread
    2. 6.2 Excel Data
    3. 6.3 Reading from Databases
    4. 6.4 Data from Other Statistical Tools
    5. 6.5 R Binary Files
    6. 6.6 Data Included with R
    7. 6.7 Extract Data from Web Sites
      1. 6.7.1 Simple HTML Tables
      2. 6.7.2 Scraping Web Data
    8. 6.8 Reading JSON Data
    9. 6.9 Conclusion
  16. 7. Statistical Graphics
    1. 7.1 Base Graphics
      1. 7.1.1 Base Histograms
      2. 7.1.2 Base Scatterplot
      3. 7.1.3 Boxplots
    2. 7.2 ggplot2
      1. 7.2.1 ggplot2 Histograms and Densities
      2. 7.2.2 ggplot2 Scatterplots
      3. 7.2.3 ggplot2 Boxplots and Violin Plots
      4. 7.2.4 ggplot2 Line Graphs
      5. 7.2.5 Themes
    3. 7.3 Conclusion
  17. 8. Writing R Functions
    1. 8.1 Hello, World!
    2. 8.2 Function Arguments
      1. 8.2.1 Default Arguments
      2. 8.2.2 Extra Arguments
    3. 8.3 Return Values
    4. 8.4 do.call
    5. 8.5 Conclusion
  18. 9. Control Statements
    1. 9.1 if and else
    2. 9.2 switch
    3. 9.3 ifelse
    4. 9.4 Compound Tests
    5. 9.5 Conclusion
  19. 10. Loops, the Un-R Way to Iterate
    1. 10.1 for Loops
    2. 10.2 while Loops
    3. 10.3 Controlling Loops
    4. 10.4 Conclusion
  20. 11. Group Manipulation
    1. 11.1 Apply Family
      1. 11.1.1 apply
      2. 11.1.2 lapply and sapply
      3. 11.1.3 mapply
      4. 11.1.4 Other apply Functions
    2. 11.2 aggregate
    3. 11.3 plyr
      1. 11.3.1 ddply
      2. 11.3.2 llply
      3. 11.3.3 plyr Helper Functions
      4. 11.3.4 Speed versus Convenience
    4. 11.4 data.table
      1. 11.4.1 Keys
      2. 11.4.2 data.table Aggregation
    5. 11.5 Conclusion
  21. 12. Faster Group Manipulation with dplyr
    1. 12.1 Pipes
    2. 12.2 tbl
    3. 12.3 select
    4. 12.4 filter
    5. 12.5 slice
    6. 12.6 mutate
    7. 12.7 summarize
    8. 12.8 group_by
    9. 12.9 arrange
    10. 12.10 do
    11. 12.11 dplyr with Databases
    12. 12.12 Conclusion
  22. 13. Iterating with purrr
    1. 13.1 map
    2. 13.2 map with Specified Types
      1. 13.2.1 map_int
      2. 13.2.2 map_dbl
      3. 13.2.3 map_chr
      4. 13.2.4 map_lgl
      5. 13.2.5 map_df
      6. 13.2.6 map_if
    3. 13.3 Iterating over a data.frame
    4. 13.4 map with Multiple Inputs
    5. 13.5 Conclusion
  23. 14. Data Reshaping
    1. 14.1 cbind and rbind
    2. 14.2 Joins
      1. 14.2.1 merge
      2. 14.2.2 plyr join
      3. 14.2.3 data.table merge
    3. 14.3 reshape2
      1. 14.3.1 melt
      2. 14.3.2 dcast
    4. 14.4 Conclusion
  24. 15. Reshaping Data in the Tidyverse
    1. 15.1 Binding Rows and Columns
    2. 15.2 Joins with dplyr
    3. 15.3 Converting Data Formats
    4. 15.4 Conclusion
  25. 16. Manipulating Strings
    1. 16.1 paste
    2. 16.2 sprintf
    3. 16.3 Extracting Text
    4. 16.4 Regular Expressions
    5. 16.5 Conclusion
  26. 17. Probability Distributions
    1. 17.1 Normal Distribution
    2. 17.2 Binomial Distribution
    3. 17.3 Poisson Distribution
    4. 17.4 Other Distributions
    5. 17.5 Conclusion
  27. 18. Basic Statistics
    1. 18.1 Summary Statistics
    2. 18.2 Correlation and Covariance
    3. 18.3 T-Tests
      1. 18.3.1 One-Sample T-Test
      2. 18.3.2 Two-Sample T-Test
      3. 18.3.3 Paired Two-Sample T-Test
    4. 18.4 ANOVA
    5. 18.5 Conclusion
  28. 19. Linear Models
    1. 19.1 Simple Linear Regression
      1. 19.1.1 ANOVA Alternative
    2. 19.2 Multiple Regression
    3. 19.3 Conclusion
  29. 20. Generalized Linear Models
    1. 20.1 Logistic Regression
    2. 20.2 Poisson Regression
    3. 20.3 Other Generalized Linear Models
    4. 20.4 Survival Analysis
    5. 20.5 Conclusion
  30. 21. Model Diagnostics
    1. 21.1 Residuals
    2. 21.2 Comparing Models
    3. 21.3 Cross-Validation
    4. 21.4 Bootstrap
    5. 21.5 Stepwise Variable Selection
    6. 21.6 Conclusion
  31. 22. Regularization and Shrinkage
    1. 22.1 Elastic Net
    2. 22.2 Bayesian Shrinkage
    3. 22.3 Conclusion
  32. 23. Nonlinear Models
    1. 23.1 Nonlinear Least Squares
    2. 23.2 Splines
    3. 23.3 Generalized Additive Models
    4. 23.4 Decision Trees
    5. 23.5 Boosted Trees
    6. 23.6 Random Forests
    7. 23.7 Conclusion
  33. 24. Time Series and Autocorrelation
    1. 24.1 Autoregressive Moving Average
    2. 24.2 VAR
    3. 24.3 GARCH
    4. 24.4 Conclusion
  34. 25. Clustering
    1. 25.1 K-means
    2. 25.2 PAM
    3. 25.3 Hierarchical Clustering
    4. 25.4 Conclusion
  35. 26. Model Fitting with Caret
    1. 26.1 Caret Basics
    2. 26.2 Caret Options
      1. 26.2.1 caret Training Controls
      2. 26.2.2 Caret Search Grid
    3. 26.3 Tuning a Boosted Tree
    4. 26.4 Conclusion
  36. 27. Reproducibility and Reports with knitr
    1. 27.1 Installing a LaTeX Program
    2. 27.2 LaTeX Primer
    3. 27.3 Using knitr with LaTeX
    4. 27.4 Conclusion
  37. 28. Rich Documents with RMarkdown
    1. 28.1 Document Compilation
    2. 28.2 Document Header
    3. 28.3 Markdown Primer
    4. 28.4 Markdown Code Chunks
    5. 28.5 htmlwidgets
      1. 28.5.1 datatables
      2. 28.5.2 leaflet
      3. 28.5.3 dygraphs
      4. 28.5.4 threejs
      5. 28.5.5 d3heatmap
    6. 28.6 RMarkdown Slideshows
    7. 28.7 Conclusion
  38. 29. Interactive Dashboards with Shiny
    1. 29.1 Shiny in RMarkdown
    2. 29.2 Reactive Expressions in Shiny
    3. 29.3 Server and UI
    4. 29.4 Conclusion
  39. 30. Building R Packages
    1. 30.1 Folder Structure
    2. 30.2 Package Files
      1. 30.2.1 DESCRIPTION File
      2. 30.2.2 NAMESPACE File
      3. 30.2.3 Other Package Files
    3. 30.3 Package Documentation
    4. 30.4 Tests
    5. 30.5 Checking, Building and Installing
    6. 30.6 Submitting to CRAN
    7. 30.7 C++ Code
      1. 30.7.1 sourceCpp
      2. 30.7.2 Compiling Packages
    8. 30.8 Conclusion
  40. A. Real-Life Resources
    1. A.1 Meetups
    2. A.2 Stack Overflow
    3. A.3 Twitter
    4. A.4 Conferences
    5. A.5 Web Sites
    6. A.6 Documents
    7. A.7 Books
    8. A.8 Conclusion
  41. B. Glossary
  42. List of Figures
  43. List of Tables
  44. General Index
  45. Index of Functions
  46. Index of Packages
  47. Index of People
  48. Data Index
  49. Code Snippets