R Cookbook, 2nd Edition

Book Description

Perform data analysis with R quickly and efficiently with more than 275 practical recipes in this expanded second edition. The R language provides everything you need to do statistical work, but its structure can be difficult to master. These task-oriented recipes make you productive with R immediately. Solutions range from basic tasks to input and output, general statistics, graphics, and linear regression.

Each recipe addresses a specific problem and includes a discussion that explains the solution and provides insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an intermediate user, this book will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.

  • Create vectors, handle variables, and perform basic functions
  • Simplify data input and output
  • Tackle data structures such as matrices, lists, factors, and data frames
  • Work with probability, probability distributions, and random variables
  • Calculate statistics and confidence intervals and perform statistical tests
  • Create a variety of graphic displays
  • Build statistical models with linear regressions and analysis of variance (ANOVA)
  • Explore advanced statistical techniques, such as finding clusters in your data

Table of Contents

  1. Welcome to the R Cookbook, 2nd Edition
    1. The Recipes
    2. A Note on Terminology
    3. Software and Platform Notes
    4. Other Resources
    5. Conventions Used in This Book
    6. Using Code Examples
    7. O’Reilly Online Learning
    8. How to Contact Us
    9. Acknowledgments
  2. 1. Getting Started and Getting Help
    1. 1.1. Downloading and Installing R
    2. 1.2. Installing RStudio
    3. 1.3. Starting RStudio
    4. 1.4. Entering Commands
    5. 1.5. Exiting from RStudio
    6. 1.6. Interrupting R
    7. 1.7. Viewing the Supplied Documentation
    8. 1.8. Getting Help on a Function
    9. 1.9. Searching the Supplied Documentation
    10. 1.10. Getting Help on a Package
    11. 1.11. Searching the Web for Help
    12. 1.12. Finding Relevant Functions and Packages
    13. 1.13. Searching the Mailing Lists
    14. 1.14. Submitting Questions to Stack Overflow or Elsewhere in the Community
  3. 2. Some Basics
    1. 2.1. Printing Something to the Screen
    2. 2.2. Setting Variables
    3. 2.3. Listing Variables
    4. 2.4. Deleting Variables
    5. 2.5. Creating a Vector
    6. 2.6. Computing Basic Statistics
    7. 2.7. Creating Sequences
    8. 2.8. Comparing Vectors
    9. 2.9. Selecting Vector Elements
    10. 2.10. Performing Vector Arithmetic
    11. 2.11. Getting Operator Precedence Right
    12. 2.12. Typing Less and Accomplishing More
    13. 2.13. Creating a Pipeline of Function Calls
    14. 2.14. Avoiding Some Common Mistakes
  4. 3. Navigating the Software
    1. 3.1. Getting and Setting the Working Directory
    2. 3.2. Creating a New RStudio Project
    3. 3.3. Saving Your Workspace
    4. 3.4. Viewing Your Command History
    5. 3.5. Saving the Result of the Previous Command
    6. 3.6. Displaying Loaded Packages via the Search Path
    7. 3.7. Viewing the List of Installed Packages
    8. 3.8. Accessing the Functions in a Package
    9. 3.9. Accessing Built-in Datasets
    10. 3.10. Installing Packages from CRAN
    11. 3.11. Installing a Package from GitHub
    12. 3.12. Setting or Changing a Default CRAN Mirror
    13. 3.13. Running a Script
    14. 3.14. Running a Batch Script
    15. 3.15. Locating the R Home Directory
    16. 3.16. Customizing R Startup
    17. 3.17. Using R and RStudio in the Cloud
  5. 4. Input and Output
    1. 4.1. Entering Data from the Keyboard
    2. 4.2. Printing Fewer Digits (or More Digits)
    3. 4.3. Redirecting Output to a File
    4. 4.4. Listing Files
    5. 4.5. Dealing with “Cannot Open File” in Windows
    6. 4.6. Reading Fixed-Width Records
    7. 4.7. Reading Tabular Data Files
    8. 4.8. Reading from CSV Files
    9. 4.9. Writing to CSV Files
    10. 4.10. Reading Tabular or CSV Data from the Web
    11. 4.11. Reading Data from Excel
    12. 4.12. Writing a Data Frame to Excel
    13. 4.13. Reading Data from a SAS File
    14. 4.14. Reading Data from HTML Tables
    15. 4.15. Reading Files with a Complex Structure
    16. 4.16. Reading from MySQL Databases
    17. 4.17. Accessing a Database with dbplyr
    18. 4.18. Saving and Transporting Objects
  6. 5. Data Structures
    1. 5.1. Appending Data to a Vector
    2. 5.2. Inserting Data into a Vector
    3. 5.3. Understanding the Recycling Rule
    4. 5.4. Creating a Factor (Categorical Variable)
    5. 5.5. Combining Multiple Vectors into One Vector and a Factor
    6. 5.6. Creating a List
    7. 5.7. Selecting List Elements by Position
    8. 5.8. Selecting List Elements by Name
    9. 5.9. Building a Name/Value Association List
    10. 5.10. Removing an Element from a List
    11. 5.11. Flattening a List into a Vector
    12. 5.12. Removing NULL Elements from a List
    13. 5.13. Removing List Elements Using a Condition
    14. 5.14. Initializing a Matrix
    15. 5.15. Performing Matrix Operations
    16. 5.16. Giving Descriptive Names to the Rows and Columns of a Matrix
    17. 5.17. Selecting One Row or Column from a Matrix
    18. 5.18. Initializing a Data Frame from Column Data
    19. 5.19. Initializing a Data Frame from Row Data
    20. 5.20. Appending Rows to a Data Frame
    21. 5.21. Selecting Data Frame Columns by Position
    22. 5.22. Selecting Data Frame Columns by Name
    23. 5.23. Changing the Names of Data Frame Columns
    24. 5.24. Removing NAs from a Data Frame
    25. 5.25. Excluding Columns by Name
    26. 5.26. Combining Two Data Frames
    27. 5.27. Merging Data Frames by Common Column
    28. 5.28. Converting One Atomic Value into Another
    29. 5.29. Converting One Structured Data Type into Another
  7. 6. Data Transformations
    1. 6.1. Applying a Function to Each List Element
    2. 6.2. Applying a Function to Every Row of a Data Frame
    3. 6.3. Applying a Function to Every Row of a Matrix
    4. 6.4. Applying a Function to Every Column
    5. 6.5. Applying a Function to Parallel Vectors or Lists
    6. 6.6. Applying a Function to Groups of Data
    7. 6.7. Creating a New Column Based on Some Condition
  8. 7. Strings and Dates
    1. 7.1. Getting the Length of a String
    2. 7.2. Concatenating Strings
    3. 7.3. Extracting Substrings
    4. 7.4. Splitting a String According to a Delimiter
    5. 7.5. Replacing Substrings
    6. 7.6. Generating All Pairwise Combinations of Strings
    7. 7.7. Getting the Current Date
    8. 7.8. Converting a String into a Date
    9. 7.9. Converting a Date into a String
    10. 7.10. Converting Year, Month, and Day into a Date
    11. 7.11. Getting the Julian Date
    12. 7.12. Extracting the Parts of a Date
    13. 7.13. Creating a Sequence of Dates
  9. 8. Probability
    1. 8.1. Counting the Number of Combinations
    2. 8.2. Generating Combinations
    3. 8.3. Generating Random Numbers
    4. 8.4. Generating Reproducible Random Numbers
    5. 8.5. Generating a Random Sample
    6. 8.6. Generating Random Sequences
    7. 8.7. Randomly Permuting a Vector
    8. 8.8. Calculating Probabilities for Discrete Distributions
    9. 8.9. Calculating Probabilities for Continuous Distributions
    10. 8.10. Converting Probabilities to Quantiles
    11. 8.11. Plotting a Density Function
  10. 9. General Statistics
    1. 9.1. Summarizing Your Data
    2. 9.2. Calculating Relative Frequencies
    3. 9.3. Tabulating Factors and Creating Contingency Tables
    4. 9.4. Testing Categorical Variables for Independence
    5. 9.5. Calculating Quantiles (and Quartiles) of a Dataset
    6. 9.6. Inverting a Quantile
    7. 9.7. Converting Data to z-Scores
    8. 9.8. Testing the Mean of a Sample (t-Test)
    9. 9.9. Forming a Confidence Interval for a Mean
    10. 9.10. Forming a Confidence Interval for a Median
    11. 9.11. Testing a Sample Proportion
    12. 9.12. Forming a Confidence Interval for a Proportion
    13. 9.13. Testing for Normality
    14. 9.14. Testing for Runs
    15. 9.15. Comparing the Means of Two Samples
    16. 9.16. Comparing the Locations of Two Samples Nonparametrically
    17. 9.17. Testing a Correlation for Significance
    18. 9.18. Testing Groups for Equal Proportions
    19. 9.19. Performing Pairwise Comparisons Between Group Means
    20. 9.20. Testing Two Samples for the Same Distribution
  11. 10. Graphics
    1. 10.1. Creating a Scatter Plot
    2. 10.2. Adding a Title and Labels
    3. 10.3. Adding (or Removing) a Grid
    4. 10.4. Applying a Theme to a ggplot Figure
    5. 10.5. Creating a Scatter Plot of Multiple Groups
    6. 10.6. Adding (or Removing) a Legend
    7. 10.7. Plotting the Regression Line of a Scatter Plot
    8. 10.8. Plotting All Variables Against All Other Variables
    9. 10.9. Creating One Scatter Plot for Each Group
    10. 10.10. Creating a Bar Chart
    11. 10.11. Adding Confidence Intervals to a Bar Chart
    12. 10.12. Coloring a Bar Chart
    13. 10.13. Plotting a Line from x and y Points
    14. 10.14. Changing the Type, Width, or Color of a Line
    15. 10.15. Plotting Multiple Datasets
    16. 10.16. Adding Vertical or Horizontal Lines
    17. 10.17. Creating a Boxplot
    18. 10.18. Creating One Boxplot for Each Factor Level
    19. 10.19. Creating a Histogram
    20. 10.20. Adding a Density Estimate to a Histogram
    21. 10.21. Creating a Normal Quantile–Quantile Plot
    22. 10.22. Creating Other Quantile–Quantile Plots
    23. 10.23. Plotting a Variable in Multiple Colors
    24. 10.24. Graphing a Function
    25. 10.25. Displaying Several Figures on One Page
    26. 10.26. Writing Your Plot to a File
  12. 11. Linear Regression and ANOVA
    1. 11.1. Performing Simple Linear Regression
    2. 11.2. Performing Multiple Linear Regression
    3. 11.3. Getting Regression Statistics
    4. 11.4. Understanding the Regression Summary
    5. 11.5. Performing Linear Regression Without an Intercept
    6. 11.6. Regressing Only Variables That Highly Correlate with Your Dependent Variable
    7. 11.7. Performing Linear Regression with Interaction Terms
    8. 11.8. Selecting the Best Regression Variables
    9. 11.9. Regressing on a Subset of Your Data
    10. 11.10. Using an Expression Inside a Regression Formula
    11. 11.11. Regressing on a Polynomial
    12. 11.12. Regressing on Transformed Data
    13. 11.13. Finding the Best Power Transformation (Box–Cox Procedure)
    14. 11.14. Forming Confidence Intervals for Regression Coefficients
    15. 11.15. Plotting Regression Residuals
    16. 11.16. Diagnosing a Linear Regression
    17. 11.17. Identifying Influential Observations
    18. 11.18. Testing Residuals for Autocorrelation (Durbin–Watson Test)
    19. 11.19. Predicting New Values
    20. 11.20. Forming Prediction Intervals
    21. 11.21. Performing One-Way ANOVA
    22. 11.22. Creating an Interaction Plot
    23. 11.23. Finding Differences Between Means of Groups
    24. 11.24. Performing Robust ANOVA (Kruskal–Wallis Test)
    25. 11.25. Comparing Models by Using ANOVA
  13. 12. Useful Tricks
    1. 12.1. Peeking at Your Data
    2. 12.2. Printing the Result of an Assignment
    3. 12.3. Summing Rows and Columns
    4. 12.4. Printing Data in Columns
    5. 12.5. Binning Your Data
    6. 12.6. Finding the Position of a Particular Value
    7. 12.7. Selecting Every nth Element of a Vector
    8. 12.8. Finding Minimums or Maximums
    9. 12.9. Generating All Combinations of Several Variables
    10. 12.10. Flattening a Data Frame
    11. 12.11. Sorting a Data Frame
    12. 12.12. Stripping Attributes from a Variable
    13. 12.13. Revealing the Structure of an Object
    14. 12.14. Timing Your Code
    15. 12.15. Suppressing Warnings and Error Messages
    16. 12.16. Taking Function Arguments from a List
    17. 12.17. Defining Your Own Binary Operators
    18. 12.18. Suppressing the Startup Message
    19. 12.19. Getting and Setting Environment Variables
    20. 12.20. Use Code Sections
    21. 12.21. Executing R in Parallel Locally
    22. 12.22. Executing R in Parallel Remotely
  14. 13. Beyond Basic Numerics and Statistics
    1. 13.1. Minimizing or Maximizing a Single-Parameter Function
    2. 13.2. Minimizing or Maximizing a Multiparameter Function
    3. 13.3. Calculating Eigenvalues and Eigenvectors
    4. 13.4. Performing Principal Component Analysis
    5. 13.5. Performing Simple Orthogonal Regression
    6. 13.6. Finding Clusters in Your Data
    7. 13.7. Predicting a Binary-Valued Variable (Logistic Regression)
    8. 13.8. Bootstrapping a Statistic
    9. 13.9. Factor Analysis
  15. 14. Time Series Analysis
    1. 14.1. Representing Time Series Data
    2. 14.2. Plotting Time Series Data
    3. 14.3. Extracting the Oldest or Newest Observations
    4. 14.4. Subsetting a Time Series
    5. 14.5. Merging Several Time Series
    6. 14.6. Filling or Padding a Time Series
    7. 14.7. Lagging a Time Series
    8. 14.8. Computing Successive Differences
    9. 14.9. Performing Calculations on Time Series
    10. 14.10. Computing a Moving Average
    11. 14.11. Applying a Function by Calendar Period
    12. 14.12. Applying a Rolling Function
    13. 14.13. Plotting the Autocorrelation Function
    14. 14.14. Testing a Time Series for Autocorrelation
    15. 14.15. Plotting the Partial Autocorrelation Function
    16. 14.16. Finding Lagged Correlations Between Two Time Series
    17. 14.17. Detrending a Time Series
    18. 14.18. Fitting an ARIMA Model
    19. 14.19. Removing Insignificant ARIMA Coefficients
    20. 14.20. Running Diagnostics on an ARIMA Model
    21. 14.21. Making Forecasts from an ARIMA Model
    22. 14.22. Plotting a Forecast
    23. 14.23. Testing for Mean Reversion
    24. 14.24. Smoothing a Time Series
  16. 15. Simple Programming
    1. 15.1. Choosing Between Two Alternatives: if/else
    2. 15.2. Iterating with a Loop
    3. 15.3. Defining a Function
    4. 15.4. Creating a Local Variable
    5. 15.5. Choosing Between Multiple Alternatives: switch
    6. 15.6. Defining Defaults for Function Parameters
    7. 15.7. Signaling Errors
    8. 15.8. Protecting Against Errors
    9. 15.9. Creating an Anonymous Function
    10. 15.10. Creating a Collection of Reusable Functions
    11. 15.11. Automatically Reindenting Code
  17. 16. R Markdown and Publishing
    1. 16.1. Creating a New Document
    2. 16.2. Adding a Title, Author, or Date
    3. 16.3. Formatting Document Text
    4. 16.4. Inserting Document Headings
    5. 16.5. Inserting a List
    6. 16.6. Showing Output from R Code
    7. 16.7. Controlling Which Code and Results Are Shown
    8. 16.8. Inserting a Plot
    9. 16.9. Inserting a Table
    10. 16.10. Inserting a Table of Data
    11. 16.11. Inserting Math Equations
    12. 16.12. Generating HTML Output
    13. 16.13. Generating PDF Output
    14. 16.14. Generating Microsoft Word Output
    15. 16.15. Generating Presentation Output
    16. 16.16. Creating a Parameterized Report
    17. 16.17. Organizing Your R Markdown Workflow
  18. Index