R Programming By Example

Book Description

This step-by-step guide demonstrates how to build simple-to-advanced applications through examples in R using modern tools.

About This Book

  • Get a firm hold on the fundamentals of R through practical hands-on examples
  • Get started with good R programming fundamentals for data science
  • Exploit R's different libraries to build interesting applications

Who This Book Is For

This book is for aspiring data science professionals or statisticians who would like to learn the R programming language in a practical manner. Basic programming knowledge is assumed.

What You Will Learn

  • Discover techniques to leverage R's features, and work with packages
  • Perform a descriptive analysis and work with statistical models using R
  • Work efficiently with objects without using loops
  • Create diverse visualizations to gain better understanding of the data
  • Understand ways to produce good visualizations and create reports for the results
  • Read and write data from relational databases and REST APIs, both packaged and unpackaged
  • Improve performance by writing better code, delegating that code to a more efficient programming language, or making it parallel

In Detail

R is a high-level statistical language that is widely used among statisticians and data miners to develop analytical applications. Data analysts with great analytical skills often lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on R version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R.

We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. The book also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization.

By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Style and Approach

This is an easy-to-understand guide filled with real-world examples, giving you a holistic view of R and practical, hands-on experience.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introduction to R
    1. What R is and what it isn't
      1. The inspiration for R – the S language
      2. R is a high quality statistical computing system
      3. R is a flexible programming language
      4. R is free, as in freedom and as in free beer
      5. What R is not good for
    2. Comparing R with other software
    3. The interpreter and the console
    4. Tools to work efficiently with R
      1. Pick an IDE or a powerful editor
      2. The send to console functionality
      3. The efficient write-execute loop
      4. Executing R code in non-interactive sessions
    5. How to use this book
    6. Tracking state with symbols and variables
    7. Working with data types and data structures
      1. Numerics
        1. Special values
      2. Characters
      3. Logicals
      4. Vectors
      5. Factors
      6. Matrices
      7. Lists
      8. Data frames
    8. Divide and conquer with functions
      1. Optional arguments
      2. Functions as arguments
      3. Operators are functions
      4. Coercion
    9. Complex logic with control structures
      1. If… else conditionals
      2. For loops
      3. While loops
    10. The examples in this book
    11. Summary
  3. Understanding Votes with Descriptive Statistics
    1. This chapter's required packages
    2. The Brexit votes example
    3. Cleaning and setting up the data
    4. Summarizing the data into a data frame
    5. Getting intuition with graphs and correlations
      1. Visualizing variable distributions
      2. Using matrix scatter plots for a quick overview
      3. Getting a better look with detailed scatter plots
      4. Understanding interactions with correlations
    6. Creating a new dataset with what we've learned
    7. Building new variables with principal components
    8. Putting it all together into high-quality code
      1. Planning before programming
      2. Understanding the fundamentals of high-quality code
      3. Programming by visualizing the big picture
    9. Summary
  4. Predicting Votes with Linear Models
    1. Required packages
    2. Setting up the data
      1. Training and testing datasets
    3. Predicting votes with linear models
    4. Checking model assumptions
      1. Checking linearity with scatter plots
      2. Checking normality with histograms and quantile-quantile plots
      3. Checking homoscedasticity with residual plots
      4. Checking no collinearity with correlations
    5. Measuring accuracy with score functions
    6. Programmatically finding the best model
      1. Generating model combinations
    7. Predicting votes from wards with unknown data
    8. Summary
  5. Simulating Sales Data and Working with Databases
    1. Required packages
    2. Designing our data tables
      1. The basic variables
      2. Simplifying assumptions
      3. Potential pitfalls
        1. The too-much-empty-space problem
        2. The too-much-repeated-data problem
    3. Simulating the sales data
      1. Simulating numeric data according to distribution assumptions
      2. Simulating categorical values using factors
      3. Simulating dates within a range
      4. Simulating numbers under shared restrictions
      5. Simulating strings for complex identifiers
      6. Putting everything together
    4. Simulating the client data
    5. Simulating the client messages data
    6. Working with relational databases
    7. Summary
  6. Communicating Sales with Visualizations
    1. Required packages
    2. Extending our data with profit metrics
    3. Building blocks for reusable high-quality graphs
    4. Starting with simple applications for bar graphs
      1. Adding a third dimension with colors
      2. Graphing top performers with bar graphs
    5. Graphing disaggregated data with boxplots
    6. Scatter plots with joint and marginal distributions
      1. Pricing and profitability by protein source and continent
      2. Client birth dates, gender, and ratings
    7. Developing our own graph type – radar graphs
    8. Exploring with interactive 3D scatter plots
    9. Looking at dynamic data with time-series
    10. Looking at geographical data with static maps
    11. Navigating geographical data with interactive maps
      1. Maps you can navigate and zoom-in to
      2. High-tech-looking interactive globe
    12. Summary
  7. Understanding Reviews with Text Analysis
    1. This chapter's required packages
    2. What is text analysis and how does it work?
    3. Preparing, training, and testing data
    4. Building the corpus with tokenization and data cleaning
      1. Document feature matrices
    5. Training models with cross validation
      1. Training our first predictive model
      2. Improving speed with parallelization
      3. Computing predictive accuracy and confusion matrices
    6. Improving our results with TF-IDF
    7. Adding flexibility with N-grams
    8. Reducing dimensionality with SVD
    9. Extending our analysis with cosine similarity
    10. Digging deeper with sentiment analysis
    11. Testing our predictive model with unseen data
    12. Retrieving text data from Twitter
    13. Summary
  8. Developing Automatic Presentations
    1. Required packages
    2. Why invest in automation?
    3. Literate programming as a content creation methodology
      1. Reproducibility as a benefit of literate programming
    4. The basic tools for an automation pipeline
    5. A gentle introduction to Markdown
      1. Text
      2. Headers
      3. Lists
      4. Tables
      5. Links
      6. Images
      7. Quotes
      8. Code
      9. Mathematics
    6. Extending Markdown with R Markdown
      1. Code chunks
      2. Tables
      3. Graphs
      4. Chunk options
      5. Global chunk options
      6. Caching
      7. Producing the final output with knitr
    7. Developing graphs and analysis as we normally would
    8. Building our presentation with R Markdown
    9. Summary
  9. Object-Oriented System to Track Cryptocurrencies
    1. This chapter's required packages
    2. The cryptocurrencies example
    3. A brief introduction to object-oriented programming
      1. The purpose of object-oriented programming
      2. Important concepts behind object-oriented languages
        1. Encapsulation
        2. Polymorphism
        3. Hierarchies
        4. Classes and constructors
        5. Public and private methods
        6. Interfaces, factories, and patterns in general
    4. Introducing three object models in R – S3, S4, and R6
      1. The first source of confusion – various object models
      2. The second source of confusion – generic functions
      3. The S3 object model
        1. Classes, constructors, and composition
        2. Public methods and polymorphism
        3. Encapsulation and mutability
        4. Inheritance
      4. The S4 object model
        1. Classes, constructors, and composition
        2. Public methods and polymorphism
        3. Encapsulation and mutability
        4. Inheritance
      5. The R6 object model
        1. Classes, constructors, and composition
        2. Public methods and polymorphism
        3. Encapsulation and mutability
        4. Inheritance
        5. Active bindings
        6. Finalizers
    5. The architecture behind our cryptocurrencies system
    6. Starting simple with timestamps using S3 classes
    7. Implementing cryptocurrency assets using S4 classes
    8. Implementing our storage layer with R6 classes
      1. Communicating available behavior with a database interface
      2. Implementing a database-like storage system with CSV files
      3. Easily allowing new database integration with a factory
      4. Encapsulating multiple databases with a storage layer
    9. Retrieving live data for markets and wallets with R6 classes
      1. Creating a very simple requester to isolate API calls
      2. Developing our exchanges infrastructure
      3. Developing our wallets infrastructure
      4. Implementing our wallet requesters
    10. Finally introducing users with S3 classes
    11. Helping ourselves with a centralized settings file
    12. Saving our initial user data into the system
    13. Activating our system with two simple functions
    14. Some advice when working with object-oriented systems
    15. Summary
  10. Implementing an Efficient Simple Moving Average
    1. Required packages
    2. Starting by using good algorithms
      1. Just how much impact can algorithm selection have?
    3. How fast is fast enough?
    4. Calculating simple moving averages inefficiently
      1. Simulating the time-series 
      2. Our first (very inefficient) attempt at an SMA
    5. Understanding why R can be slow
      1. Object immutability
      2. Interpreted dynamic typings
      3. Memory-bound processes
      4. Single-threaded processes
    6. Measuring by profiling and benchmarking
      1. Profiling fundamentals with Rprof()
      2. Benchmarking manually with system.time()
      3. Benchmarking automatically with microbenchmark()
    7. Easily achieving high benefit-cost improvements
      1. Using the simple data structure for the job
      2. Vectorizing as much as possible
      3. Removing unnecessary logic
      4. Moving checks out of iterative processes
      5. If you can, avoid iterating at all
      6. Using R's way of iterating efficiently
      7. Avoiding sending data structures with overheads
    8. Using parallelization to divide and conquer
      1. How deep does the parallelization rabbit hole go?
      2. Practical parallelization with R
    9. Using C++ and Fortran to accelerate calculations
      1. Using an old-school approach with Fortran
      2. Using a modern approach with C++
    10. Looking back at what we have achieved
    11. Other topics of interest to enhance performance
      1. Preallocating memory to avoid duplication
      2. Making R code a bit faster with byte code compilation
        1. Just-in-time (JIT) compilation of R code
      3. Using memoization or cache layers
      4. Improving our data and memory management
      5. Using specialized packages for performance
      6. Flexibility and power with cloud computing
      7. Specialized R distributions
    12. Summary
  11. Adding Interactivity with Dashboards
    1. Required packages
      1. Introducing the Shiny application architecture and reactivity
    2. What is functional reactive programming and why is it useful?
      1. How is functional reactivity handled within Shiny?
      2. The building blocks for reactivity in Shiny
      3. The input, output, and rendering functions
    3. Designing our high-level application structure
      1. Setting up a two-column distribution
      2. Introducing sections with panels
    4. Inserting a dynamic data table
    5. Introducing interactivity with user input
      1. Setting up static user inputs
      2. Setting up dynamic options in a drop-down
      3. Setting up dynamic input panels
    6. Adding a summary table with shared data
    7. Adding a simple moving average graph
    8. Adding interactivity with a secondary zoom-in graph
    9. Styling our application with themes
    10. Other topics of interest
      1. Adding static images
      2. Adding HTML to your web application
      3. Adding custom CSS styling
      4. Sharing your newly created application
    11. Summary
  12. Required Packages
    1. External requirements – software outside of R
      1. Dependencies for the RMySQL R package
        1. Ubuntu 17.10
        2. macOS High Sierra
        3. Setting up user/password in both Linux and macOS
      2. Dependencies for the rgl and rgdal R packages
        1. Ubuntu 17.10
        2. macOS High Sierra
      3. Dependencies for the Rcpp package and the .Fortran() function
        1. Ubuntu 17.10
        2. macOS High Sierra
    2. Internal requirements – R packages
    3. Loading R packages