Machine Learning with R Cookbook - Second Edition

Book Description

Explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code

About This Book

  • Apply R to simplify predictive modeling with short and simple code
  • Use machine learning to solve problems ranging from small to big data
  • Build training and testing datasets and apply different classification methods

Who This Book Is For

This book is for data science professionals, data analysts, and anyone who has used R for data analysis and machine learning and now wants to become the go-to person for machine learning with R. Readers who want to improve the efficiency of their machine learning models and need to work with different kinds of datasets will find this book useful.

What You Will Learn

  • Create and inspect transaction datasets and perform association analysis with the Apriori algorithm
  • Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm
  • Compare differences between each regression method to discover how they solve problems
  • Detect and impute missing values in air quality data
  • Predict possible churn users with the classification approach
  • Plot the autocorrelation function with time series analysis
  • Use the Cox proportional hazards model for survival analysis
  • Implement the clustering method to segment customer data
  • Compress images with the dimension reduction method
  • Incorporate R and Hadoop to solve machine learning problems on big data

In Detail

Big data has become a popular buzzword across many industries. A growing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from data is another task, and a much more challenging one.

Machine Learning with R Cookbook, Second Edition takes a practical approach to teaching machine learning with R. Each chapter is divided into several simple recipes. By following the step-by-step instructions in each recipe, you will be able to construct a predictive model using a variety of machine learning packages. You will first learn to set up the R environment and use simple R commands to explore data. The book then covers how to perform statistical analysis alongside machine learning analysis and how to assess the resulting models, topics treated in detail in later chapters. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Style and approach

This is an easy-to-follow guide packed with hands-on examples of machine learning tasks. Each topic includes step-by-step instructions on tackling difficulties faced when applying R to machine learning.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    5. Conventions
    6. Reader feedback
    7. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Practical Machine Learning with R
    1. Introduction
    2. Downloading and installing R
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Downloading and installing RStudio
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Installing and loading packages
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Understanding of basic data structures
      1. Data types
      2. Data structures
      3. Vectors
        1. How to do it...
        2. How it works...
      4. Lists
        1. How to do it...
        2. How it works...
      5. Array
        1. How to do it...
        2. How it works...
      6. Matrix
        1. How to do it...
      7. DataFrame
        1. How to do it...
    6. Basic commands for subsetting
      1. How to do it...
      2. Data input
    7. Reading and writing data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Manipulating data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Applying basic statistics
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    10. Visualizing data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Getting a dataset for machine learning
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  3. Data Exploration with Air Quality Datasets
    1. Introduction
    2. Using air quality dataset
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Converting attributes to factor
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Detecting missing values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Imputing missing values
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Exploring and visualizing data
      1. Getting ready
      2. How to do it...
    7. Predicting values from datasets
      1. Getting ready
      2. How to do it...
      3. How it works...
  4. Analyzing Time Series Data
    1. Introduction
    2. Looking at time series data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Plotting and forecasting time series data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Extracting, subsetting, merging, filling, and padding
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Successive differences and moving averages
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Exponential smoothing
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Plotting the autocorrelation function
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  5. R and Statistics
    1. Introduction
    2. Understanding data sampling in R
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Operating a probability distribution in R
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Working with univariate descriptive statistics in R
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Performing correlations and multivariate analysis
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Conducting an exact binomial test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Performing a Student's t-test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Performing the Kolmogorov-Smirnov test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Understanding the Wilcoxon Rank Sum and Signed Rank test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Working with Pearson's Chi-squared test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Conducting a one-way ANOVA
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Performing a two-way ANOVA
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  6. Understanding Regression Analysis
    1. Introduction
    2. Different types of regression
    3. Fitting a linear regression model with lm
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Summarizing linear model fits
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Using linear regression to predict unknown values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Generating a diagnostic plot of a fitted model
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Fitting multiple regression
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. Summarizing multiple regression
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Using multiple regression to predict unknown values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Fitting a polynomial regression model with lm
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Fitting a robust linear regression model with rlm
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Studying a case of linear regression on SLID data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    13. Applying the Gaussian model for generalized linear regression
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    14. Applying the Poisson model for generalized linear regression
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    15. Applying the Binomial model for generalized linear regression
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    16. Fitting a generalized additive model to data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    17. Visualizing a generalized additive model
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    18. Diagnosing a generalized additive model
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  7. Survival Analysis
    1. Introduction
    2. Loading and observing data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Viewing the summary of survival analysis
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Visualizing the Survival Curve
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Using the log-rank test
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Using the Cox proportional hazards model
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Nelson-Aalen Estimator of cumulative hazard
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  8. Classification 1 - Tree, Lazy, and Probabilistic
    1. Introduction
    2. Preparing the training and testing datasets
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Building a classification model with recursive partitioning trees
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Visualizing a recursive partitioning tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Measuring the prediction performance of a recursive partitioning tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Pruning a recursive partitioning tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Handling missing data and split and surrogate variables
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Building a classification model with a conditional inference tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Control parameters in conditional inference trees
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Visualizing a conditional inference tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Measuring the prediction performance of a conditional inference tree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    12. Classifying data with the k-nearest neighbor classifier
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    13. Classifying data with logistic regression
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    14. Classifying data with the Naïve Bayes classifier
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  9. Classification 2 - Neural Network and SVM
    1. Introduction
    2. Classifying data with a support vector machine
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Choosing the cost of a support vector machine
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Visualizing an SVM fit
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Predicting labels based on a model trained by a support vector machine
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Tuning a support vector machine
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. The basics of neural network
      1. Getting ready
      2. How to do it...
    8. Training a neural network with neuralnet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Visualizing a neural network trained by neuralnet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Predicting labels based on a model trained by neuralnet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Training a neural network with nnet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    12. Predicting labels based on a model trained by nnet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  10. Model Evaluation
    1. Introduction
      1. Why do models need to be evaluated?
      2. Different methods of model evaluation
    2. Estimating model performance with k-fold cross-validation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Estimating model performance with Leave One Out Cross Validation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Performing cross-validation with the e1071 package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Performing cross-validation with the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Ranking the variable importance with the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Ranking the variable importance with the rminer package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Finding highly correlated features with the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Selecting features using the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Measuring the performance of the regression model
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Measuring prediction performance with a confusion matrix
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    12. Measuring prediction performance using ROCR
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    13. Comparing an ROC curve using the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    14. Measuring performance differences between models with the caret package
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  11. Ensemble Learning
    1. Introduction
    2. Using the Super Learner algorithm
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Using ensemble to train and test
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Classifying data with the bagging method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Performing cross-validation with the bagging method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Classifying data with the boosting method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Performing cross-validation with the boosting method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Classifying data with gradient boosting
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Calculating the margins of a classifier
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Calculating the error evolution of the ensemble method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Classifying data with random forest
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Estimating the prediction errors of different classifiers
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  12. Clustering
    1. Introduction
    2. Clustering data with hierarchical clustering
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Cutting trees into clusters
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Clustering data with the k-means method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Drawing a bivariate cluster plot
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Comparing clustering methods
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Extracting silhouette information from clustering
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Obtaining the optimum number of clusters for k-means
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Clustering data with the density-based method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Clustering data with the model-based method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Visualizing a dissimilarity matrix
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Validating clusters externally
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  13. Association Analysis and Sequence Mining
    1. Introduction
    2. Transforming data into transactions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Displaying transactions and associations
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Mining associations with the Apriori rule
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Pruning redundant rules
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Visualizing association rules
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Mining frequent itemsets with Eclat
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Creating transactions with temporal information
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Mining frequent sequential patterns with cSPADE
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Using the TraMineR package for sequence analysis
      1. Getting ready
      2. How to do it...
      3. How it works...
    11. Visualizing sequence, Chronogram, and Traversal Statistics
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  14. Dimension Reduction
    1. Introduction
    2. Why to reduce the dimension?
    3. Performing feature selection with FSelector
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Performing dimension reduction with PCA
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Determining the number of principal components using the scree test
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Determining the number of principal components using the Kaiser method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Visualizing multivariate data using biplot
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Performing dimension reduction with MDS
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Reducing dimensions with SVD
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Compressing images with SVD
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Performing nonlinear dimension reduction with ISOMAP
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Performing nonlinear dimension reduction with Local Linear Embedding
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  15. Big Data Analysis (R and Hadoop)
    1. Introduction
    2. Preparing the RHadoop environment
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Installing rmr2
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Installing rhdfs
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Operating HDFS with rhdfs
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Implementing a word count problem with RHadoop
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Comparing the performance between an R MapReduce program and a standard R program
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Testing and debugging the rmr2 program
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Installing plyrmr
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    10. Manipulating data with plyrmr
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Conducting machine learning with RHadoop
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    12. Configuring RHadoop clusters on Amazon EMR
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also