Book description
Explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code
About This Book
 Apply R to simplify predictive modeling with short and simple code
 Use machine learning to solve problems ranging from small to big data
 Build training and testing datasets and apply different classification methods
Who This Book Is For
This book is for data science professionals, data analysts, and anyone who has used R for data analysis and machine learning and now wishes to become the go-to person for machine learning with R. Those who wish to improve the efficiency of their machine learning models and need to work with different kinds of datasets will find this book very insightful.
What You Will Learn
 Create and inspect transaction datasets and perform association analysis with the Apriori algorithm
 Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm
 Compare differences between each regression method to discover how they solve problems
 Detect and impute missing values in air quality data
 Predict possible churn users with the classification approach
 Plot the autocorrelation function with time series analysis
 Use the Cox proportional hazards model for survival analysis
 Implement the clustering method to segment customer data
 Compress images with the dimension reduction method
 Incorporate R and Hadoop to solve machine learning problems on big data
In Detail
Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Being able to extract useful information from data is another task, and a much more challenging one. Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model by using a variety of machine learning packages. In this book, you will first learn to set up the R environment and use simple R commands to explore data. The next topic covers how to perform statistical analysis alongside machine learning analysis and how to assess the models you create, both of which are covered in detail later in the book. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.
Style and approach
This is an easy-to-follow guide packed with hands-on examples of machine learning tasks. Each topic includes step-by-step instructions on tackling difficulties faced when applying R to machine learning.
Table of contents
 Preface

Practical Machine Learning with R
 Introduction
 Downloading and installing R
 Downloading and installing RStudio
 Installing and loading packages
 Understanding of basic data structures
 Basic commands for subsetting
 Reading and writing data
 Manipulating data
 Applying basic statistics
 Visualizing data
 Getting a dataset for machine learning
 Data Exploration with Air Quality Datasets
 Analyzing Time Series Data

R and Statistics
 Introduction
 Understanding data sampling in R
 Operating a probability distribution in R
 Working with univariate descriptive statistics in R
 Performing correlations and multivariate analysis
 Conducting an exact binomial test
 Performing a student's t-test
 Performing the Kolmogorov-Smirnov test
 Understanding the Wilcoxon Rank Sum and Signed Rank test
 Working with Pearson's Chi-squared test
 Conducting a one-way ANOVA
 Performing a two-way ANOVA

Understanding Regression Analysis
 Introduction
 Different types of regression
 Fitting a linear regression model with lm
 Summarizing linear model fits
 Using linear regression to predict unknown values
 Generating a diagnostic plot of a fitted model
 Fitting multiple regression
 Summarizing multiple regression
 Using multiple regression to predict unknown values
 Fitting a polynomial regression model with lm
 Fitting a robust linear regression model with rlm
 Studying a case of linear regression on SLID data
 Applying the Gaussian model for generalized linear regression
 Applying the Poisson model for generalized linear regression
 Applying the Binomial model for generalized linear regression
 Fitting a generalized additive model to data
 Visualizing a generalized additive model
 Diagnosing a generalized additive model
 Survival Analysis

Classification 1 - Tree, Lazy, and Probabilistic
 Introduction
 Preparing the training and testing datasets
 Building a classification model with recursive partitioning trees
 Visualizing a recursive partitioning tree
 Measuring the prediction performance of a recursive partitioning tree
 Pruning a recursive partitioning tree
 Handling missing data and split and surrogate variables
 Building a classification model with a conditional inference tree
 Control parameters in conditional inference trees
 Visualizing a conditional inference tree
 Measuring the prediction performance of a conditional inference tree
 Classifying data with the k-nearest neighbor classifier
 Classifying data with logistic regression
 Classifying data with the Naïve Bayes classifier

Classification 2 - Neural Network and SVM
 Introduction
 Classifying data with a support vector machine
 Choosing the cost of a support vector machine
 Visualizing an SVM fit
 Predicting labels based on a model trained by a support vector machine
 Tuning a support vector machine
 The basics of neural network
 Training a neural network with neuralnet
 Visualizing a neural network trained by neuralnet
 Predicting labels based on a model trained by neuralnet
 Training a neural network with nnet
 Predicting labels based on a model trained by nnet

Model Evaluation
 Introduction
 Estimating model performance with k-fold cross-validation
 Estimating model performance with Leave One Out Cross Validation
 Performing cross-validation with the e1071 package
 Performing cross-validation with the caret package
 Ranking the variable importance with the caret package
 Ranking the variable importance with the rminer package
 Finding highly correlated features with the caret package
 Selecting features using the caret package
 Measuring the performance of the regression model
 Measuring prediction performance with a confusion matrix
 Measuring prediction performance using ROCR
 Comparing an ROC curve using the caret package
 Measuring performance differences between models with the caret package

Ensemble Learning
 Introduction
 Using the Super Learner algorithm
 Using ensemble to train and test
 Classifying data with the bagging method
 Performing cross-validation with the bagging method
 Classifying data with the boosting method
 Performing cross-validation with the boosting method
 Classifying data with gradient boosting
 Calculating the margins of a classifier
 Calculating the error evolution of the ensemble method
 Classifying data with random forest
 Estimating the prediction errors of different classifiers

Clustering
 Introduction
 Clustering data with hierarchical clustering
 Cutting trees into clusters
 Clustering data with the k-means method
 Drawing a bivariate cluster plot
 Comparing clustering methods
 Extracting silhouette information from clustering
 Obtaining the optimum number of clusters for k-means
 Clustering data with the density-based method
 Clustering data with the model-based method
 Visualizing a dissimilarity matrix
 Validating clusters externally

Association Analysis and Sequence Mining
 Introduction
 Transforming data into transactions
 Displaying transactions and associations
 Mining associations with the Apriori rule
 Pruning redundant rules
 Visualizing association rules
 Mining frequent itemsets with Eclat
 Creating transactions with temporal information
 Mining frequent sequential patterns with cSPADE
 Using the TraMineR package for sequence analysis
 Visualizing sequence, Chronogram, and Traversal Statistics

Dimension Reduction
 Introduction
 Why to reduce the dimension?
 Performing feature selection with FSelector
 Performing dimension reduction with PCA
 Determining the number of principal components using the scree test
 Determining the number of principal components using the Kaiser method
 Visualizing multivariate data using biplot
 Performing dimension reduction with MDS
 Reducing dimensions with SVD
 Compressing images with SVD
 Performing nonlinear dimension reduction with ISOMAP
 Performing nonlinear dimension reduction with Local Linear Embedding

Big Data Analysis (R and Hadoop)
 Introduction
 Preparing the RHadoop environment
 Installing rmr2
 Installing rhdfs
 Operating HDFS with rhdfs
 Implementing a word count problem with RHadoop
 Comparing the performance between an R MapReduce program and a standard R program
 Testing and debugging the rmr2 program
 Installing plyrmr
 Manipulating data with plyrmr
 Conducting machine learning with RHadoop
 Configuring RHadoop clusters on Amazon EMR
Product information
 Title: Machine Learning with R Cookbook - Second Edition
 Author(s):
 Release date: October 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787284395