Book description
Learn the ropes of supervised machine learning with R by studying popular real-world use cases, and understand how it drives object detection in driverless cars, customer churn prediction, and loan default prediction.
Key Features
 Study supervised learning algorithms by using real-world datasets
 Fine-tune optimal parameters with hyperparameter optimization
 Select the best algorithm using the model evaluation framework
Book Description
R provides excellent visualization features that are essential for exploring data before using it in automated learning.
Applied Supervised Learning with R walks you through the complete process of using R to develop applications with supervised machine learning algorithms that meet your business needs. The book starts by helping you develop your analytical thinking so that you can create a problem statement from business inputs and domain research. You will then learn different evaluation metrics for comparing algorithms, and later use these metrics to select the best algorithm for your problem. After choosing the algorithm you want to use, you will study hyperparameter optimization techniques to fine-tune its parameters. The book also demonstrates how you can add different regularization terms to avoid overfitting your model.
By the end of this book, you will have the advanced skills you need to build a supervised machine learning model that precisely meets your business needs.
What you will learn
 Develop analytical thinking to precisely identify a business problem
 Wrangle data with dplyr, tidyr, and reshape2
 Visualize data with ggplot2
 Validate your supervised machine learning model using k-fold cross-validation
 Optimize hyperparameters with grid and random search, and Bayesian optimization
 Deploy your model on Amazon Web Services (AWS) Lambda with plumber
 Improve your model's performance with feature selection and dimensionality reduction
Who this book is for
This book is designed for beginner and intermediate-level data analysts, data scientists, and data engineers who want to explore different methods of supervised machine learning and their use cases. Some background in statistics, probability, calculus, linear algebra, and programming will help you thoroughly understand and follow the concepts covered in this book.
Table of contents
 Preface
 Chapter 1: R for Advanced Analytics
 Introduction
 Working with Real-World Datasets
 Reading Data from Various Data Formats
 Write R Markdown Files for Code Reproducibility
 Data Structures in R
 DataFrame
 Data Processing and Transformation
 The Apply Family of Functions
 Useful Packages
 The dplyr Package
 Exercise 15: Implementing the dplyr Package
 The tidyr Package
 Exercise 16: Implementing the tidyr Package
 Activity 3: Create a DataFrame with Five Summary Statistics for All Numeric Variables from Bank Data Using dplyr and tidyr
 The plyr Package
 Exercise 17: Exploring the plyr Package
 The caret Package
 Data Visualization
 Line Charts
 Histogram
 Boxplot
 Summary
 Chapter 2: Exploratory Analysis of Data
 Introduction
 Defining the Problem Statement
 Understanding the Science Behind EDA
 Exploratory Data Analysis
 Univariate Analysis
 Exploring Numeric/Continuous Features
 Exercise 19: Visualizing Data Using a Box Plot
 Exercise 20: Visualizing Data Using a Histogram
 Exercise 21: Visualizing Data Using a Density Plot
 Exercise 22: Visualizing Multiple Variables Using a Histogram
 Activity 4: Plotting Multiple Density Plots and Boxplots
 Exercise 23: Plotting a Histogram for the nr.employed, euribor3m, cons.conf.idx, and duration Variables
 Exploring Categorical Features
 Exercise 24: Exploring Categorical Features
 Exercise 25: Exploring Categorical Features Using a Bar Chart
 Exercise 26: Exploring Categorical Features using Pie Chart
 Exercise 27: Automate Plotting Categorical Variables
 Exercise 28: Automate Plotting for the Remaining Categorical Variables
 Exercise 29: Exploring the Last Remaining Categorical Variable and the Target Variable
 Bivariate Analysis
 Studying the Relationship between Two Numeric Variables
 Studying the Relationship between a Categorical and a Numeric Variable
 Studying the Relationship Between Two Categorical Variables
 Multivariate Analysis
 Validating Insights Using Statistical Tests
 Categorical Dependent and Numeric/Continuous Independent Variables
 Categorical Dependent and Categorical Independent Variables
 Summary
 Chapter 3: Introduction to Supervised Learning
 Introduction
 Summary of the Beijing PM2.5 Dataset
 Regression and Classification Problems
 Machine Learning Workflow
 Regression
 Exploratory Data Analysis (EDA)
 Exercise 42: Exploring the Time Series Views of PM2.5, DEWP, TEMP, and PRES variables of the Beijing PM2.5 Dataset
 Exercise 43: Undertaking Correlation Analysis
 Exercise 44: Drawing a Scatterplot to Explore the Relationship between PM2.5 Levels and Other Factors
 Activity 5: Draw a Scatterplot between PRES and PM2.5 Split by Months
 Model Building
 Exercise 45: Exploring Simple and Multiple Regression Models
 Model Interpretation
 Classification
 Evaluation Metrics
 Mean Absolute Error (MAE)
 Root Mean Squared Error (RMSE)
 R-squared
 Adjusted R-squared
 Mean Reciprocal Rank (MRR)
 Exercise 47: Finding Evaluation Metrics
 Confusion Matrix-Based Metrics
 Accuracy
 Sensitivity
 Specificity
 F1 Score
 Exercise 48: Working with Model Evaluation on Training Data
 Receiver Operating Characteristic (ROC) Curve
 Exercise 49: Creating an ROC Curve
 Summary
 Chapter 4: Regression
 Introduction
 Linear Regression
 Model Diagnostics
 Residual versus Fitted Plot
 Normal Q-Q Plot
 Scale-Location Plot
 Residual versus Leverage
 Improving the Model
 Quantile Regression
 Polynomial Regression
 Ridge Regression
 LASSO Regression
 Elastic Net Regression
 Poisson Regression
 Cox Proportional-Hazards Regression Model
 NCCTG Lung Cancer Data
 Summary
 Chapter 5: Classification
 Introduction
 Getting Started with the Use Case
 Some Background on the Use Case
 Defining the Problem Statement
 Data Gathering
 Exercise 63: Exploring Data for the Use Case
 Exercise 64: Calculating the Null Value Percentage in All Columns
 Exercise 65: Removing Null Values from the Dataset
 Exercise 66: Engineer Time-Based Features from the Date Variable
 Exercise 67: Exploring the Location Frequency
 Exercise 68: Engineering the New Location with Reduced Levels
 Classification Techniques for Supervised Learning
 Logistic Regression
 How Does Logistic Regression Work?
 Evaluating Classification Models
 What Metric Should You Choose?
 Evaluating Logistic Regression
 Decision Trees
 How Do Decision Trees Work?
 Exercise 72: Create a Decision Tree Model in R
 Activity 9: Create a Decision Tree Model with Additional Control Parameters
 Ensemble Modelling
 Random Forest
 Why Are Ensemble Models Used?
 Bagging – Predecessor to Random Forest
 How Does Random Forest Work?
 Exercise 73: Building a Random Forest Model in R
 Activity 10: Build a Random Forest Model with a Greater Number of Trees
 XGBoost
 Deep Neural Networks
 Choosing the Right Model for Your Use Case
 Summary
 Chapter 6: Feature Selection and Dimensionality Reduction
 Chapter 7: Model Improvements
 Chapter 8: Model Deployment
 Chapter 9: Capstone Project – Based on Research Papers
 Introduction
 Exploring Research Work
 The mlr Package
 Problem Design from the Research Paper
 Features in Scene Dataset
 Implementing Multilabel Classifier Using the mlr and OpenML Packages
 Constructing a Learner
 Adaptation Methods
 Transformation Methods
 Binary Relevance Method
 Classifier Chains Method
 Nested Stacking
 Dependent Binary Relevance Method
 Stacking
 Exercise 103: Generating Decision Tree Model Using the classif.rpart Method
 Train the Model
 Exercise 104: Train the Model
 Predicting the Output
 Performance of the Model
 Resampling the Data
 Binary Performance for Each Label
 Benchmarking Model
 Conducting Benchmark Experiments
 Exercise 105: Exploring How to Conduct a Benchmarking on Various Learners
 Accessing Benchmark Results
 Learner Performances
 Predictions
 Summary
 Appendix
 Chapter 1: R for Advanced Analytics
 Chapter 2: Exploratory Analysis of Data
 Chapter 3: Introduction to Supervised Learning
 Chapter 4: Regression
 Chapter 5: Classification
 Chapter 6: Feature Selection and Dimensionality Reduction
 Chapter 7: Model Improvements
 Chapter 8: Model Deployment
 Chapter 9: Capstone Project – Based on Research Papers
Product information
 Title: Applied Supervised Learning with R
 Author(s):
 Release date: May 2019
 Publisher(s): Packt Publishing
 ISBN: 9781838556334
You might also like
Applied Unsupervised Learning with R
Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data. Key Features …
Hands-On Deep Learning with R
Explore and implement deep learning to solve various real-world problems using modern R libraries such as …
Practical R 4: Applying R to Data Manipulation, Processing and Integration
Get started with an accelerated introduction to the R ecosystem, programming language, and tools including R …
Advanced Machine Learning with R
Master an array of machine learning techniques with real-world projects that interface TensorFlow with R, H2O, …