Hands-On Ensemble Learning with R

Book description

Explore powerful R packages to create predictive models using ensemble methods

Key Features

  • Implement machine learning algorithms to build efficient ensemble models
  • Explore powerful R packages to create predictive models using ensemble methods
  • Learn to build ensemble models on large datasets using a practical approach

Book Description

Ensemble techniques are used for combining two or more similar or dissimilar machine learning algorithms to create a stronger model. Such a model delivers superior predictive power and can noticeably boost the accuracy you achieve on your datasets.
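
For instance, one of the simplest ways to combine dissimilar learners is majority voting. The following minimal R sketch, on the built-in iris data, trains a decision tree, a naive Bayes classifier, and an SVM, and lets them vote on each test case; the packages, data split, and settings here are illustrative assumptions rather than code from the book.

    # Illustrative only: train three dissimilar classifiers and combine them by majority vote
    library(rpart)    # decision tree
    library(e1071)    # naive Bayes and SVM

    data(iris)
    set.seed(123)
    idx   <- sample(nrow(iris), 0.7 * nrow(iris))
    train <- iris[idx, ]
    test  <- iris[-idx, ]

    # Fit the three base learners
    tree_fit <- rpart(Species ~ ., data = train)
    nb_fit   <- naiveBayes(Species ~ ., data = train)
    svm_fit  <- svm(Species ~ ., data = train)

    # Collect their class predictions on the test set
    preds <- data.frame(
      tree = predict(tree_fit, test, type = "class"),
      nb   = predict(nb_fit, test),
      svm  = predict(svm_fit, test)
    )

    # Majority vote across the three models, one test case at a time
    vote <- apply(preds, 1, function(p) names(which.max(table(p))))
    mean(vote == test$Species)   # accuracy of the voting ensemble

Because the three learners tend to make mistakes on different cases, the vote is usually at least as accurate as the weakest of them.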

Hands-On Ensemble Learning with R begins with the important statistical resampling methods. You will then walk through the central trilogy of ensemble techniques: bagging, random forests, and boosting. You'll learn how they can be used to provide greater accuracy on large datasets using popular R packages. You will learn how to combine model predictions from different machine learning algorithms to build ensemble models. In addition to this, you will explore how to improve the performance of your ensemble models.
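
To give a flavor of that trilogy in R, here is a minimal sketch on the Pima Indians Diabetes data (one of the datasets the book uses); the packages, split, and settings are illustrative assumptions, not the book's own code.

    # Illustrative only: bagging, random forest, and boosting on a binary classification task
    library(mlbench)        # provides the PimaIndiansDiabetes data
    library(adabag)         # bagging() and boosting()
    library(randomForest)   # randomForest()

    data(PimaIndiansDiabetes)
    set.seed(2018)
    idx   <- sample(nrow(PimaIndiansDiabetes), 0.7 * nrow(PimaIndiansDiabetes))
    train <- PimaIndiansDiabetes[idx, ]
    test  <- PimaIndiansDiabetes[-idx, ]

    # Bagging: trees grown on bootstrap resamples, predictions aggregated
    bag_fit  <- bagging(diabetes ~ ., data = train, mfinal = 100)
    bag_pred <- predict(bag_fit, test)$class

    # Random forest: bagged trees with a random subset of features at each split
    rf_fit  <- randomForest(diabetes ~ ., data = train, ntree = 500)
    rf_pred <- predict(rf_fit, test)

    # Boosting: trees fitted sequentially, each concentrating on earlier mistakes
    boost_fit  <- boosting(diabetes ~ ., data = train, mfinal = 100)
    boost_pred <- predict(boost_fit, test)$class

    # Held-out accuracy of each ensemble
    mean(bag_pred   == test$diabetes)
    mean(rf_pred    == test$diabetes)
    mean(boost_pred == test$diabetes)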

By the end of this book, you will have learned how machine learning algorithms can be combined to reduce common problems and to build simple, efficient ensemble models, with the help of real-world examples.

What you will learn

  • Carry out an essential review of resampling methods: the bootstrap and the jackknife
  • Explore the key ensemble methods: bagging, random forests, and boosting
  • Use multiple algorithms to make strong predictive models
  • Enjoy a comprehensive treatment of boosting methods
  • Supplement methods with statistical tests, such as the ROC test
  • Walk through the data structures used in classification, regression, survival, and time series problems
  • Use the supplied R code to implement ensemble methods
  • Learn the stacking method to combine heterogeneous machine learning models

Who this book is for

This book is for you if you are a data scientist or machine learning developer who wants to implement machine learning techniques by building ensemble models with the power of R. You will learn how to combine different machine learning algorithms to perform efficient data processing. Basic knowledge of machine learning techniques and programming knowledge of R would be an added advantage.

Table of contents

  1. Hands-On Ensemble Learning with R
    1. Table of Contents
    2. Hands-On Ensemble Learning with R
      1. Why subscribe?
      2. PacktPub.com
    3. Contributors
      1. About the author
      2. About the reviewer
      3. Packt is Searching for Authors Like You
    4. Preface
      1. Who this book is for
      2. What this book covers
      3. To get the most out of this book
        1. Download the example code files
        2. Download the color images
        3. Conventions used
      4. Get in touch
        1. Reviews
    5. 1. Introduction to Ensemble Techniques
      1. Datasets
        1. Hypothyroid
        2. Waveform
        3. German Credit
        4. Iris
        5. Pima Indians Diabetes
        6. US Crime
        7. Overseas visitors
        8. Primary Biliary Cirrhosis
        9. Multishapes
        10. Board Stiffness
      2. Statistical/machine learning models
        1. Logistic regression model
          1. Logistic regression for hypothyroid classification
        2. Neural networks
          1. Neural network for hypothyroid classification
        3. Naïve Bayes classifier
          1. Naïve Bayes for hypothyroid classification
        4. Decision tree
          1. Decision tree for hypothyroid classification
        5. Support vector machines
          1. SVM for hypothyroid classification
      3. The right model dilemma!
      4. An ensemble purview
      5. Complementary statistical tests
        1. Permutation test
        2. Chi-square and McNemar test
        3. ROC test
      6. Summary
    6. 2. Bootstrapping
      1. Technical requirements
      2. The jackknife technique
        1. The jackknife method for mean and variance
        2. Pseudovalues method for survival data
      3. Bootstrap – a statistical method
        1. The standard error of correlation coefficient
        2. The parametric bootstrap
        3. Eigen values
          1. Rule of thumb
      4. The boot package
      5. Bootstrap and testing hypotheses
      6. Bootstrapping regression models
      7. Bootstrapping survival models*
      8. Bootstrapping time series models*
      9. Summary
    7. 3. Bagging
      1. Technical requirements
      2. Classification trees and pruning
      3. Bagging
      4. k-NN classifier
        1. Analyzing waveform data
      5. k-NN bagging
      6. Summary
    8. 4. Random Forests
      1. Technical requirements
      2. Random Forests
      3. Variable importance
      4. Proximity plots
      5. Random Forest nuances
      6. Comparisons with bagging
      7. Missing data imputation
      8. Clustering with Random Forest
      9. Summary
    9. 5. The Bare Bones Boosting Algorithms
      1. Technical requirements
      2. The general boosting algorithm
        1. Adaptive boosting
        2. Gradient boosting
          1. Building it from scratch
          2. Squared-error loss function
        3. Using the adabag and gbm packages
        4. Variable importance
        5. Comparing bagging, random forests, and boosting
      3. Summary
    10. 6. Boosting Refinements
      1. Technical requirements
      2. Why does boosting work?
      3. The gbm package
        1. Boosting for count data
        2. Boosting for survival data
      4. The xgboost package
      5. The h2o package
      6. Summary
    11. 7. The General Ensemble Technique
      1. Technical requirements
      2. Why does ensembling work?
      3. Ensembling by voting
        1. Majority voting
        2. Weighted voting
      4. Ensembling by averaging
        1. Simple averaging
        2. Weight averaging
      5. Stack ensembling
      6. Summary
    12. 8. Ensemble Diagnostics
      1. Technical requirements
      2. What is ensemble diagnostics?
      3. Ensemble diversity
        1. Numeric prediction
          1. Class prediction
        2. Pairwise measure
        3. Disagreement measure
        4. Yule's or Q-statistic
        5. Correlation coefficient measure
        6. Cohen's statistic
        7. Double-fault measure
4. Interrater agreement
        1. Entropy measure
        2. Kohavi-Wolpert measure
        3. Disagreement measure for ensemble
        4. Measurement of interrater agreement
      5. Summary
    13. 9. Ensembling Regression Models
      1. Technical requirements
      2. Pre-processing the housing data
      3. Visualization and variable reduction
        1. Variable clustering
      4. Regression models
        1. Linear regression model
        2. Neural networks
        3. Regression tree
        4. Prediction for regression models
      5. Bagging and Random Forests
      6. Boosting regression models
      7. Stacking methods for regression models
      8. Summary
    14. 10. Ensembling Survival Models
      1. Core concepts of survival analysis
      2. Nonparametric inference
      3. Regression models – parametric and Cox proportional hazards models
      4. Survival tree
      5. Ensemble survival models
      6. Summary
    15. 11. Ensembling Time Series Models
      1. Technical requirements
      2. Time series datasets
        1. AirPassengers
        2. co2
        3. uspop
        4. gas
        5. Car Sales
        6. austres
        7. WWWusage
      3. Time series visualization
      4. Core concepts and metrics
      5. Essential time series models
        1. Naïve forecasting
        2. Seasonal, trend, and loess fitting
        3. Exponential smoothing state space model
        4. Auto-regressive Integrated Moving Average (ARIMA) models
        5. Auto-regressive neural networks
        6. Messing it all up
      6. Bagging and time series
      7. Ensemble time series models
      8. Summary
    16. 12. What's Next?
    17. A. Bibliography
      1. References
      2. R package references
    18. Index

Product information

  • Title: Hands-On Ensemble Learning with R
  • Author(s): Prabhanjan Narayanachar Tattar
  • Release date: July 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788624145