Mastering Machine Learning with R - Second Edition

Book description

Master machine learning techniques with R to deliver insights in complex projects

About This Book

  • Understand and apply machine learning methods using an extensive set of R packages such as XGBoost

  • Understand the benefits and potential pitfalls of using machine learning methods such as multi-class classification and unsupervised learning

  • Implement advanced concepts in machine learning with this example-rich guide

Who This Book Is For

This book is for data science professionals, data analysts, and anyone with a working knowledge of machine learning with R who wants to take their skills to the next level and become an expert in the field.

What You Will Learn

  • Gain deep insights into the application of machine learning tools in the industry

  • Manipulate data in R efficiently to prepare it for analysis

  • Learn to recognize effective techniques for visualizing data

  • Understand why and how to create test and training data sets for analysis

  • Master fundamental learning methods such as linear and logistic regression

  • Comprehend advanced learning methods such as support vector machines

  • Learn how to use R in a cloud service such as Amazon Web Services (AWS)

In Detail

This book will teach you advanced techniques in machine learning with the latest code in R 3.3.2. You will delve into statistical learning theory and supervised learning; design efficient algorithms; learn about creating recommendation engines; use multi-class classification and deep learning; and more.

You will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, boosted trees with XGBoost, and more. More than just knowing the outcome, you’ll understand how these concepts work and what they do.

With a gentle learning curve on topics such as neural networks, you will explore deep learning and more. By the end of this book, you will be able to perform machine learning with R in the cloud using AWS in various scenarios with different datasets.

Style and approach

The book delivers practical, real-world solutions to problems and a variety of tasks such as complex recommendation systems. By the end of this book, you will have gained expertise in machine learning with R and will be able to build complex machine learning projects using R and its packages.

Table of contents

    1. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    2. A Process for Success
      1. The process
      2. Business understanding
        1. Identifying the business objective
        2. Assessing the situation
        3. Determining the analytical goals
        4. Producing a project plan
      3. Data understanding
      4. Data preparation
      5. Modeling
      6. Evaluation
      7. Deployment
      8. Algorithm flowchart
      9. Summary
    3. Linear Regression - The Blocking and Tackling of Machine Learning
      1. Univariate linear regression
        1. Business understanding
      2. Multivariate linear regression
        1. Business understanding
        2. Data understanding and preparation
        3. Modeling and evaluation
      3. Other linear model considerations
        1. Qualitative features
        2. Interaction terms
      4. Summary
    4. Logistic Regression and Discriminant Analysis
      1. Classification methods and linear regression
      2. Logistic regression
        1. Business understanding
        2. Data understanding and preparation
        3. Modeling and evaluation
          1. The logistic regression model
          2. Logistic regression with cross-validation
      3. Discriminant analysis overview
        1. Discriminant analysis application
      4. Multivariate Adaptive Regression Splines (MARS)
      5. Model selection
      6. Summary
    5. Advanced Feature Selection in Linear Models
      1. Regularization in a nutshell
        1. Ridge regression
        2. LASSO
        3. Elastic net
      2. Business case
        1. Business understanding
        2. Data understanding and preparation
      3. Modeling and evaluation
        1. Best subsets
        2. Ridge regression
        3. LASSO
        4. Elastic net
        5. Cross-validation with glmnet
      4. Model selection
      5. Regularization and classification
        1. Logistic regression example 
      6. Summary
    6. More Classification Techniques - K-Nearest Neighbors and Support Vector Machines
      1. K-nearest neighbors
      2. Support vector machines
      3. Business case
        1. Business understanding
        2. Data understanding and preparation
        3. Modeling and evaluation
          1. KNN modeling
          2. SVM modeling
        4. Model selection
      4. Feature selection for SVMs
      5. Summary
    7. Classification and Regression Trees
      1. An overview of the techniques
        1. Understanding the regression trees
        2. Classification trees
        3. Random forest
        4. Gradient boosting
      2. Business case
        1. Modeling and evaluation
          1. Regression tree
          2. Classification tree
          3. Random forest regression
          4. Random forest classification
          5. Extreme gradient boosting - classification
        2. Model selection
        3. Feature Selection with random forests
      3. Summary
    8. Neural Networks and Deep Learning
      1. Introduction to neural networks
      2. Deep learning, a not-so-deep overview
        1. Deep learning resources and advanced methods
      3. Business understanding
      4. Data understanding and preparation
      5. Modeling and evaluation
      6. An example of deep learning
        1. H2O background
        2. Data upload to H2O
        3. Create train and test datasets
        4. Modeling
      7. Summary
    9. Cluster Analysis
      1. Hierarchical clustering
        1. Distance calculations
      2. K-means clustering
      3. Gower and partitioning around medoids
        1. Gower
        2. PAM
      4. Random forest
      5. Business understanding
      6. Data understanding and preparation
      7. Modeling and evaluation
        1. Hierarchical clustering
        2. K-means clustering
        3. Gower and PAM
        4. Random Forest and PAM
      8. Summary
    10. Principal Components Analysis
      1. An overview of the principal components
        1. Rotation
      2. Business understanding
        1. Data understanding and preparation
      3. Modeling and evaluation
        1. Component extraction
        2. Orthogonal rotation and interpretation
        3. Creating factor scores from the components
        4. Regression analysis
      4. Summary
    11. Market Basket Analysis, Recommendation Engines, and Sequential Analysis
      1. An overview of a market basket analysis
      2. Business understanding
      3. Data understanding and preparation
      4. Modeling and evaluation
      5. An overview of a recommendation engine
        1. User-based collaborative filtering
        2. Item-based collaborative filtering
        3. Singular value decomposition and principal components analysis
      6. Business understanding and recommendations
      7. Data understanding, preparation, and recommendations
      8. Modeling, evaluation, and recommendations
      9. Sequential data analysis
        1. Sequential analysis applied
      10. Summary
    12. Creating Ensembles and Multiclass Classification
      1. Ensembles
      2. Business and data understanding
      3. Modeling evaluation and selection
      4. Multiclass classification
      5. Business and data understanding
      6. Model evaluation and selection
        1. Random forest
        2. Ridge regression
      7. MLR's ensemble
      8. Summary
    13. Time Series and Causality
      1. Univariate time series analysis
        1. Understanding Granger causality
      2. Business understanding
        1. Data understanding and preparation
      3. Modeling and evaluation
        1. Univariate time series forecasting
        2. Examining the causality
          1. Linear regression
          2. Vector autoregression
      4. Summary
    14. Text Mining
      1. Text mining framework and methods
      2. Topic models
        1. Other quantitative analyses
      3. Business understanding
        1. Data understanding and preparation
      4. Modeling and evaluation
        1. Word frequency and topic models
        2. Additional quantitative analysis
      5. Summary
    15. R on the Cloud
      1. Creating an Amazon Web Services account
        1. Launch a virtual machine
        2. Start RStudio
      2. Summary
    16. R Fundamentals
      1. Getting R up-and-running
      2. Using R
      3. Data frames and matrices
      4. Creating summary statistics
      5. Installing and loading R packages
      6. Data manipulation with dplyr
      7. Summary
    17. Sources

Product information

  • Title: Mastering Machine Learning with R - Second Edition
  • Author(s): Cory Lesmeister
  • Release date: April 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781787287471