Statistics for Machine Learning

Book description

Build Machine Learning models with a sound statistical understanding.

About This Book

  • Learn about the statistics behind powerful predictive models, including p-values, ANOVA, and F-statistics.
  • Implement statistical computations programmatically for supervised learning and for unsupervised learning techniques such as K-means clustering.
  • Master the statistical aspect of Machine Learning with the help of this example-rich guide to R and Python.

Who This Book Is For

This book is intended for developers with little to no background in statistics who want to implement Machine Learning in their systems. Some programming knowledge of R or Python will be useful.

What You Will Learn

  • Understand the Statistical and Machine Learning fundamentals necessary to build models
  • Understand the major differences and parallels between the statistical way and the Machine Learning way to solve problems
  • Learn how to prepare data and feed models by using the appropriate Machine Learning algorithms from the more-than-adequate R and Python packages
  • Analyze the results and tune the model appropriately to your own predictive goals
  • Understand the statistical concepts required for Machine Learning
  • Get introduced to the fundamentals required for building supervised and unsupervised deep learning models
  • Learn reinforcement learning and its applications in artificial intelligence

In Detail

Complex statistics in Machine Learning worry a lot of developers. Knowing statistics helps you build strong Machine Learning models that are optimized for a given problem statement. This book will teach you all it takes to perform the complex statistical computations required for Machine Learning. You will learn the statistics behind supervised learning, unsupervised learning, reinforcement learning, and more. Work through real-world examples that discuss the statistical side of Machine Learning and become familiar with it. You will also design programs for performing tasks such as model and parameter fitting, regression, classification, density estimation, and more.

By the end of the book, you will have mastered the required statistics for Machine Learning and will be able to apply your new skills to any sort of industry problem.

Style and approach

This practical, step-by-step guide will give you an understanding of the Statistical and Machine Learning fundamentals you'll need to build models.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your Packt account. If you purchased this book elsewhere, you can register with Packt to receive the code files.

Table of contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Journey from Statistics to Machine Learning
    1. Statistical terminology for model building and validation
      1. Machine learning
      2. Major differences between statistical modeling and machine learning
      3. Steps in machine learning model development and deployment
      4. Statistical fundamentals and terminology for model building and validation
      5. Bias versus variance trade-off
      6. Train and test data
    2. Machine learning terminology for model building and validation
      1. Linear regression versus gradient descent
      2. Machine learning losses
      3. When to stop tuning machine learning models
      4. Train, validation, and test data
      5. Cross-validation
      6. Grid search
    3. Machine learning model overview
    4. Summary
  3. Parallelism of Statistics and Machine Learning
    1. Comparison between regression and machine learning models
    2. Compensating factors in machine learning models
      1. Assumptions of linear regression
      2. Steps applied in linear regression modeling
      3. Example of simple linear regression from first principles
      4. Example of simple linear regression using the wine quality data
      5. Example of multilinear regression - step-by-step methodology of model building
        1. Backward and forward selection
    3. Machine learning models - ridge and lasso regression
      1. Example of ridge regression machine learning
      2. Example of lasso regression machine learning model
      3. Regularization parameters in linear regression and ridge/lasso regression
    4. Summary
  4. Logistic Regression Versus Random Forest
    1. Maximum likelihood estimation
    2. Logistic regression – introduction and advantages
      1. Terminology involved in logistic regression
      2. Applying steps in logistic regression modeling
      3. Example of logistic regression using German credit data
    3. Random forest
      1. Example of random forest using German credit data
        1. Grid search on random forest
    4. Variable importance plot
    5. Comparison of logistic regression with random forest
    6. Summary
  5. Tree-Based Machine Learning Models
    1. Introducing decision tree classifiers
      1. Terminology used in decision trees
      2. Decision tree working methodology from first principles
    2. Comparison between logistic regression and decision trees
    3. Comparison of error components across various styles of models
    4. Remedial actions to push the model towards the ideal region
    5. HR attrition data example
    6. Decision tree classifier
    7. Tuning class weights in decision tree classifier
    8. Bagging classifier
    9. Random forest classifier
    10. Random forest classifier - grid search
    11. AdaBoost classifier
    12. Gradient boosting classifier
    13. Comparison between AdaBoosting versus gradient boosting
    14. Extreme gradient boosting - XGBoost classifier
    15. Ensemble of ensembles - model stacking
    16. Ensemble of ensembles with different types of classifiers
    17. Ensemble of ensembles with bootstrap samples using a single type of classifier
    18. Summary
  6. K-Nearest Neighbors and Naive Bayes
    1. K-nearest neighbors
      1. KNN voter example
      2. Curse of dimensionality
        1. Curse of dimensionality with 1D, 2D, and 3D example
    2. KNN classifier with breast cancer Wisconsin data example
    3. Tuning of k-value in KNN classifier
    4. Naive Bayes
    5. Probability fundamentals
      1. Joint probability
    6. Understanding Bayes theorem with conditional probability
    7. Naive Bayes classification
    8. Laplace estimator
    9. Naive Bayes SMS spam classification example
    10. Summary
  7. Support Vector Machines and Neural Networks
    1. Support vector machines working principles
      1. Maximum margin classifier
      2. Support vector classifier
      3. Support vector machines
    2. Kernel functions
    3. SVM multilabel classifier with letter recognition data example
      1. Maximum margin classifier - linear kernel
      2. Polynomial kernel
      3. RBF kernel
    4. Artificial neural networks - ANN
    5. Activation functions
    6. Forward propagation and backpropagation
    7. Optimization of neural networks
      1. Stochastic gradient descent - SGD
      2. Momentum
      3. Nesterov accelerated gradient - NAG
      4. Adagrad
      5. Adadelta
      6. RMSprop
      7. Adaptive moment estimation - Adam
      8. Limited-memory Broyden-Fletcher-Goldfarb-Shanno - L-BFGS optimization algorithm
    8. Dropout in neural networks
    9. ANN classifier applied on handwritten digits using scikit-learn
    10. Introduction to deep learning
      1. Solving methodology
      2. Deep learning software
      3. Deep neural network classifier applied on handwritten digits using Keras
    11. Summary
  8. Recommendation Engines
    1. Content-based filtering
      1. Cosine similarity
    2. Collaborative filtering
      1. Advantages of collaborative filtering over content-based filtering
      2. Matrix factorization using the alternating least squares algorithm for collaborative filtering
    3. Evaluation of recommendation engine model
      1. Hyperparameter selection in recommendation engines using grid search
      2. Recommendation engine application on MovieLens data
        1. User-user similarity matrix
        2. Movie-movie similarity matrix
        3. Collaborative filtering using ALS
        4. Grid search on collaborative filtering
      3. Summary
  9. Unsupervised Learning
    1. K-means clustering
      1. K-means working methodology from first principles
      2. Optimal number of clusters and cluster evaluation
        1. The elbow method
      3. K-means clustering with the iris data example
    2. Principal component analysis - PCA
      1. PCA working methodology from first principles
      2. PCA applied on handwritten digits using scikit-learn
    3. Singular value decomposition - SVD
      1. SVD applied on handwritten digits using scikit-learn
    4. Deep auto encoders
    5. Model building technique using encoder-decoder architecture
    6. Deep auto encoders applied on handwritten digits using Keras
    7. Summary
  10. Reinforcement Learning
    1. Introduction to reinforcement learning
    2. Comparing supervised, unsupervised, and reinforcement learning in detail
    3. Characteristics of reinforcement learning
    4. Reinforcement learning basics
      1. Category 1 - value based
      2. Category 2 - policy based
      3. Category 3 - actor-critic
      4. Category 4 - model-free
      5. Category 5 - model-based
      6. Fundamental categories in sequential decision making
    5. Markov decision processes and Bellman equations
    6. Dynamic programming
      1. Algorithms to compute optimal policy using dynamic programming
    7. Grid world example using value and policy iteration algorithms with basic Python
    8. Monte Carlo methods
      1. Comparison between dynamic programming and Monte Carlo methods
      2. Key advantages of MC over DP methods
      3. Monte Carlo prediction
      4. The suitability of Monte Carlo prediction on grid-world problems
      5. Modeling Blackjack example of Monte Carlo methods using Python
    9. Temporal difference learning
      1. Comparison between Monte Carlo methods and temporal difference learning
      2. TD prediction
      3. Driving office example for TD learning
    10. SARSA on-policy TD control
    11. Q-learning - off-policy TD control
    12. Cliff walking example of on-policy and off-policy of TD control
    13. Applications of reinforcement learning with integration of machine learning and deep learning
      1. Automotive vehicle control - self-driving cars
      2. Google DeepMind's AlphaGo
      3. Robo soccer
    14. Further reading
    15. Summary

Product information

  • Title: Statistics for Machine Learning
  • Author(s): Pratap Dangeti
  • Release date: July 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781788295758