Machine Learning, 2nd Edition

Book Description

A Proven, Hands-On Approach for Students without a Strong Statistical Foundation

Since the best-selling first edition was published, there have been several prominent developments in the field of machine learning, including the increasing work on the statistical interpretations of machine learning algorithms. Unfortunately, computer science students

Table of Contents

  1. Preliminaries
  2. Prologue to 2nd Edition
  3. Prologue to 1st Edition
  4. Chapter 1: Introduction
    1. 1.1 If Data Had Mass, The Earth would be A Black Hole
    2. 1.2 Learning
      1. 1.2.1 Machine Learning
    3. 1.3 Types of Machine Learning
    4. 1.4 Supervised Learning
      1. 1.4.1 Regression
      2. 1.4.2 Classification
    5. 1.5 The Machine Learning Process
    6. 1.6 A Note on Programming
    7. 1.7 A Roadmap to the Book
    8. Further Reading
      1. Figure 1.1
      2. Figure 1.2
      3. Figure 1.3
      4. Figure 1.4
      5. Figure 1.5
  5. Chapter 2: Preliminaries
    1. 2.1 Some Terminology
      1. 2.1.1 Weight Space
      2. 2.1.2 The Curse of Dimensionality
    2. 2.2 Knowing What You Know: Testing Machine Learning Algorithms
      1. 2.2.1 Overfitting
      2. 2.2.2 Training, Testing, and Validation Sets
      3. 2.2.3 The Confusion Matrix
      4. 2.2.4 Accuracy Metrics
      5. 2.2.5 The Receiver Operator Characteristic (ROC) Curve
      6. 2.2.6 Unbalanced Datasets
      7. 2.2.7 Measurement Precision
    3. 2.3 Turning Data into Probabilities
      1. 2.3.1 Minimising Risk
      2. 2.3.2 The Naïve Bayes’ Classifier
    4. 2.4 Some Basic Statistics
      1. 2.4.1 Averages
      2. 2.4.2 Variance and Covariance
      3. 2.4.3 The Gaussian
    5. 2.5 The Bias-Variance Tradeoff
    6. Further Reading
    7. Practice Questions
      1. Figure 2.1
      2. Figure 2.2
      3. Figure 2.3
      4. Figure 2.4
      5. Figure 2.5
      6. Figure 2.6
      7. Figure 2.7
      8. Figure 2.8
      9. Figure 2.9
      10. Figure 2.10
      11. Figure 2.11
      12. Figure 2.12
      13. Figure 2.13
      14. Figure 2.14
      15. Figure 2.15
  6. Chapter 3: Neurons, Neural Networks, and Linear Discriminants
    1. 3.1 The Brain and The Neuron
      1. 3.1.1 Hebb’s Rule
      2. 3.1.2 McCulloch and Pitts Neurons
      3. 3.1.3 Limitations of the McCulloch and Pitts Neuronal Model
    2. 3.2 Neural Networks
    3. 3.3 The Perceptron
      1. 3.3.1 The Learning Rate η
      2. 3.3.2 The Bias Input
      3. 3.3.3 The Perceptron Learning Algorithm
      4. 3.3.4 An Example of Perceptron Learning: Logic Functions
      5. 3.3.5 Implementation
    4. 3.4 Linear Separability
      1. 3.4.1 The Perceptron Convergence Theorem
      2. 3.4.2 The Exclusive Or (XOR) Function
      3. 3.4.3 A Useful Insight
      4. 3.4.4 Another Example: The Pima Indian Dataset
      5. 3.4.5 Preprocessing: Data Preparation
    5. 3.5 Linear Regression
      1. 3.5.1 Linear Regression Examples
    6. Further Reading
    7. Practice Questions
      1. Figure 3.1
      2. Figure 3.2
      3. Figure 3.3
      4. Figure 3.4
      5. Figure 3.5
      6. Figure 3.6
      7. Figure 3.7
      8. Figure 3.8
      9. Figure 3.9
      10. Figure 3.10
      11. Figure 3.11
      12. Figure 3.12
      13. Figure 3.13
  7. Chapter 4: The Multi-layer Perceptron
    1. 4.1 Going Forwards
      1. 4.1.1 Biases
    2. 4.2 Going Backwards: Back-Propagation of Error
      1. 4.2.1 The Multi-Layer Perceptron Algorithm
      2. 4.2.2 Initialising the Weights
      3. 4.2.3 Different Output Activation Functions
      4. 4.2.4 Sequential and Batch Training
      5. 4.2.5 Local Minima
      6. 4.2.6 Picking up Momentum
      7. 4.2.7 Minibatches and Stochastic Gradient Descent
      8. 4.2.8 Other Improvements
    3. 4.3 The Multi-Layer Perceptron in Practice
      1. 4.3.1 Amount of Training Data
      2. 4.3.2 Number of Hidden Layers
      3. 4.3.3 When to Stop Learning
    4. 4.4 Examples of Using the MLP
      1. 4.4.1 A Regression Problem
      2. 4.4.2 Classification with the MLP
      3. 4.4.3 A Classification Example: The Iris Dataset
      4. 4.4.4 Time-Series Prediction
      5. 4.4.5 Data Compression: The Auto-Associative Network
    5. 4.5 A Recipe for Using the MLP
    6. 4.6 Deriving Back-Propagation
      1. 4.6.1 The Network Output and the Error
      2. 4.6.2 The Error of the Network
      3. 4.6.3 Requirements of an Activation Function
      4. 4.6.4 Back-Propagation of Error
      5. 4.6.5 The Output Activation Functions
      6. 4.6.6 An Alternative Error Function
    7. Further Reading
    8. Practice Questions
      1. Figure 4.1
      2. Figure 4.2
      3. Figure 4.3
      4. Figure 4.4
      5. Figure 4.5
      6. Figure 4.6
      7. Figure 4.7
      8. Figure 4.8
      9. Figure 4.9
      10. Figure 4.10
      11. Figure 4.11
      12. Figure 4.12
      13. Figure 4.13
      14. Figure 4.14
      15. Figure 4.15
      16. Figure 4.16
      17. Figure 4.17
      18. Figure 4.18
      19. Figure 4.19
  8. Chapter 5: Radial Basis Functions and Splines
    1. 5.1 Receptive Fields
    2. 5.2 The Radial Basis Function (RBF) Network
      1. 5.2.1 Training the RBF Network
    3. 5.3 Interpolation and Basis Functions
      1. 5.3.1 Bases and Basis Expansion
      2. 5.3.2 The Cubic Spline
      3. 5.3.3 Fitting the Spline to the Data
      4. 5.3.4 Smoothing Splines
      5. 5.3.5 Higher Dimensions
      6. 5.3.6 Beyond the Bounds
    4. Further Reading
    5. Practice Questions
      1. Figure 5.1
      2. Figure 5.2
      3. Figure 5.3
      4. Figure 5.4
      5. Figure 5.5
      6. Figure 5.6
      7. Figure 5.7
      8. Figure 5.8
      9. Figure 5.9
      10. Figure 5.10
      11. Figure 5.11
  9. Chapter 6: Dimensionality Reduction
    1. 6.1 Linear Discriminant Analysis (LDA)
    2. 6.2 Principal Components Analysis (PCA)
      1. 6.2.1 Relation with the Multi-layer Perceptron
      2. 6.2.2 Kernel PCA
    3. 6.3 Factor Analysis
    4. 6.4 Independent Components Analysis (ICA)
    5. 6.5 Locally Linear Embedding
    6. 6.6 ISOMAP
      1. 6.6.1 Multi-Dimensional Scaling (MDS)
    7. Further Reading
    8. Practice Questions
      1. Figure 6.1
      2. Figure 6.2
      3. Figure 6.3
      4. Figure 6.4
      5. Figure 6.5
      6. Figure 6.6
      7. Figure 6.7
      8. Figure 6.8
      9. Figure 6.9
      10. Figure 6.10
      11. Figure 6.11
      12. Figure 6.12
      13. Figure 6.13
      14. Figure 6.14
      15. Figure 6.15
  10. Chapter 7: Probabilistic Learning
    1. 7.1 Gaussian Mixture Models
      1. 7.1.1 The Expectation-Maximisation (EM) Algorithm
      2. 7.1.2 Information Criteria
    2. 7.2 Nearest Neighbour Methods
      1. 7.2.1 Nearest Neighbour Smoothing
      2. 7.2.2 Efficient Distance Computations: the KD-Tree
      3. 7.2.3 Distance Measures
    3. Further Reading
    4. Practice Questions
      1. Figure 7.1
      2. Figure 7.2
      3. Figure 7.3
      4. Figure 7.4
      5. Figure 7.5
      6. Figure 7.6
      7. Figure 7.7
      8. Figure 7.8
      9. Figure 7.9
  11. Chapter 8: Support Vector Machines
    1. 8.1 Optimal Separation
      1. 8.1.1 The Margin and Support Vectors
      2. 8.1.2 A Constrained Optimisation Problem
      3. 8.1.3 Slack Variables for Non-Linearly Separable Problems
    2. 8.2 Kernels
      1. 8.2.1 Choosing Kernels
      2. 8.2.2 Example: XOR
    3. 8.3 The Support Vector Machine Algorithm
      1. 8.3.1 Implementation
      2. 8.3.2 Examples
    4. 8.4 Extensions To The SVM
      1. 8.4.1 Multi-Class Classification
      2. 8.4.2 SVM Regression
      3. 8.4.3 Other Advances
    5. Further Reading
    6. Practice Questions
      1. Figure 8.1
      2. Figure 8.2
      3. Figure 8.3
      4. Figure 8.4
      5. Figure 8.5
      6. Figure 8.6
      7. Figure 8.7
      8. Figure 8.8
      9. Figure 8.9
      10. Figure 8.10
  12. Chapter 9: Optimisation and Search
    1. 9.1 Going Downhill
      1. 9.1.1 Taylor Expansion
    2. 9.2 Least-Squares Optimisation
      1. 9.2.1 The Levenberg–Marquardt Algorithm
    3. 9.3 Conjugate Gradients
      1. 9.3.1 Conjugate Gradients Example
      2. 9.3.2 Conjugate Gradients and the MLP
    4. 9.4 Search: Three Basic Approaches
      1. 9.4.1 Exhaustive Search
      2. 9.4.2 Greedy Search
      3. 9.4.3 Hill Climbing
    5. 9.5 Exploitation and Exploration
    6. 9.6 Simulated Annealing
      1. 9.6.1 Comparison
    7. Further Reading
    8. Practice Questions
      1. Figure 9.1
      2. Figure 9.2
      3. Figure 9.3
      4. Figure 9.4
      5. Figure 9.5
      6. Figure 9.6
  13. Chapter 10: Evolutionary Learning
    1. 10.1 The Genetic Algorithm (GA)
      1. 10.1.1 String Representation
      2. 10.1.2 Evaluating Fitness
      3. 10.1.3 Population
      4. 10.1.4 Generating Offspring: Parent Selection
    2. 10.2 Generating Offspring: Genetic Operators
      1. 10.2.1 Crossover
      2. 10.2.2 Mutation
      3. 10.2.3 Elitism, Tournaments, and Niching
    3. 10.3 Using Genetic Algorithms
      1. 10.3.1 Map Colouring
      2. 10.3.2 Punctuated Equilibrium
      3. 10.3.3 Example: The Knapsack Problem
      4. 10.3.4 Example: The Four Peaks Problem
      5. 10.3.5 Limitations of the GA
      6. 10.3.6 Training Neural Networks with Genetic Algorithms
    4. 10.4 Genetic Programming
    5. 10.5 Combining Sampling with Evolutionary Learning
    6. Further Reading
    7. Practice Questions
      1. Figure 10.1
      2. Figure 10.2
      3. Figure 10.3
      4. Figure 10.4
      5. Figure 10.5
      6. Figure 10.6
      7. Figure 10.7
      8. Figure 10.8
      9. Figure 10.9
      10. Figure 10.10
      11. Figure 10.11
      12. Figure 10.12
      13. Figure 10.13
      14. Figure 10.14
      15. Figure 10.15
      16. Figure 10.16
  14. Chapter 11: Reinforcement Learning
    1. 11.1 Overview
    2. 11.2 Example: Getting Lost
      1. 11.2.1 State and Action Spaces
      2. 11.2.2 Carrots and Sticks: The Reward Function
      3. 11.2.3 Discounting
      4. 11.2.4 Action Selection
      5. 11.2.5 Policy
    3. 11.3 Markov Decision Processes
      1. 11.3.1 The Markov Property
      2. 11.3.2 Probabilities in Markov Decision Processes
    4. 11.4 Values
    5. 11.5 Back on Holiday: Using Reinforcement Learning
    6. 11.6 The Difference Between Sarsa and Q-Learning
    7. 11.7 Uses of Reinforcement Learning
    8. Further Reading
    9. Practice Questions
      1. Figure 11.1
      2. Figure 11.2
      3. Figure 11.3
      4. Figure 11.4
      5. Figure 11.5
      6. Figure 11.6
      7. Figure 11.7
      8. Figure 11.8
      9. Figure 11.9
  15. Chapter 12: Learning with Trees
    1. 12.1 Using Decision Trees
    2. 12.2 Constructing Decision Trees
      1. 12.2.1 Quick Aside: Entropy in Information Theory
      2. 12.2.2 ID3
      3. 12.2.3 Implementing Trees and Graphs in Python
      4. 12.2.4 Implementation of the Decision Tree
      5. 12.2.5 Dealing with Continuous Variables
      6. 12.2.6 Computational Complexity
    3. 12.3 Classification and Regression Trees (CART)
      1. 12.3.1 Gini Impurity
      2. 12.3.2 Regression in Trees
    4. 12.4 Classification Example
    5. Further Reading
    6. Practice Questions
      1. Figure 12.1
      2. Figure 12.2
      3. Figure 12.3
      4. Figure 12.4
      5. Figure 12.5
      6. Figure 12.6
      7. Figure 12.7
  16. Chapter 13: Decision by Committee: Ensemble Learning
    1. 13.1 Boosting
      1. 13.1.1 AdaBoost
      2. 13.1.2 Stumping
    2. 13.2 Bagging
      1. 13.2.1 Subagging
    3. 13.3 Random Forests
      1. 13.3.1 Comparison with Boosting
    4. 13.4 Different Ways to Combine Classifiers
    5. Further Reading
    6. Practice Questions
      1. Figure 13.1
      2. Figure 13.2
      3. Figure 13.3
      4. Figure 13.4
      5. Figure 13.5
  17. Chapter 14: Unsupervised Learning
    1. 14.1 The K-Means Algorithm
      1. 14.1.1 Dealing with Noise
      2. 14.1.2 The k-Means Neural Network
      3. 14.1.3 Normalisation
      4. 14.1.4 A Better Weight Update Rule
      5. 14.1.5 Example: The Iris Dataset Again
      6. 14.1.6 Using Competitive Learning for Clustering
    2. 14.2 Vector Quantisation
    3. 14.3 The Self-Organising Feature Map
      1. 14.3.1 The SOM Algorithm
      2. 14.3.2 Neighbourhood Connections
      3. 14.3.3 Self-Organisation
      4. 14.3.4 Network Dimensionality and Boundary Conditions
      5. 14.3.5 Examples of Using the SOM
    4. Further Reading
    5. Practice Questions
      1. Figure 14.1
      2. Figure 14.2
      3. Figure 14.3
      4. Figure 14.4
      5. Figure 14.5
      6. Figure 14.6
      7. Figure 14.7
      8. Figure 14.8
      9. Figure 14.9
      10. Figure 14.10
      11. Figure 14.11
      12. Figure 14.12
      13. Figure 14.13
      14. Figure 14.14
  18. Chapter 15: Markov Chain Monte Carlo (MCMC) Methods
    1. 15.1 Sampling
      1. 15.1.1 Random Numbers
      2. 15.1.2 Gaussian Random Numbers
    2. 15.2 Monte Carlo or Bust
    3. 15.3 The Proposal Distribution
    4. 15.4 Markov Chain Monte Carlo
      1. 15.4.1 Markov Chains
      2. 15.4.2 The Metropolis–Hastings Algorithm
      3. 15.4.3 Simulated Annealing (Again)
      4. 15.4.4 Gibbs Sampling
    5. Further Reading
    6. Practice Questions
      1. Figure 15.1
      2. Figure 15.2
      3. Figure 15.3
      4. Figure 15.4
      5. Figure 15.5
      6. Figure 15.6
      7. Figure 15.7
      8. Figure 15.8
  19. Chapter 16: Graphical Models
    1. 16.1 Bayesian Networks
      1. 16.1.1 Example: Exam Fear
      2. 16.1.2 Approximate Inference
      3. 16.1.3 Making Bayesian Networks
    2. 16.2 Markov Random Fields
    3. 16.3 Hidden Markov Models (HMMs)
      1. 16.3.1 The Forward Algorithm
      2. 16.3.2 The Viterbi Algorithm
      3. 16.3.3 The Baum–Welch or Forward–Backward Algorithm
    4. 16.4 Tracking Methods
      1. 16.4.1 The Kalman Filter
      2. 16.4.2 The Particle Filter
    5. Further Reading
    6. Practice Questions
      1. Figure 16.1
      2. Figure 16.2
      3. Figure 16.3
      4. Figure 16.4
      5. Figure 16.5
      6. Figure 16.6
      7. Figure 16.7
      8. Figure 16.8
      9. Figure 16.9
      10. Figure 16.10
      11. Figure 16.11
      12. Figure 16.12
      13. Figure 16.13
      14. Figure 16.14
      15. Figure 16.15
      16. Figure 16.16
      17. Figure 16.17
      18. Figure 16.18
      19. Figure 16.19
      20. Figure 16.20
  20. Chapter 17: Symmetric Weights and Deep Belief Networks
    1. 17.1 Energetic Learning: The Hopfield Network
      1. 17.1.1 Associative Memory
      2. 17.1.2 Making an Associative Memory
      3. 17.1.3 An Energy Function
      4. 17.1.4 Capacity of the Hopfield Network
      5. 17.1.5 The Continuous Hopfield Network
    2. 17.2 Stochastic Neurons — The Boltzmann Machine
      1. 17.2.1 The Restricted Boltzmann Machine
      2. 17.2.2 Deriving the CD Algorithm
      3. 17.2.3 Supervised Learning
      4. 17.2.4 The RBM as a Directed Belief Network
    3. 17.3 Deep Learning
      1. 17.3.1 Deep Belief Networks (DBN)
    4. Further Reading
    5. Practice Questions
      1. Figure 17.1
      2. Figure 17.2
      3. Figure 17.3
      4. Figure 17.4
      5. Figure 17.5
      6. Figure 17.6
      7. Figure 17.7
      8. Figure 17.8
      9. Figure 17.9
      10. Figure 17.10
      11. Figure 17.11
      12. Figure 17.12
      13. Figure 17.13
      14. Figure 17.14
  21. Chapter 18: Gaussian Processes
    1. 18.1 Gaussian Process Regression
      1. 18.1.1 Adding Noise
      2. 18.1.2 Implementation
      3. 18.1.3 Learning the Parameters
      4. 18.1.4 Implementation
      5. 18.1.5 Choosing a (set of) Covariance Functions
    2. 18.2 Gaussian Process Classification
      1. 18.2.1 The Laplace Approximation
      2. 18.2.2 Computing the Posterior
      3. 18.2.3 Implementation
    3. Further Reading
    4. Practice Questions
      1. Figure 18.1
      2. Figure 18.2
      3. Figure 18.3
      4. Figure 18.4
      5. Figure 18.5
      6. Figure 18.6
      7. Figure 18.7
      8. Figure 18.8
      9. Figure 18.9
  22. Appendix A: Python
    1. A.1 Installing Python and Other Packages
    2. A.2 Getting Started
      1. A.2.1 Python for MATLAB® and R users
    3. A.3 Code Basics
      1. A.3.1 Writing and Importing Code
      2. A.3.2 Control Flow
      3. A.3.3 Functions
      4. A.3.4 The doc String
      5. A.3.5 map and lambda
      6. A.3.6 Exceptions
      7. A.3.7 Classes
    4. A.4 Using Numpy And Matplotlib
      1. A.4.1 Arrays
      2. A.4.2 Random Numbers
      3. A.4.3 Linear Algebra
      4. A.4.4 Plotting
      5. A.4.5 One Thing to Be Aware of
    5. Further Reading
    6. Practice Questions
      1. Figure A.1