Chapter 4
The Multi-Layer Perceptron
In the last chapter we saw that while linear models are easy to understand and use, they come with the inherent cost that is implied by the word ‘linear’; that is, they can only identify straight lines, planes, or hyperplanes. And this is not usually enough, because the majority of interesting problems are not linearly separable. In Section 3.4 we saw that problems can be made linearly separable if we can work out how to transform the features suitably. We will come back to this idea in Chapter 8, but in this chapter we will instead consider making more complicated networks.
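The XOR function is the classic illustration of this point: no straight line can separate its two classes, so a single-layer perceptron never stops making mistakes on it, yet adding one transformed feature makes it separable again. The sketch below assumes NumPy and uses a hypothetical helper `perceptron_errors` (not from the book's code) implementing the standard sequential perceptron update:

```python
import numpy as np

def perceptron_errors(X, y, eta=0.25, epochs=50):
    """Train a single-layer perceptron with sequential updates and return
    how many training points it still misclassifies afterwards.
    (Hypothetical helper for illustration only.)"""
    Xb = np.hstack([X, -np.ones((len(X), 1))])  # bias as an extra -1 input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += eta * (ti - pred) * xi       # perceptron learning rule
    preds = (Xb @ w > 0).astype(int)
    return int((preds != y).sum())

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 1, 1, 0])        # XOR: not linearly separable

errors_raw = perceptron_errors(X, y)   # stays > 0, however long we train

# The Section 3.4 trick: add the product x1*x2 as a third feature,
# which makes XOR linearly separable in the transformed space.
X3 = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
errors_new = perceptron_errors(X3, y)  # 0 errors after the transform
```

The same data that defeats the linear model in two dimensions is handled easily once the extra feature lifts it into three; the alternative pursued in this chapter is to keep the raw features and make the network itself more complicated instead.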
We have pretty much decided that the learning in the neural network happens in the weights. So, to perform more computation it seems ...