Chapter 5. Bias-Variance Trade-Off
A machine [classifier] with too much capacity [ability to fit training data exactly] is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, then it’s a tree. Neither can generalize well.
— Christopher J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, 1998
Recall from Chapter 2 that an approximation method is a function that maps a training dataset D to an approximation f̂_D, and the risk of an approximation method is the expected loss with respect to the distribution of new data and of training datasets,

    R = E_{D, (X, Y)} [ L(Y, f̂_D(X)) ],

where the expectation is taken jointly over the training dataset D and a new observation (X, Y).
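To make the definition concrete, the risk can be estimated by Monte Carlo: draw many training datasets, fit the approximation method to each, and average the squared-error loss on fresh data. The sketch below is illustrative rather than taken from the book; the true function `f`, the noise level, and the use of polynomial least squares as the approximation method are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: true regression function f, Gaussian noise with
# standard deviation 0.2, and an approximation method that fits a
# polynomial of a given degree by least squares.
def f(x):
    return np.sin(2 * np.pi * x)

def draw_training_set(n=20):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, 0.2, n)
    return x, y

def approximate(x, y, degree):
    # The approximation method: training dataset -> fitted function.
    coefs = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coefs, x_new)

def estimate_risk(degree, n_datasets=500, n_test=1000):
    # Monte Carlo estimate of risk: average squared-error loss over
    # many training datasets AND many new (x, y) pairs.
    losses = []
    for _ in range(n_datasets):
        f_hat = approximate(*draw_training_set(), degree)
        x_new = rng.uniform(0, 1, n_test)
        y_new = f(x_new) + rng.normal(0, 0.2, n_test)
        losses.append(np.mean((y_new - f_hat(x_new)) ** 2))
    return np.mean(losses)

print(estimate_risk(degree=1))  # rigid model: high risk from bias
print(estimate_risk(degree=3))  # more flexible model
```

Note that no estimate can fall below the noise variance (here 0.2² = 0.04), since that part of the risk is irreducible.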
The risk of an approximation method decomposes in an informative way when squared-error loss is used. Specifically, under squared-error loss, risk decomposes into a sum of three nonnegative terms, one of which we can do nothing about and two of which we can affect. As we shall see in Chapter 6, viewing risk minimization as minimization of the sum of two nonnegative terms, and having useful ...
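The three-term decomposition can be checked numerically at a fixed query point: the risk there splits into the irreducible noise variance, the squared bias of the average prediction, and the variance of predictions across training datasets. The simulation below is a sketch under the same assumed setup as above (sinusoidal truth, Gaussian noise, polynomial least squares); none of these choices come from the book itself.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.2  # assumed noise standard deviation

def f(x):
    return np.sin(2 * np.pi * x)

def fit(degree, n=20):
    # Draw a training dataset and fit a polynomial by least squares.
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, SIGMA, n)
    return np.polyfit(x, y, degree)

# At a fixed query point x0, collect predictions from many
# independently drawn training datasets.
x0 = 0.3
preds = np.array([np.polyval(fit(degree=3), x0) for _ in range(5000)])

irreducible = SIGMA ** 2               # noise term: nothing we can do
bias_sq = (preds.mean() - f(x0)) ** 2  # squared bias: affected by capacity
variance = preds.var()                 # variance across training datasets

# Direct Monte Carlo estimate of the risk at x0, for comparison.
y0 = f(x0) + rng.normal(0, SIGMA, preds.size)
risk = np.mean((y0 - preds) ** 2)

print(irreducible + bias_sq + variance)  # should closely match...
print(risk)                              # ...the direct estimate
```

Raising the polynomial degree shrinks the bias term but inflates the variance term, which is the trade-off the chapter title names.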