Machine Learning, 2nd Edition

Name: Machine Learning, 2nd Edition
Author: Steven W. Knox
ISBN: 9781394325252

March 2026

Intermediate

432 pages

21h 5m

English

Wiley

Cover
Table of Contents
Title Page
Copyright
Preface
Note
Organization — How to Use This Book
Notes
Acknowledgments
About the Companion Website
Chapter 1: Introduction – Examples from Real Life
Note
Chapter 2: The Problem of Learning
2.1 Domain
2.2 Range
2.3 Data
2.4 Loss
2.5 Risk
2.6 The Reality of the Unknown Function
2.7 Training and Selection of Models
2.8 Purposes of Learning
2.9 Notation
Notes
Chapter 3: Regression
3.1 General Framework
3.2 Loss
3.3 Estimating the Model Parameters
3.4 Properties of Fitted Values
3.5 Estimating the Variance
3.6 A Normality Assumption
3.7 Computation
3.8 Categorical Features
3.9 Feature Expansions, Interactions, and Transformations
3.10 Penalized Regression: Model Transformation for Risk Reduction
3.11 Variations in Linear Regression
3.12 Nonlinear Regression
3.13 Nonparametric Regression
Notes
Chapter 4: Classification
4.1 The Bayes Classifier
4.2 Introduction to Classifiers
4.3 Mitigating Biases in Software, Biases in Data, and Zero Probabilities
4.4 Class Boundaries
4.5 A Running Example
4.6 Likelihood Methods
4.7 Prototype Methods
4.8 Logistic Regression
4.9 Neural Networks
4.10 Classification Trees
4.11 Support Vector Machines
4.12 Postscript: Example Problem Revisited
Notes
Chapter 5: Bias-Variance Trade-Off
5.1 Squared-Error Loss
5.2 General Loss
Notes
Chapter 6: Combining Classifiers
6.1 Ensembles
6.2 Ensemble Design
6.3 Bootstrap Aggregation (Bagging)
6.4 Random Forests
6.5 Boosting and Arcing
6.6 Classification by Regression Ensemble
6.7 Gradient Boosting
6.8 Stacking and Mixture of Experts
6.9 Postscript: Example Problem Revisited
Notes
Chapter 7: Risk Estimation and Model Selection
7.1 Risk Estimation via Training Data
7.2 Risk Estimation via Validation or Test Data
7.3 Cross-Validation
7.4 Improvements on Cross-Validation
7.5 Out-of-Bag Risk Estimation
7.6 Akaike’s Information Criterion
7.7 Schwartz’s Bayesian Information Criterion
7.8 Rissanen’s Minimum Description Length Criterion
7.9 R² and Adjusted R²
7.10 Stepwise Model Selection
7.11 Occam’s Razor
7.12 Size of Validation and Test Data Sets
Notes
Chapter 8: Consistency
8.1 Convergence of Sequences of Random Variables
8.2 Consistency for Parameter Estimation
8.3 Consistency for Prediction
8.4 There Are Consistent and Universally Consistent Classifiers
8.5 Convergence to Asymptopia Is Not Uniform and May Be Slow
Notes
Chapter 9: Clustering
9.1 Gaussian Mixture Models
9.2 k-Means
9.3 Clustering by Mode-Hunting in a Density Estimate
9.4 Using Classifiers to Cluster
9.5 Dissimilarity
9.6 k-Medoids
9.7 k-Modes and k-Prototypes
9.8 Agglomerative Hierarchical Clustering
9.9 Divisive Hierarchical Clustering
9.10 How Many Clusters Are There? Interpretation of Clustering
9.11 An Impossibility Theorem
Notes
Chapter 10: Optimization
10.1 Quasi-Newton Methods
10.2 The Nelder–Mead Algorithm
10.3 Simulated Annealing
10.4 Genetic Algorithms
10.5 Particle Swarm Optimization
10.6 General Remarks on Optimization
10.7 Solving Least-Squares Problems via Quasi-Newton Methods
10.8 Gradient Computation for Neural Networks via Backpropagation
10.9 Handling Missing Data via the Expectation-Maximization Algorithm
10.10 Fitting Support Vector Machines via Sequential Minimal Optimization
Notes
Chapter 11: High-Dimensional Data
11.1 The Curse of Dimensionality
11.2 Two Running Examples
11.3 Reducing Dimension While Preserving Information
11.4 Model Regularization
Notes
Chapter 12: Communication with Clients
12.1 Binary Classification and Hypothesis Testing
12.2 Terminology for Binary Decisions
12.3 Receiver Operating Characteristic (ROC) Curves
12.4 One-Dimensional Measures of Performance
12.5 Confusion Matrices
12.6 Pairwise Model Comparison
12.7 Multiple Testing
12.8 Expert Systems
12.9 Ethics in Machine Learning
Notes
Chapter 13: Current Challenges in Machine Learning
13.1 Streaming Data
13.2 Distributed Data
13.3 Semi-Supervised Learning
13.4 Active Learning
13.5 Feature Construction via Deep Neural Networks
13.6 Transfer Learning
13.7 Interpretability and Protection of Complex Models
Chapter 14: R and Python Source Code
14.1 Author’s Biases
14.2 Packages and Code
14.3 The Running Example (Section 4.5)
14.4 The Bayes Classifier (Section 4.1)
14.5 Quadratic Discriminant Analysis (Section 4.6.1)
14.6 Linear Discriminant Analysis (Section 4.6.2)
14.7 Gaussian Mixture Models (Section 4.6.3)
14.8 Kernel Density Estimation (Section 4.6.4)
14.9 Histograms (Section 4.6.5)
14.10 The Naive Bayes Classifier (Section 4.6.6)
14.11 k-Nearest-Neighbor (Section 4.7.1)
14.12 Learning Vector Quantization (Section 4.7.4)
14.13 Logistic Regression (Section 4.8)
14.14 Neural Networks (Section 4.9)
14.15 Classification Trees (Section 4.10)
14.16 Support Vector Machines (Section 4.11)
14.17 Bootstrap Aggregation (Bagging) (Section 6.3)
14.18 Random Forests (Section 6.4)
14.19 Boosting by Reweighting (Section 6.5)
14.20 Boosting by Sampling (Arcing) (Section 6.5)
14.21 Gradient Boosted Trees (Section 6.7)
Notes
Appendix A: List of Symbols
Appendix B: The Condition Number of a Matrix with Respect to a Norm
Condition Number with Respect to a General Norm
Condition Number with Respect to the Euclidean Norm
Condition Number Application in Least-Squares Linear Regression
Notes
Appendix C: Converting Between Normal Parameters and Level-Curve Ellipsoids
Parameters to Axes
Axes to Parameters
Note
Appendix D: The Geometry of Linear Functions and Linear Classifiers
Linear Functions and Hyperplanes
Logistic Regression and Neural Networks (Sections 4.8 and 4.9)
Support Vector Machines (Section 4.11)
Linear Discriminant Analysis (Section 4.6.2)
Notes
Appendix E: Training Data and Fitted Parameters
Training Data
Fitted Model Parameters
Appendix F: Solutions to Selected Exercises
Note
Bibliography
Index
End User License Agreement

Content preview from Machine Learning, 2nd Edition

Preface

The goal of statistical data analysis is to extract the maximum information from the data, and to present a product that is as accurate and as useful as possible.

— David W. Scott, Multivariate Density Estimation: Theory, Practice and Visualization, 1992

My purpose in writing this book is to introduce the mathematically sophisticated reader to a large number of topics and techniques in the field variously known as machine learning, statistical learning, or predictive modeling. I believe that a deeper understanding of the subject as a whole will be obtained from reflection on an intuitive understanding of many techniques rather than a very detailed understanding of only one or two, and the book is structured accordingly. I have omitted many details while focusing on what I think shows “what is really going on.” For details, the reader will be directed to the relevant literature or to the exercises, which form an integral part of the text.

No work this small on a subject this large can be self-contained. Some undergraduate-level calculus, linear algebra, and probability is assumed without reference, as are a few basic ideas from statistics. All of the techniques discussed here can, I hope, be implemented using this book and a mid-level programming language (such as C), and explicit implementation of many techniques using both R and Python is presented in the last chapter.

The reader may detect a coverage bias in favor of classification over regression. This is deliberate. ...
