Machine Learning

Name: Machine Learning
Author: Sergios Theodoridis
ISBN: 9780128017227

April 2015

Intermediate to advanced

1062 pages

40h 35m

English

Academic Press

Cover image
Title page
Table of Contents
Copyright
Preface
Acknowledgments
Notation
Dedication
Chapter 1: Introduction
Abstract
1.1 What Machine Learning is About
1.2 Structure and a Road Map of the Book
Chapter 2: Probability and Stochastic Processes
Abstract
2.1 Introduction
2.2 Probability and Random Variables
2.3 Examples of Distributions
2.4 Stochastic Processes
2.5 Information Theory
2.6 Stochastic Convergence
Problems
Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
Abstract
3.1 Introduction
3.2 Parameter Estimation: The Deterministic Point of View
3.3 Linear Regression
3.4 Classification
3.5 Biased Versus Unbiased Estimation
3.6 The Cramér-Rao Lower Bound
3.7 Sufficient Statistic
3.8 Regularization
3.9 The Bias-Variance Dilemma
3.10 Maximum Likelihood Method
3.11 Bayesian Inference
3.12 Curse of Dimensionality
3.13 Validation
3.14 Expected and Empirical Loss Functions
3.15 Nonparametric Modeling and Estimation
Problems
Chapter 4: Mean-Square Error Linear Estimation
Abstract
4.1 Introduction
4.2 Mean-Square Error Linear Estimation: The Normal Equations
Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
Abstract
5.1 Introduction
5.2 The Steepest Descent Method
5.3 Application to the Mean-Square Error Cost Function
5.4 Stochastic Approximation
5.5 The Least-Mean-Squares Adaptive Algorithm
5.6 The Affine Projection Algorithm
5.7 The Complex-Valued Case
5.8 Relatives of the LMS
5.9 Simulation Examples
5.10 Adaptive Decision Feedback Equalization
5.11 The Linearly Constrained LMS
5.12 Tracking Performance of the LMS in Nonstationary Environments
5.13 Distributed Learning: The Distributed LMS
5.14 A Case Study: Target Localization
5.15 Some Concluding Remarks: Consensus Matrix
Problems
MATLAB Exercises
Chapter 6: The Least-Squares Family
Abstract
6.1 Introduction
6.2 Least-Squares Linear Regression: A Geometric Perspective
6.3 Statistical Properties of the LS Estimator
6.4 Orthogonalizing the Column Space of X: The SVD Method
6.5 Ridge Regression
6.6 The Recursive Least-Squares Algorithm
6.7 Newton’s Iterative Minimization Method
6.8 Steady-State Performance of the RLS
6.9 Complex-Valued Data: The Widely Linear RLS
6.10 Computational Aspects of the LS Solution
6.11 The Coordinate and Cyclic Coordinate Descent Methods
6.12 Simulation Examples
6.13 Total-Least-Squares
Problems
Chapter 7: Classification: A Tour of the Classics
Abstract
7.1 Introduction
7.2 Bayesian Classification
7.3 Decision (Hyper)Surfaces
7.4 The Naive Bayes Classifier
7.5 The Nearest Neighbor Rule
7.6 Logistic Regression
7.7 Fisher’s Linear Discriminant
7.8 Classification Trees
7.9 Combining Classifiers
7.10 The Boosting Approach
7.11 Boosting Trees
7.12 A Case Study: Protein Folding Prediction
Problems
Chapter 8: Parameter Learning: A Convex Analytic Path
Abstract
8.1 Introduction
8.2 Convex Sets and Functions
8.3 Projections onto Convex Sets
8.4 Fundamental Theorem of Projections onto Convex Sets
8.5 A Parallel Version of POCS
8.6 From Convex Sets to Parameter Estimation and Machine Learning
8.7 Infinitely Many Closed Convex Sets: The Online Learning Case
8.8 Constrained Learning
8.9 The Distributed APSM
8.10 Optimizing Nonsmooth Convex Cost Functions
8.11 Regret Analysis
8.12 Online Learning and Big Data Applications: A Discussion
8.13 Proximal Operators
8.14 Proximal Splitting Methods for Optimization
Problems
MATLAB Exercises
8.15 Appendix to Chapter 8
Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations
Abstract
9.1 Introduction
9.2 Searching for a Norm
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
9.4 Sparse Signal Representation
9.5 In Search of the Sparsest Solution
9.6 Uniqueness of the ℓ0 Minimizer
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions
9.8 Robust Sparse Signal Recovery from Noisy Measurements
9.9 Compressed Sensing: The Glory of Randomness
9.10 A Case Study: Image De-Noising
Problems
Chapter 10: Sparsity-Aware Learning: Algorithms and Applications
Abstract
10.1 Introduction
10.2 Sparsity-Promoting Algorithms
10.3 Variations on the Sparsity-Aware Theme
10.4 Online Sparsity-Promoting Algorithms
10.5 Learning Sparse Analysis Models
10.6 A Case Study: Time-Frequency Analysis
10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames
Problems
Chapter 11: Learning in Reproducing Kernel Hilbert Spaces
Abstract
11.1 Introduction
11.2 Generalized Linear Models
11.3 Volterra, Wiener, and Hammerstein Models
11.4 Cover’s Theorem: Capacity of a Space in Linear Dichotomies
11.5 Reproducing Kernel Hilbert Spaces
11.6 Representer Theorem
11.7 Kernel Ridge Regression
11.8 Support Vector Regression
11.9 Kernel Ridge Regression Revisited
11.10 Optimal Margin Classification: Support Vector Machines
11.11 Computational Considerations
11.12 Online Learning in RKHS
11.13 Multiple Kernel Learning
11.14 Nonparametric Sparsity-Aware Learning: Additive Models
11.15 A Case Study: Authorship Identification
Problems
Chapter 12: Bayesian Learning: Inference and the EM Algorithm
Abstract
12.1 Introduction
12.2 Regression: A Bayesian Perspective
12.3 The Evidence Function and Occam’s Razor Rule
12.4 Exponential Family of Probability Distributions
12.5 Latent Variables and the EM Algorithm
12.6 Linear Regression and the EM Algorithm
12.7 Gaussian Mixture Models
12.8 Combining Learning Models: A Probabilistic Point of View
Problems
MATLAB Exercises
12.9 Appendix to Chapter 12
Chapter 13: Bayesian Learning: Approximate Inference and Nonparametric Models
Abstract
13.1 Introduction
13.2 Variational Approximation in Bayesian Learning
13.3 A Variational Bayesian Approach to Linear Regression
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
13.5 When Bayesian Inference Meets Sparsity
13.6 Sparse Bayesian Learning (SBL)
13.7 The Relevance Vector Machine Framework
13.8 Convex Duality and Variational Bounds
13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path
13.10 Sparsity-Aware Learning: Some Concluding Remarks
13.11 Expectation Propagation
13.12 Nonparametric Bayesian Modeling
13.13 Gaussian Processes
13.14 A Case Study: Hyperspectral Image Unmixing
Problems
Chapter 14: Monte Carlo Methods
Abstract
14.1 Introduction
14.2 Monte Carlo Methods: The Main Concept
14.3 Random Sampling Based on Function Transformation
14.4 Rejection Sampling
14.5 Importance Sampling
14.6 Monte Carlo Methods and the EM Algorithm
14.7 Markov Chain Monte Carlo Methods
14.8 The Metropolis Method
14.9 Gibbs Sampling
14.10 In Search of More Efficient Methods: A Discussion
14.11 A Case Study: Change-Point Detection
Problems
Chapter 15: Probabilistic Graphical Models: Part I
Abstract
15.1 Introduction
15.2 The Need for Graphical Models
15.3 Bayesian Networks and the Markov Condition
15.4 Undirected Graphical Models
15.5 Factor Graphs
15.6 Moralization of Directed Graphs
15.7 Exact Inference Methods: Message-Passing Algorithms
Problems
Chapter 16: Probabilistic Graphical Models: Part II
Abstract
16.1 Introduction
16.2 Triangulated Graphs and Junction Trees
16.3 Approximate Inference Methods
16.4 Dynamic Graphical Models
16.5 Hidden Markov Models
16.6 Beyond HMMs: A Discussion
16.7 Learning Graphical Models
Problems
Chapter 17: Particle Filtering
Abstract
17.1 Introduction
17.2 Sequential Importance Sampling
17.3 Kalman and Particle Filtering
17.4 Particle Filtering
Problems
Chapter 18: Neural Networks and Deep Learning
Abstract
18.1 Introduction
18.2 The Perceptron
18.3 Feed-Forward Multilayer Neural Networks
18.4 The Backpropagation Algorithm
18.5 Pruning the Network
18.6 Universal Approximation Property of Feed-Forward Neural Networks
18.7 Neural Networks: A Bayesian Flavor
18.8 Learning Deep Networks
18.9 Deep Belief Networks
18.10 Variations on the Deep Learning Theme
18.11 Case Study: A Deep Network for Optical Character Recognition
18.12 Case Study: A Deep Autoencoder
18.13 Example: Generating Data via a DBN
Problems
MATLAB Exercises
Chapter 19: Dimensionality Reduction and Latent Variables Modeling
Abstract
19.1 Introduction
19.2 Intrinsic Dimensionality
19.3 Principal Component Analysis
19.4 Canonical Correlation Analysis
19.5 Independent Component Analysis
19.6 Dictionary Learning: The k-SVD Algorithm
19.7 Nonnegative Matrix Factorization
19.8 Learning Low-Dimensional Models: A Probabilistic Perspective
19.9 Nonlinear Dimensionality Reduction
19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path
19.11 A Case Study: fMRI Data Analysis
Problems
Appendix A: Linear Algebra
A.1 Properties of Matrices
A.2 Positive Definite and Symmetric Matrices
A.3 Wirtinger Calculus
Appendix B: Probability Theory and Statistics
B.1 Cramér-Rao Bound
B.2 Characteristic Functions
B.3 Moments and Cumulants
B.4 Edgeworth Expansion of a pdf
Appendix C: Hints on Constrained Optimization
C.1 Equality Constraints
C.2 Inequality Constraints
Index

Overview

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which are based on optimization techniques, together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, it explains the various methods and techniques in depth, supported by examples and problems, making it an invaluable resource for students and researchers who want to understand and apply machine learning concepts.

The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.

All major classical techniques: mean-square and least-squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression, and boosting methods.
The latest trends: sparsity, convex analysis and optimization, online distributed algorithms, learning in reproducing kernel Hilbert spaces (RKHS), Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning, and latent variables modeling.
Case studies, including protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization, and echo cancellation, show how the theory can be applied.
MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.

Publisher Resources

ISBN: 9780128015223