Essential Math for AI

Book description

Companies are scrambling to integrate AI into their systems and operations. But to build truly successful solutions, you need a firm grasp of the underlying mathematics. This accessible guide walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory.

Engineers, data scientists, and students alike will examine mathematical topics critical for AI--including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more--through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks shed light on the examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper into the field.

  • Understand the underlying mathematics powering AI systems, including generative adversarial networks, random graphs, large random matrices, mathematical logic, optimal control, and more
  • Learn how to adapt mathematical methods to different applications from completely different fields
  • Gain the mathematical fluency to interpret and explain how AI systems arrive at their decisions

Table of contents

  1. Preface
    1. Why I Wrote This Book
    2. Who Is This Book For?
    3. Who Is This Book Not For?
    4. How Will the Math Be Presented in This Book?
    5. Infographic
    6. What Math Background Is Expected from You to Be Able to Read This Book?
    7. Overview of the Chapters
    8. My Favorite Books on AI
    9. Conventions Used in This Book
    10. Using Code Examples
    11. O’Reilly Online Learning
    12. How to Contact Us
    13. Acknowledgments
  2. 1. Why Learn the Mathematics of AI?
    1. What Is AI?
    2. Why Is AI So Popular Now?
    3. What Is AI Able to Do?
      1. An AI Agent’s Specific Tasks
    4. What Are AI’s Limitations?
    5. What Happens When AI Systems Fail?
    6. Where Is AI Headed?
    7. Who Are the Current Main Contributors to the AI Field?
    8. What Math Is Typically Involved in AI?
    9. Summary and Looking Ahead
  3. 2. Data, Data, Data
    1. Data for AI
    2. Real Data Versus Simulated Data
    3. Mathematical Models: Linear Versus Nonlinear
    4. An Example of Real Data
    5. An Example of Simulated Data
    6. Mathematical Models: Simulations and AI
    7. Where Do We Get Our Data From?
    8. The Vocabulary of Data Distributions, Probability, and Statistics
      1. Random Variables
      2. Probability Distributions
      3. Marginal Probabilities
      4. The Uniform and the Normal Distributions
      5. Conditional Probabilities and Bayes’ Theorem
      6. Conditional Probabilities and Joint Distributions
      7. Prior Distribution, Posterior Distribution, and Likelihood Function
      8. Mixtures of Distributions
      9. Sums and Products of Random Variables
      10. Using Graphs to Represent Joint Probability Distributions
      11. Expectation, Mean, Variance, and Uncertainty
      12. Covariance and Correlation
      13. Markov Process
      14. Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set
      15. Common Examples
    9. Continuous Distributions Versus Discrete Distributions (Density Versus Mass)
    10. The Power of the Joint Probability Density Function
    11. Distribution of Data: The Uniform Distribution
    12. Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution
    13. Distribution of Data: Other Important and Commonly Used Distributions
    14. The Various Uses of the Word “Distribution”
    15. A/B Testing
    16. Summary and Looking Ahead
  4. 3. Fitting Functions to Data
    1. Traditional and Very Useful Machine Learning Models
    2. Numerical Solutions Versus Analytical Solutions
    3. Regression: Predict a Numerical Value
      1. Training Function
      2. Loss Function
      3. Optimization
    4. Logistic Regression: Classify into Two Classes
      1. Training Function
      2. Loss Function
      3. Optimization
    5. Softmax Regression: Classify into Multiple Classes
      1. Training Function
      2. Loss Function
      3. Optimization
    6. Incorporating These Models into the Last Layer of a Neural Network
    7. Other Popular Machine Learning Techniques and Ensembles of Techniques
      1. Support Vector Machines
      2. Decision Trees
      3. Random Forests
      4. k-means Clustering
    8. Performance Measures for Classification Models
    9. Summary and Looking Ahead
  5. 4. Optimization for Neural Networks
    1. The Brain Cortex and Artificial Neural Networks
    2. Training Function: Fully Connected, or Dense, Feed Forward Neural Networks
      1. A Neural Network Is a Computational Graph Representation of the Training Function
      2. Linearly Combine, Add Bias, Then Activate
      3. Common Activation Functions
      4. Universal Function Approximation
      5. Approximation Theory for Deep Learning
    3. Loss Functions
    4. Optimization
      1. Mathematics and the Mysterious Success of Neural Networks
      2. Gradient Descent: ω_{i+1} = ω_i - η ∇L(ω_i)
      3. Explaining the Role of the Learning Rate Hyperparameter η
      4. Convex Versus Nonconvex Landscapes
      5. Stochastic Gradient Descent
      6. Initializing the Weights ω_0 for the Optimization Process
    5. Regularization Techniques
      1. Dropout
      2. Early Stopping
      3. Batch Normalization of Each Layer
      4. Control the Size of the Weights by Penalizing Their Norm
      5. Penalizing the ℓ2 Norm Versus Penalizing the ℓ1 Norm
      6. Explaining the Role of the Regularization Hyperparameter α
    6. Hyperparameter Examples That Appear in Machine Learning
    7. Chain Rule and Backpropagation: Calculating ∇L(ω_i)
      1. Backpropagation Is Not Too Different from How Our Brain Learns
      2. Why Is It Better to Backpropagate?
      3. Backpropagation in Detail
    8. Assessing the Significance of the Input Data Features
    9. Summary and Looking Ahead
  6. 5. Convolutional Neural Networks and Computer Vision
    1. Convolution and Cross-Correlation
      1. Translation Invariance and Translation Equivariance
      2. Convolution in Usual Space Is a Product in Frequency Space
    2. Convolution from a Systems Design Perspective
      1. Convolution and Impulse Response for Linear and Translation Invariant Systems
    3. Convolution and One-Dimensional Discrete Signals
    4. Convolution and Two-Dimensional Discrete Signals
      1. Filtering Images
      2. Feature Maps
    5. Linear Algebra Notation
      1. The One-Dimensional Case: Multiplication by a Toeplitz Matrix
      2. The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix
    6. Pooling
    7. A Convolutional Neural Network for Image Classification
    8. Summary and Looking Ahead
  7. 6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media
    1. Matrix Factorization
    2. Diagonal Matrices
    3. Matrices as Linear Transformations Acting on Space
      1. Action of A on the Right Singular Vectors
      2. Action of A on the Standard Unit Vectors and the Unit Square Determined by Them
      3. Action of A on the Unit Circle
      4. Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition
      5. Rotation and Reflection Matrices
      6. Action of A on a General Vector x
    4. Three Ways to Multiply Matrices
    5. The Big Picture
      1. The Condition Number and Computational Stability
    6. The Ingredients of the Singular Value Decomposition
    7. Singular Value Decomposition Versus the Eigenvalue Decomposition
    8. Computation of the Singular Value Decomposition
      1. Computing an Eigenvector Numerically
    9. The Pseudoinverse
    10. Applying the Singular Value Decomposition to Images
    11. Principal Component Analysis and Dimension Reduction
    12. Principal Component Analysis and Clustering
    13. A Social Media Application
    14. Latent Semantic Analysis
    15. Randomized Singular Value Decomposition
    16. Summary and Looking Ahead
  8. 7. Natural Language and Finance AI: Vectorization and Time Series
    1. Natural Language AI
    2. Preparing Natural Language Data for Machine Processing
    3. Statistical Models and the log Function
    4. Zipf’s Law for Term Counts
    5. Various Vector Representations for Natural Language Documents
      1. Term Frequency Vector Representation of a Document or Bag of Words
      2. Term Frequency-Inverse Document Frequency Vector Representation of a Document
      3. Topic Vector Representation of a Document Determined by Latent Semantic Analysis
      4. Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation
      5. Topic Vector Representation of a Document Determined by Latent Discriminant Analysis
      6. Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings
    6. Cosine Similarity
    7. Natural Language Processing Applications
      1. Sentiment Analysis
      2. Spam Filter
      3. Search and Information Retrieval
      4. Machine Translation
      5. Image Captioning
      6. Chatbots
      7. Other Applications
    8. Transformers and Attention Models
      1. The Transformer Architecture
      2. The Attention Mechanism
      3. Transformers Are Far from Perfect
    9. Convolutional Neural Networks for Time Series Data
    10. Recurrent Neural Networks for Time Series Data
      1. How Do Recurrent Neural Networks Work?
      2. Gated Recurrent Units and Long Short-Term Memory Units
    11. An Example of Natural Language Data
    12. Finance AI
    13. Summary and Looking Ahead
  9. 8. Probabilistic Generative Models
    1. What Are Generative Models Useful For?
    2. The Typical Mathematics of Generative Models
    3. Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking
    4. Maximum Likelihood Estimation
    5. Explicit and Implicit Density Models
    6. Explicit Density-Tractable: Fully Visible Belief Networks
      1. Example: Generating Images via PixelCNN and Machine Audio via WaveNet
    7. Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis
    8. Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods
    9. Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain
    10. Implicit Density-Markov Chain: Generative Stochastic Network
    11. Implicit Density-Direct: Generative Adversarial Networks
      1. How Do Generative Adversarial Networks Work?
    12. Example: Machine Learning and Generative Networks for High Energy Physics
    13. Other Generative Models
      1. Naive Bayes Classification Model
      2. Gaussian Mixture Model
    14. The Evolution of Generative Models
      1. Hopfield Nets
      2. Boltzmann Machine
      3. Restricted Boltzmann Machine (Explicit Density and Intractable)
      4. The Original Autoencoder
    15. Probabilistic Language Modeling
    16. Summary and Looking Ahead
  10. 9. Graph Models
    1. Graphs: Nodes, Edges, and Features for Each
    2. Example: PageRank Algorithm
    3. Inverting Matrices Using Graphs
    4. Cayley Graphs of Groups: Pure Algebra and Parallel Computing
    5. Message Passing Within a Graph
    6. The Limitless Applications of Graphs
      1. Brain Networks
      2. Spread of Disease
      3. Spread of Information
      4. Detecting and Tracking Fake News Propagation
      5. Web-Scale Recommendation Systems
      6. Fighting Cancer
      7. Biochemical Graphs
      8. Molecular Graph Generation for Drug and Protein Structure Discovery
      9. Citation Networks
      10. Social Media Networks and Social Influence Prediction
      11. Sociological Structures
      12. Bayesian Networks
      13. Traffic Forecasting
      14. Logistics and Operations Research
      15. Language Models
      16. Graph Structure of the Web
      17. Automatically Analyzing Computer Programs
      18. Data Structures in Computer Science
      19. Load Balancing in Distributed Networks
      20. Artificial Neural Networks
    7. Random Walks on Graphs
    8. Node Representation Learning
    9. Tasks for Graph Neural Networks
      1. Node Classification
      2. Graph Classification
      3. Clustering and Community Detection
      4. Graph Generation
      5. Influence Maximization
      6. Link Prediction
    10. Dynamic Graph Models
    11. Bayesian Networks
      1. A Bayesian Network Represents a Compactified Conditional Probability Table
      2. Making Predictions Using a Bayesian Network
      3. Bayesian Networks Are Belief Networks, Not Causal Networks
      4. Keep This in Mind About Bayesian Networks
      5. Chains, Forks, and Colliders
      6. Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?
    12. Graph Diagrams for Probabilistic Causal Modeling
    13. A Brief History of Graph Theory
    14. Main Considerations in Graph Theory
      1. Spanning Trees and Shortest Spanning Trees
      2. Cut Sets and Cut Vertices
      3. Planarity
      4. Graphs as Vector Spaces
      5. Realizability
      6. Coloring and Matching
      7. Enumeration
    15. Algorithms and Computational Aspects of Graphs
    16. Summary and Looking Ahead
  11. 10. Operations Research
    1. No Free Lunch
    2. Complexity Analysis and O() Notation
    3. Optimization: The Heart of Operations Research
    4. Thinking About Optimization
      1. Optimization: Finite Dimensions, Unconstrained
      2. Optimization: Finite Dimensions, Constrained (Lagrange Multipliers)
      3. Optimization: Infinite Dimensions, Calculus of Variations
    5. Optimization on Networks
      1. Traveling Salesman Problem
      2. Minimum Spanning Tree
      3. Shortest Path
      4. Max-Flow Min-Cut
      5. Max-Flow Min-Cost
      6. The Critical Path Method for Project Design
    6. The n-Queens Problem
    7. Linear Optimization
      1. The General Form and the Standard Form
      2. Visualizing a Linear Optimization Problem in Two Dimensions
      3. Convex to Linear
      4. The Geometry of Linear Optimization
      5. The Simplex Method
      6. Transportation and Assignment Problems
      7. Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That
      8. Sensitivity
    8. Game Theory and Multiagents
    9. Queuing
    10. Inventory
    11. Machine Learning for Operations Research
    12. Hamilton-Jacobi-Bellman Equation
    13. Operations Research for AI
    14. Summary and Looking Ahead
  12. 11. Probability
    1. Where Did Probability Appear in This Book?
    2. What More Do We Need to Know That Is Essential for AI?
    3. Causal Modeling and the Do Calculus
      1. An Alternative: The Do Calculus
    4. Paradoxes and Diagram Interpretations
      1. Monty Hall Problem
      2. Berkson’s Paradox
      3. Simpson’s Paradox
    5. Large Random Matrices
      1. Examples of Random Vectors and Random Matrices
      2. Main Considerations in Random Matrix Theory
      3. Random Matrix Ensembles
      4. Eigenvalue Density of the Sum of Two Large Random Matrices
      5. Essential Math for Large Random Matrices
    6. Stochastic Processes
      1. Bernoulli Process
      2. Poisson Process
      3. Random Walk
      4. Wiener Process or Brownian Motion
      5. Martingale
      6. Lévy Process
      7. Branching Process
      8. Markov Chain
      9. Itô’s Lemma
    7. Markov Decision Processes and Reinforcement Learning
      1. Examples of Reinforcement Learning
      2. Reinforcement Learning as a Markov Decision Process
      3. Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics
      4. Python Library for Reinforcement Learning
    8. Theoretical and Rigorous Grounds
      1. Which Events Have a Probability?
      2. Can We Talk About a Wider Range of Random Variables?
      3. A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)
      4. Where Is the Difficulty?
      5. Random Variable, Expectation, and Integration
      6. Distribution of a Random Variable and the Change of Variable Theorem
      7. Next Steps in Rigorous Probability Theory
      8. The Universality Theorem for Neural Networks
    9. Summary and Looking Ahead
  13. 12. Mathematical Logic
    1. Various Logic Frameworks
    2. Propositional Logic
      1. From Few Axioms to a Whole Theory
      2. Codifying Logic Within an Agent
      3. How Do Deterministic and Probabilistic Machine Learning Fit In?
    3. First-Order Logic
      1. Relationships Between For All and There Exist
    4. Probabilistic Logic
    5. Fuzzy Logic
    6. Temporal Logic
    7. Comparison with Human Natural Language
    8. Machines and Complex Mathematical Reasoning
    9. Summary and Looking Ahead
  14. 13. Artificial Intelligence and Partial Differential Equations
    1. What Is a Partial Differential Equation?
    2. Modeling with Differential Equations
      1. Models at Different Scales
      2. The Parameters of a PDE
      3. Changing One Thing in a PDE Can Be a Big Deal
      4. Can AI Step In?
    3. Numerical Solutions Are Very Valuable
      1. Continuous Functions Versus Discrete Functions
      2. PDE Themes from My Ph.D. Thesis
      3. Discretization and the Curse of Dimensionality
      4. Finite Differences
      5. Finite Elements
      6. Variational or Energy Methods
      7. Monte Carlo Methods
    4. Some Statistical Mechanics: The Wonderful Master Equation
    5. Solutions as Expectations of Underlying Random Processes
    6. Transforming the PDE
      1. Fourier Transform
      2. Laplace Transform
    7. Solution Operators
      1. Example Using the Heat Equation
      2. Example Using the Poisson Equation
      3. Fixed Point Iteration
    8. AI for PDEs
      1. Deep Learning to Learn Physical Parameter Values
      2. Deep Learning to Learn Meshes
      3. Deep Learning to Approximate Solution Operators of PDEs
      4. Numerical Solutions of High-Dimensional Differential Equations
      5. Simulating Natural Phenomena Directly from Data
    9. Hamilton-Jacobi-Bellman PDE for Dynamic Programming
    10. PDEs for AI?
    11. Other Considerations in Partial Differential Equations
    12. Summary and Looking Ahead
  15. 14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy
    1. Good AI
    2. Policy Matters
    3. What Could Go Wrong?
      1. From Math to Weapons
      2. Chemical Warfare Agents
      3. AI and Politics
      4. Unintended Outcomes of Generative Models
    4. How to Fix It?
      1. Addressing Underrepresentation in Training Data
      2. Addressing Bias in Word Vectors
      3. Addressing Privacy
      4. Addressing Fairness
      5. Injecting Morality into AI
      6. Democratization and Accessibility of AI to Nonexperts
      7. Prioritizing High Quality Data
    5. Distinguishing Bias from Discrimination
    6. The Hype
    7. Final Thoughts
  16. Index
  17. About the Author

Product information

  • Title: Essential Math for AI
  • Author(s): Hala Nelson
  • Release date: January 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098107635