Fundamentals of Deep Learning, 2nd Edition

Book description

We're in the midst of an AI research explosion. Deep learning has unlocked superhuman perception to power our push toward creating self-driving vehicles, defeating human experts at a variety of difficult games including Go, and even generating essays with shockingly coherent prose. But deciphering these breakthroughs often takes a PhD in machine learning and mathematics.

The updated second edition of this book describes the intuition behind these innovations without jargon or complexity. Python-proficient programmers, software engineering professionals, and computer science majors will be able to reimplement these breakthroughs on their own and reason about them with a level of sophistication that rivals some of the best developers in the field.

  • Learn the mathematics behind machine learning jargon
  • Examine the foundations of machine learning and neural networks
  • Manage problems that arise as you begin to make networks deeper
  • Build neural networks that analyze complex images
  • Perform effective dimensionality reduction using autoencoders
  • Dive deep into sequence analysis to examine language
  • Explore methods in interpreting complex machine learning models
  • Gain theoretical and practical knowledge on generative modeling
  • Understand the fundamentals of reinforcement learning

Table of contents

  1. Preface
    1. Prerequisites and Objectives
    2. How Is This Book Organized?
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgements
      1. Nithin and Nikhil
      2. Joe
  2. 1. Fundamentals of Linear Algebra for Deep Learning
    1. Data Structures and Operations
      1. Matrix Operations
      2. Vector Operations
      3. Matrix-Vector Multiplication
    2. The Fundamental Spaces
      1. The Column Space
      2. The Null Space
    3. Eigenvectors and Eigenvalues
    4. Summary
  3. 2. Fundamentals of Probability
    1. Events and Probability
    2. Conditional Probability
    3. Random Variables
    4. Expectation
    5. Variance
    6. Bayes’ Theorem
    7. Entropy, Cross Entropy, and KL Divergence
    8. Continuous Probability Distributions
    9. Summary
  4. 3. The Neural Network
    1. Building Intelligent Machines
    2. The Limits of Traditional Computer Programs
    3. The Mechanics of Machine Learning
    4. The Neuron
    5. Expressing Linear Perceptrons as Neurons
    6. Feed-Forward Neural Networks
    7. Linear Neurons and Their Limitations
    8. Sigmoid, Tanh, and ReLU Neurons
    9. Softmax Output Layers
    10. Summary
  5. 4. Training Feed-Forward Neural Networks
    1. The Fast-Food Problem
    2. Gradient Descent
    3. The Delta Rule and Learning Rates
    4. Gradient Descent with Sigmoidal Neurons
    5. The Backpropagation Algorithm
    6. Stochastic and Minibatch Gradient Descent
    7. Test Sets, Validation Sets, and Overfitting
    8. Preventing Overfitting in Deep Neural Networks
    9. Summary
  6. 5. Implementing Neural Networks in PyTorch
    1. Introduction to PyTorch
    2. Installing PyTorch
    3. PyTorch Tensors
      1. Tensor Init
      2. Tensor Attributes
      3. Tensor Operations
    4. Gradients in PyTorch
    5. The PyTorch nn Module
    6. PyTorch Datasets and Dataloaders
    7. Building the MNIST Classifier in PyTorch
    8. Summary
  7. 6. Beyond Gradient Descent
    1. The Challenges with Gradient Descent
    2. Local Minima in the Error Surfaces of Deep Networks
    3. Model Identifiability
    4. How Pesky Are Spurious Local Minima in Deep Networks?
    5. Flat Regions in the Error Surface
    6. When the Gradient Points in the Wrong Direction
    7. Momentum-Based Optimization
    8. A Brief View of Second-Order Methods
    9. Learning Rate Adaptation
      1. AdaGrad—Accumulating Historical Gradients
      2. RMSProp—Exponentially Weighted Moving Average of Gradients
      3. Adam—Combining Momentum and RMSProp
    10. The Philosophy Behind Optimizer Selection
    11. Summary
  8. 7. Convolutional Neural Networks
    1. Neurons in Human Vision
    2. The Shortcomings of Feature Selection
    3. Vanilla Deep Neural Networks Don’t Scale
    4. Filters and Feature Maps
    5. Full Description of the Convolutional Layer
    6. Max Pooling
    7. Full Architectural Description of Convolution Networks
    8. Closing the Loop on MNIST with Convolutional Networks
    9. Image Preprocessing Pipelines Enable More Robust Models
    10. Accelerating Training with Batch Normalization
    11. Group Normalization for Memory Constrained Learning Tasks
    12. Building a Convolutional Network for CIFAR-10
    13. Visualizing Learning in Convolutional Networks
    14. Residual Learning and Skip Connections for Very Deep Networks
    15. Building a Residual Network with Superhuman Vision
    16. Leveraging Convolutional Filters to Replicate Artistic Styles
    17. Learning Convolutional Filters for Other Problem Domains
    18. Summary
  9. 8. Embedding and Representation Learning
    1. Learning Lower-Dimensional Representations
    2. Principal Component Analysis
    3. Motivating the Autoencoder Architecture
    4. Implementing an Autoencoder in PyTorch
    5. Denoising to Force Robust Representations
    6. Sparsity in Autoencoders
    7. When Context Is More Informative than the Input Vector
    8. The Word2Vec Framework
    9. Implementing the Skip-Gram Architecture
    10. Summary
  10. 9. Models for Sequence Analysis
    1. Analyzing Variable-Length Inputs
    2. Tackling seq2seq with Neural N-Grams
    3. Implementing a Part-of-Speech Tagger
    4. Dependency Parsing and SyntaxNet
    5. Beam Search and Global Normalization
    6. A Case for Stateful Deep Learning Models
    7. Recurrent Neural Networks
    8. The Challenges with Vanishing Gradients
    9. Long Short-Term Memory Units
    10. PyTorch Primitives for RNN Models
    11. Implementing a Sentiment Analysis Model
    12. Solving seq2seq Tasks with Recurrent Neural Networks
    13. Augmenting Recurrent Networks with Attention
    14. Dissecting a Neural Translation Network
    15. Self-Attention and Transformers
    16. Summary
  11. 10. Generative Models
    1. Generative Adversarial Networks
    2. Variational Autoencoders
    3. Implementing a VAE
    4. Score-Based Generative Models
    5. Denoising Autoencoders and Score Matching
    6. Summary
  12. 11. Methods in Interpretability
    1. Overview
    2. Decision Trees and Tree-Based Algorithms
    3. Linear Regression
    4. Methods for Evaluating Feature Importance
      1. Permutation Feature Importance
      2. Partial Dependence Plots
    5. Extractive Rationalization
    6. LIME
    7. SHAP
    8. Summary
  13. 12. Memory Augmented Neural Networks
    1. Neural Turing Machines
    2. Attention-Based Memory Access
    3. NTM Memory Addressing Mechanisms
    4. Differentiable Neural Computers
    5. Interference-Free Writing in DNCs
    6. DNC Memory Reuse
    7. Temporal Linking of DNC Writes
    8. Understanding the DNC Read Head
    9. The DNC Controller Network
    10. Visualizing the DNC in Action
    11. Implementing the DNC in PyTorch
    12. Teaching a DNC to Read and Comprehend
    13. Summary
  14. 13. Deep Reinforcement Learning
    1. Deep Reinforcement Learning Masters Atari Games
    2. What Is Reinforcement Learning?
    3. Markov Decision Processes
      1. Policy
      2. Future Return
      3. Discounted Future Return
    4. Explore Versus Exploit
      1. ϵ-Greedy
      2. Annealed ϵ-Greedy
    5. Policy Versus Value Learning
    6. Pole-Cart with Policy Gradients
      1. OpenAI Gym
      2. Creating an Agent
      3. Building the Model and Optimizer
      4. Sampling Actions
      5. Keeping Track of History
      6. Policy Gradient Main Function
      7. PGAgent Performance on Pole-Cart
    7. Trust-Region Policy Optimization
    8. Proximal Policy Optimization
    9. Q-Learning and Deep Q-Networks
      1. The Bellman Equation
      2. Issues with Value Iteration
      3. Approximating the Q-Function
      4. Deep Q-Network
      5. Training DQN
      6. Learning Stability
      7. Target Q-Network
      8. Experience Replay
      9. From Q-Function to Policy
      10. DQN and the Markov Assumption
      11. DQN’s Solution to the Markov Assumption
      12. Playing Breakout with DQN
      13. Building Our Architecture
      14. Stacking Frames
      15. Setting Up Training Operations
      16. Updating Our Target Q-Network
      17. Implementing Experience Replay
      18. DQN Main Loop
      19. DQNAgent Results on Breakout
    10. Improving and Moving Beyond DQN
      1. Deep Recurrent Q-Networks
      2. Asynchronous Advantage Actor-Critic Agent
      3. UNsupervised REinforcement and Auxiliary Learning
    11. Summary
  15. Index
  16. About the Authors

Product information

  • Title: Fundamentals of Deep Learning, 2nd Edition
  • Author(s): Nithin Buduma, Nikhil Buduma, Joe Papa
  • Release date: May 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492082187