Generative Deep Learning, 2nd Edition

Book description

Generative AI is the hottest topic in tech. This practical book teaches machine learning engineers and data scientists how to use TensorFlow and Keras to create impressive generative deep learning models from scratch, including variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, normalizing flows, energy-based models, and denoising diffusion models.

The book starts with the basics of deep learning and progresses to cutting-edge architectures. Through tips and tricks, you'll understand how to make your models learn more efficiently and become more creative.

  • Discover how VAEs can change facial expressions in photos
  • Train GANs to generate images based on your own dataset
  • Build diffusion models to produce new varieties of flowers
  • Train your own GPT for text generation
  • Learn how large language models like ChatGPT are trained
  • Explore state-of-the-art architectures such as StyleGAN2 and ViT VQ-GAN
  • Compose polyphonic music using Transformers and MuseGAN
  • Understand how generative world models can solve reinforcement learning tasks
  • Dive into multimodal models such as DALL·E 2, Imagen, and Stable Diffusion

This book also explores the future of generative AI and how individuals and companies can proactively begin to leverage this remarkable new technology to create competitive advantage.


Table of contents

  1. Foreword
  2. Preface
    1. Objective and Approach
    2. Prerequisites
    3. Roadmap
    4. Changes in the Second Edition
    5. Other Resources
    6. Conventions Used in This Book
    7. Codebase
    8. Using Code Examples
    9. O’Reilly Online Learning
    10. How to Contact Us
    11. Acknowledgments
  3. I. Introduction to Generative Deep Learning
  4. 1. Generative Modeling
    1. What Is Generative Modeling?
      1. Generative Versus Discriminative Modeling
      2. The Rise of Generative Modeling
      3. Generative Modeling and AI
    2. Our First Generative Model
      1. Hello World!
      2. The Generative Modeling Framework
      3. Representation Learning
    3. Core Probability Theory
    4. Generative Model Taxonomy
    5. The Generative Deep Learning Codebase
      1. Cloning the Repository
      2. Using Docker
      3. Running on a GPU
    6. Summary
  5. 2. Deep Learning
    1. Data for Deep Learning
    2. Deep Neural Networks
      1. What Is a Neural Network?
      2. Learning High-Level Features
      3. TensorFlow and Keras
    3. Multilayer Perceptron (MLP)
      1. Preparing the Data
      2. Building the Model
      3. Compiling the Model
      4. Training the Model
      5. Evaluating the Model
    4. Convolutional Neural Network (CNN)
      1. Convolutional Layers
      2. Batch Normalization
      3. Dropout
      4. Building the CNN
      5. Training and Evaluating the CNN
    5. Summary
  6. II. Methods
  7. 3. Variational Autoencoders
    1. Introduction
    2. Autoencoders
      1. The Fashion-MNIST Dataset
      2. The Autoencoder Architecture
      3. The Encoder
      4. The Decoder
      5. Joining the Encoder to the Decoder
      6. Reconstructing Images
      7. Visualizing the Latent Space
      8. Generating New Images
    3. Variational Autoencoders
      1. The Encoder
      2. The Loss Function
      3. Training the Variational Autoencoder
      4. Analysis of the Variational Autoencoder
    4. Exploring the Latent Space
      1. The CelebA Dataset
      2. Training the Variational Autoencoder
      3. Analysis of the Variational Autoencoder
      4. Generating New Faces
      5. Latent Space Arithmetic
      6. Morphing Between Faces
    5. Summary
  8. 4. Generative Adversarial Networks
    1. Introduction
    2. Deep Convolutional GAN (DCGAN)
      1. The Bricks Dataset
      2. The Discriminator
      3. The Generator
      4. Training the DCGAN
      5. Analysis of the DCGAN
      6. GAN Training: Tips and Tricks
    3. Wasserstein GAN with Gradient Penalty (WGAN-GP)
      1. Wasserstein Loss
      2. The Lipschitz Constraint
      3. Enforcing the Lipschitz Constraint
      4. The Gradient Penalty Loss
      5. Training the WGAN-GP
      6. Analysis of the WGAN-GP
    4. Conditional GAN (CGAN)
      1. CGAN Architecture
      2. Training the CGAN
      3. Analysis of the CGAN
    5. Summary
  9. 5. Autoregressive Models
    1. Introduction
    2. Long Short-Term Memory Network (LSTM)
      1. The Recipes Dataset
      2. Working with Text Data
      3. Tokenization
      4. Creating the Training Set
      5. The LSTM Architecture
      6. The Embedding Layer
      7. The LSTM Layer
      8. The LSTM Cell
      9. Training the LSTM
      10. Analysis of the LSTM
    3. Recurrent Neural Network (RNN) Extensions
      1. Stacked Recurrent Networks
      2. Gated Recurrent Units
      3. Bidirectional Cells
    4. PixelCNN
      1. Masked Convolutional Layers
      2. Residual Blocks
      3. Training the PixelCNN
      4. Analysis of the PixelCNN
      5. Mixture Distributions
    5. Summary
  10. 6. Normalizing Flow Models
    1. Introduction
    2. Normalizing Flows
      1. Change of Variables
      2. The Jacobian Determinant
      3. The Change of Variables Equation
    3. RealNVP
      1. The Two Moons Dataset
      2. Coupling Layers
      3. Training the RealNVP Model
      4. Analysis of the RealNVP Model
    4. Other Normalizing Flow Models
      1. GLOW
      2. FFJORD
    5. Summary
  11. 7. Energy-Based Models
    1. Introduction
    2. Energy-Based Models
      1. The MNIST Dataset
      2. The Energy Function
      3. Sampling Using Langevin Dynamics
      4. Training with Contrastive Divergence
      5. Analysis of the Energy-Based Model
      6. Other Energy-Based Models
    3. Summary
  12. 8. Diffusion Models
    1. Introduction
    2. Denoising Diffusion Models (DDM)
      1. The Flowers Dataset
      2. The Forward Diffusion Process
      3. The Reparameterization Trick
      4. Diffusion Schedules
      5. The Reverse Diffusion Process
      6. The U-Net Denoising Model
      7. Training the Diffusion Model
      8. Sampling from the Denoising Diffusion Model
      9. Analysis of the Diffusion Model
    3. Summary
  13. III. Applications
  14. 9. Transformers
    1. Introduction
    2. GPT
      1. The Wine Reviews Dataset
      2. Attention
      3. Queries, Keys, and Values
      4. Multihead Attention
      5. Causal Masking
      6. The Transformer Block
      7. Positional Encoding
      8. Training GPT
      9. Analysis of GPT
    3. Other Transformers
      1. T5
      2. GPT-3 and GPT-4
      3. ChatGPT
    4. Summary
  15. 10. Advanced GANs
    1. Introduction
    2. ProGAN
      1. Progressive Training
      2. Outputs
    3. StyleGAN
      1. The Mapping Network
      2. The Synthesis Network
      3. Outputs from StyleGAN
    4. StyleGAN2
      1. Weight Modulation and Demodulation
      2. Path Length Regularization
      3. No Progressive Growing
      4. Outputs from StyleGAN2
    5. Other Important GANs
      1. Self-Attention GAN (SAGAN)
      2. BigGAN
      3. VQ-GAN
      4. ViT VQ-GAN
    6. Summary
  16. 11. Music Generation
    1. Introduction
    2. Transformers for Music Generation
      1. The Bach Cello Suite Dataset
      2. Parsing MIDI Files
      3. Tokenization
      4. Creating the Training Set
      5. Sine Position Encoding
      6. Multiple Inputs and Outputs
      7. Analysis of the Music-Generating Transformer
      8. Tokenization of Polyphonic Music
    3. MuseGAN
      1. The Bach Chorale Dataset
      2. The MuseGAN Generator
      3. The MuseGAN Critic
      4. Analysis of the MuseGAN
    4. Summary
  17. 12. World Models
    1. Introduction
    2. Reinforcement Learning
      1. The CarRacing Environment
    3. World Model Overview
      1. Architecture
      2. Training
    4. Collecting Random Rollout Data
    5. Training the VAE
      1. The VAE Architecture
      2. Exploring the VAE
    6. Collecting Data to Train the MDN-RNN
    7. Training the MDN-RNN
      1. The MDN-RNN Architecture
      2. Sampling from the MDN-RNN
    8. Training the Controller
      1. The Controller Architecture
      2. CMA-ES
      3. Parallelizing CMA-ES
    9. In-Dream Training
    10. Summary
  18. 13. Multimodal Models
    1. Introduction
    2. DALL·E 2
      1. Architecture
      2. The Text Encoder
      3. CLIP
      4. The Prior
      5. The Decoder
      6. Examples from DALL·E 2
    3. Imagen
      1. Architecture
      2. DrawBench
      3. Examples from Imagen
    4. Stable Diffusion
      1. Architecture
      2. Examples from Stable Diffusion
    5. Flamingo
      1. Architecture
      2. The Vision Encoder
      3. The Perceiver Resampler
      4. The Language Model
      5. Examples from Flamingo
    6. Summary
  19. 14. Conclusion
    1. Timeline of Generative AI
      1. 2014–2017: The VAE and GAN Era
      2. 2018–2019: The Transformer Era
      3. 2020–2022: The Big Model Era
    2. The Current State of Generative AI
      1. Large Language Models
      2. Text-to-Code Models
      3. Text-to-Image Models
      4. Other Applications
    3. The Future of Generative AI
      1. Generative AI in Everyday Life
      2. Generative AI in the Workplace
      3. Generative AI in Education
      4. Generative AI Ethics and Challenges
    4. Final Thoughts
  20. Index
  21. About the Author

Product information

  • Title: Generative Deep Learning, 2nd Edition
  • Author(s): David Foster
  • Release date: April 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098134181