Learn Generative AI with PyTorch

Author: Mark Liu
ISBN: 9781633436466
Publisher: Manning Publications
Published: January 2025
Level: Beginner to intermediate
Length: 432 pages, 13h 16m
Language: English

Copyright
dedication
contents
front matter
foreword
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: a roadmap
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1. Introduction to generative AI
1 What is generative AI and why PyTorch?
1.1 Introducing generative AI and PyTorch
1.1.1 What is generative AI?
1.1.2 The Python programming language
1.1.3 Using PyTorch as our AI framework
1.2 GANs
1.2.1 A high-level overview of GANs
1.2.2 An illustrating example: Generating anime faces
1.2.3 Why should you care about GANs?
1.3 Transformers
1.3.1 The attention mechanism
1.3.2 The Transformer architecture
1.3.3 Multimodal Transformers and pretrained LLMs
1.4 Why build generative models from scratch?
Summary
2 Deep learning with PyTorch
2.1 Data types in PyTorch
2.1.1 Creating PyTorch tensors
2.1.2 Index and slice PyTorch tensors
2.1.3 PyTorch tensor shapes
2.1.4 Mathematical operations on PyTorch tensors
2.2 An end-to-end deep learning project with PyTorch
2.2.1 Deep learning in PyTorch: A high-level overview
2.2.2 Preprocessing data
2.3 Binary classification
2.3.1 Creating batches
2.3.2 Building and training a binary classification model
2.3.3 Testing the binary classification model
2.4 Multicategory classification
2.4.1 Validation set and early stopping
2.4.2 Building and training a multicategory classification model
Summary
3 Generative adversarial networks: Shape and number generation
3.1 Steps involved in training GANs
3.2 Preparing training data
3.2.1 A training dataset that forms an exponential growth curve
3.2.2 Preparing the training dataset
3.3 Creating GANs
3.3.1 The discriminator network
3.3.2 The generator network
3.3.3 Loss functions, optimizers, and early stopping
3.4 Training and using GANs for shape generation
3.4.1 The training of GANs
3.4.2 Saving and using the trained generator
3.5 Generating numbers with patterns
3.5.1 What are one-hot variables?
3.5.2 GANs to generate numbers with patterns
3.5.3 Training the GANs to generate numbers with patterns
3.5.4 Saving and using the trained model
Summary
Part 2. Image generation

4 Image generation with generative adversarial networks
4.1 GANs to generate grayscale images of clothing items
4.1.1 Training samples and the discriminator
4.1.2 A generator to create grayscale images
4.1.3 Training GANs to generate images of clothing items
4.2 Convolutional layers
4.2.1 How do convolutional operations work?
4.2.2 How do stride and padding affect convolutional operations?
4.3 Transposed convolution and batch normalization
4.3.1 How do transposed convolutional layers work?
4.3.2 Batch normalization
4.4 Color images of anime faces
4.4.1 Downloading anime faces
4.4.2 Channels-first color images in PyTorch
4.5 Deep convolutional GAN
4.5.1 Building a DCGAN
4.5.2 Training and using DCGAN
Summary
5 Selecting characteristics in generated images
5.1 The eyeglasses dataset
5.1.1 Downloading the eyeglasses dataset
5.1.2 Visualizing images in the eyeglasses dataset
5.2 cGAN and Wasserstein distance
5.2.1 WGAN with gradient penalty
5.2.2 cGANs
5.3 Create a cGAN
5.3.1 A critic in cGAN
5.3.2 A generator in cGAN
5.3.3 Weight initialization and the gradient penalty function
5.4 Training the cGAN
5.4.1 Adding labels to inputs
5.4.2 Training the cGAN
5.5 Selecting characteristics in generated images
5.5.1 Selecting images with or without eyeglasses
5.5.2 Vector arithmetic in latent space
5.5.3 Selecting two characteristics simultaneously
Summary
6 CycleGAN: Converting blond hair to black hair
6.1 CycleGAN and cycle consistency loss
6.1.1 What is CycleGAN?
6.1.2 Cycle consistency loss
6.2 The celebrity faces dataset
6.2.1 Downloading the celebrity faces dataset
6.2.2 Process the black and blond hair image data
6.3 Building a CycleGAN model
6.3.1 Creating two discriminators
6.3.2 Creating two generators
6.4 Using CycleGAN to translate between black and blond hair
6.4.1 Training a CycleGAN to translate between black and blond hair
6.4.2 Round-trip conversions of black hair images and blond hair images
Summary
7 Image generation with variational autoencoders
7.1 An overview of AEs
7.1.1 What is an AE?
7.1.2 Steps in building and training an AE
7.2 Building and training an AE to generate digits
7.2.1 Gathering handwritten digits
7.2.2 Building and training an AE
7.2.3 Saving and using the trained AE
7.3 What are VAEs?
7.3.1 Differences between AEs and VAEs
7.3.2 The blueprint to train a VAE to generate human face images
7.4 A VAE to generate human face images
7.4.1 Building a VAE
7.4.2 Training the VAE
7.4.3 Generating images with the trained VAE
7.4.4 Encoding arithmetic with the trained VAE
Summary
Part 3. Natural language processing and Transformers
8 Text generation with recurrent neural networks
8.1 Introduction to RNNs
8.1.1 Challenges in generating text
8.1.2 How do RNNs work?
8.1.3 Steps in training an LSTM model
8.2 Fundamentals of NLP
8.2.1 Different tokenization methods
8.2.2 Word embedding
8.3 Preparing data to train the LSTM model
8.3.1 Downloading and cleaning up the text
8.3.2 Creating batches of training data
8.4 Building and training the LSTM model
8.4.1 Building an LSTM model
8.4.2 Training the LSTM model
8.5 Generating text with the trained LSTM model
8.5.1 Generating text by predicting the next token
8.5.2 Temperature and top-K sampling in text generation
Summary
9 A line-by-line implementation of attention and Transformer
9.1 Introduction to attention and Transformer
9.1.1 The attention mechanism
9.1.2 The Transformer architecture
9.1.3 Different types of Transformers
9.2 Building an encoder
9.2.1 The attention mechanism
9.2.2 Creating an encoder
9.3 Building an encoder-decoder Transformer
9.3.1 Creating a decoder layer
9.3.2 Creating an encoder-decoder Transformer
9.4 Putting all the pieces together
9.4.1 Defining a generator
9.4.2 Creating a model to translate between two languages
Summary
10 Training a Transformer to translate English to French
10.1 Subword tokenization
10.1.1 Tokenizing English and French phrases
10.1.2 Sequence padding and batch creation
10.2 Word embedding and positional encoding
10.2.1 Word embedding
10.2.2 Positional encoding
10.3 Training the Transformer for English-to-French translation
10.3.1 Loss function and the optimizer
10.3.2 The training loop
10.4 Translating English to French with the trained model
Summary
11 Building a generative pretrained Transformer from scratch
11.1 GPT-2 architecture and causal self-attention
11.1.1 The architecture of GPT-2
11.1.2 Word embedding and positional encoding in GPT-2
11.1.3 Causal self-attention in GPT-2
11.2 Building GPT-2XL from scratch
11.2.1 BPE tokenization
11.2.2 The Gaussian error linear unit activation function
11.2.3 Causal self-attention
11.2.4 Constructing the GPT-2XL model
11.3 Loading up pretrained weights and generating text
11.3.1 Loading up pretrained parameters in GPT-2XL
11.3.2 Defining a generate() function to produce text
11.3.3 Text generation with GPT-2XL
Summary
12 Training a Transformer to generate text
12.1 Building and training a GPT from scratch
12.1.1 The architecture of a GPT to generate text
12.1.2 The training process of the GPT model to generate text
12.2 Tokenizing text of Hemingway novels
12.2.1 Tokenizing the text
12.2.2 Creating batches for training
12.3 Building a GPT to generate text
12.3.1 Model hyperparameters
12.3.2 Modeling the causal self-attention mechanism
12.3.3 Building the GPT model
12.4 Training the GPT model to generate text
12.4.1 Training the GPT model
12.4.2 A function to generate text
12.4.3 Text generation with different versions of the trained model
Summary
Part 4. Applications and new developments
13 Music generation with MuseGAN
13.1 Digital music representation
13.1.1 Musical notes, octave, and pitch
13.1.2 An introduction to multitrack music
13.1.3 Digitally represent music: Piano rolls
13.2 A blueprint for music generation
13.2.1 Constructing music with chords, style, melody, and groove
13.2.2 A blueprint to train a MuseGAN
13.3 Preparing the training data for MuseGAN
13.3.1 Downloading the training data
13.3.2 Converting multidimensional objects to music pieces
13.4 Building a MuseGAN
13.4.1 A critic in MuseGAN
13.4.2 A generator in MuseGAN
13.4.3 Optimizers and the loss function
13.5 Training the MuseGAN to generate music
13.5.1 Training the MuseGAN
13.5.2 Generating music with the trained MuseGAN
Summary
14 Building and training a music Transformer
14.1 Introduction to the music Transformer
14.1.1 Performance-based music representation
14.1.2 The music Transformer architecture
14.1.3 Training the music Transformer
14.2 Tokenizing music pieces
14.2.1 Downloading training data
14.2.2 Tokenizing MIDI files
14.2.3 Preparing the training data
14.3 Building a GPT to generate music
14.3.1 Hyperparameters in the music Transformer
14.3.2 Building a music Transformer
14.4 Training and using the music Transformer
14.4.1 Training the music Transformer
14.4.2 Music generation with the trained Transformer
Summary
15 Diffusion models and text-to-image Transformers
15.1 Introduction to denoising diffusion models
15.1.1 The forward diffusion process
15.1.2 Using the U-Net model to denoise images
15.1.3 A blueprint to train the denoising U-Net model
15.2 Preparing the training data
15.2.1 Flower images as the training data
15.2.2 Visualizing the forward diffusion process
15.3 Building a denoising U-Net model
15.3.1 The attention mechanism in the denoising U-Net model
15.3.2 The denoising U-Net model
15.4 Training and using the denoising U-Net model
15.4.1 Training the denoising U-Net model
15.4.2 Using the trained model to generate flower images
15.5 Text-to-image Transformers
15.5.1 CLIP: A multimodal Transformer
15.5.2 Text-to-image generation with DALL-E 2
Summary
16 Pretrained large language models and the LangChain library
16.1 Content generation with the OpenAI API
16.1.1 Text generation tasks with OpenAI API
16.1.2 Code generation with OpenAI API
16.1.3 Image generation with OpenAI DALL-E 2
16.1.4 Speech generation with OpenAI API
16.2 Introduction to LangChain
16.2.1 The need for the LangChain library
16.2.2 Using the OpenAI API in LangChain
16.2.3 Zero-shot, one-shot, and few-shot prompting
16.3 A zero-shot know-it-all agent in LangChain
16.3.1 Applying for a Wolfram Alpha API Key
16.3.2 Creating an agent in LangChain
16.3.3 Adding tools by using OpenAI GPTs
16.3.4 Adding tools to generate code and images
16.4 Limitations and ethical concerns of LLMs
16.4.1 Limitations of LLMs
16.4.2 Ethical concerns for LLMs
Summary
Appendix A. Installing Python, Jupyter Notebook, and PyTorch
A.1 Installing Python and setting up a virtual environment
A.1.1 Installing Anaconda
A.1.2 Setting up a Python virtual environment
A.1.3 Installing Jupyter Notebook
A.2 Installing PyTorch
A.2.1 Installing PyTorch without CUDA
A.2.2 Installing PyTorch with CUDA
Appendix B. Minimally qualified readers and deep learning basics
B.1 Deep learning and deep neural networks
B.1.1 Anatomy of a neural network
B.1.2 Different types of layers in neural networks
B.1.3 Activation Functions
B.2 Training a deep neural network
B.2.1 The training process
B.2.2 Loss functions
B.2.3 Optimizers
index

15 Diffusion models and text-to-image Transformers

This chapter covers

How forward diffusion and reverse diffusion work
How to build and train a denoising U-Net model
Using the trained U-Net to generate flower images
Concepts behind text-to-image Transformers
Writing a Python program that generates an image from a text prompt with DALL-E 2

In recent years, multimodal large language models (LLMs) have gained significant attention for their ability to handle multiple content formats, such as text, images, video, audio, and code. Text-to-image Transformers are a notable example: OpenAI’s DALL-E 2, Google’s Imagen, and Stability AI’s Stable Diffusion can all generate high-quality images from textual descriptions. ...
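Forward diffusion, the first topic listed for this chapter, adds Gaussian noise to an image over many small steps. The sketch below is a minimal PyTorch version. It assumes the linear noise schedule from the original DDPM paper (Ho et al., 2020), in which β rises from 1e-4 to 0.02 over 1,000 steps. The book's own schedule and code may differ, and `make_linear_schedule` and `forward_diffuse` are hypothetical helper names. Rather than looping through every step, the sketch uses the closed-form shortcut x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1 − β) up to step t and ε is standard Gaussian noise.

```python
import torch


def make_linear_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linear beta schedule (DDPM-style assumption).

    Returns the per-step betas and alpha_bars, the cumulative product of (1 - beta).
    """
    betas = torch.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alpha_bars


def forward_diffuse(x0, t, alpha_bars, noise=None):
    """Hypothetical helper: sample a noisy x_t from clean x0 in one shot.

    x0: clean images, shape (batch, channels, height, width), e.g. scaled to [-1, 1]
    t:  0-based integer time steps, shape (batch,)
    Returns the noisy images and the noise that was added.
    """
    if noise is None:
        noise = torch.randn_like(x0)
    # Pick alpha_bar_t for each sample and reshape so it broadcasts over C, H, W
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise


if __name__ == "__main__":
    torch.manual_seed(0)
    betas, alpha_bars = make_linear_schedule()
    x0 = torch.rand(4, 3, 64, 64) * 2 - 1  # stand-in images in [-1, 1]
    t = torch.tensor([0, 250, 500, 999])   # early, middle, and final steps
    xt, eps = forward_diffuse(x0, t, alpha_bars)
    print(xt.shape)
    print(alpha_bars[t])  # signal weight shrinks toward 0 as t grows
```

Passing a fixed `noise` tensor makes the output deterministic, which helps when you want to show the same image degrading as t increases. In a DDPM-style training setup, the denoising U-Net would receive `xt` and `t` and learn to predict the added `noise`.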
