Generative AI on AWS

Book description

Companies today are moving rapidly to integrate generative AI into their products and services. But there's a great deal of hype (and misunderstanding) about the impact and promise of this technology. With this book, Chris Fregly, Antje Barth, and Shelbee Eigenbrode from AWS help CTOs, ML practitioners, application developers, business analysts, data engineers, and data scientists find practical ways to use this exciting new technology.

You'll learn the generative AI project life cycle, including use case definition, model selection, model fine-tuning, retrieval-augmented generation, reinforcement learning from human feedback, and model quantization, optimization, and deployment. You'll also explore different types of models, including large language models (LLMs) and multimodal models such as Stable Diffusion for generating images and Flamingo/IDEFICS for answering questions about images.

  • Apply generative AI to your business use cases
  • Determine which generative AI models are best suited to your task
  • Perform prompt engineering and in-context learning
  • Fine-tune generative AI models on your datasets with low-rank adaptation (LoRA)
  • Align generative AI models to human values with reinforcement learning from human feedback (RLHF)
  • Augment your model with retrieval-augmented generation (RAG)
  • Explore libraries and frameworks such as LangChain and ReAct to develop agents and actions
  • Build generative AI applications with Amazon Bedrock
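As a small taste of the in-context learning techniques the book covers, here is a minimal sketch, in plain Python with no AWS services required, of how a few-shot prompt might be assembled before being sent to a model. The sentiment-classification task, example reviews, and labels are illustrative assumptions, not examples from the book:

```python
# Build a few-shot sentiment-classification prompt by prepending
# labeled examples (the "shots") to the new input.
examples = [
    ("I loved this book!", "positive"),
    ("The chapters were disorganized.", "negative"),
]

def build_few_shot_prompt(examples, new_input):
    """Format labeled examples plus the new input into one prompt string."""
    shots = "\n\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in examples
    )
    return f"{shots}\n\nReview: {new_input}\nSentiment:"

prompt = build_few_shot_prompt(examples, "A clear, practical guide.")
print(prompt)
```

The model completes the final `Sentiment:` line; zero-shot inference would send only the last `Review:`/`Sentiment:` pair with no examples.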

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
      1. Chris
      2. Antje
      3. Shelbee
  2. 1. Generative AI Use Cases, Fundamentals, and Project Life Cycle
    1. Use Cases and Tasks
    2. Foundation Models and Model Hubs
    3. Generative AI Project Life Cycle
    4. Generative AI on AWS
    5. Why Generative AI on AWS?
    6. Building Generative AI Applications on AWS
    7. Summary
  3. 2. Prompt Engineering and In-Context Learning
    1. Prompts and Completions
    2. Tokens
    3. Prompt Engineering
    4. Prompt Structure
      1. Instruction
      2. Context
    5. In-Context Learning with Few-Shot Inference
      1. Zero-Shot Inference
      2. One-Shot Inference
      3. Few-Shot Inference
      4. In-Context Learning Gone Wrong
      5. In-Context Learning Best Practices
    6. Prompt-Engineering Best Practices
    7. Inference Configuration Parameters
    8. Summary
  4. 3. Large-Language Foundation Models
    1. Large-Language Foundation Models
    2. Tokenizers
    3. Embedding Vectors
    4. Transformer Architecture
      1. Inputs and Context Window
      2. Embedding Layer
      3. Encoder
      4. Self-Attention
      5. Decoder
      6. Softmax Output
    5. Types of Transformer-Based Foundation Models
    6. Pretraining Datasets
    7. Scaling Laws
    8. Compute-Optimal Models
    9. Summary
  5. 4. Memory and Compute Optimizations
    1. Memory Challenges
    2. Data Types and Numerical Precision
    3. Quantization
      1. fp16
      2. bfloat16
      3. fp8
      4. int8
    4. Optimizing the Self-Attention Layers
      1. FlashAttention
      2. Grouped-Query Attention
    5. Distributed Computing
      1. Distributed Data Parallel
      2. Fully Sharded Data Parallel
      3. Performance Comparison of FSDP over DDP
    6. Distributed Computing on AWS
      1. Fully Sharded Data Parallel with Amazon SageMaker
      2. AWS Neuron SDK and AWS Trainium
    7. Summary
  6. 5. Fine-Tuning and Evaluation
    1. Instruction Fine-Tuning
      1. Llama 2-Chat
      2. Falcon-Chat
      3. FLAN-T5
    2. Instruction Dataset
      1. Multitask Instruction Dataset
      2. FLAN: Example Multitask Instruction Dataset
      3. Prompt Template
      4. Convert a Custom Dataset into an Instruction Dataset
    3. Instruction Fine-Tuning
      1. Amazon SageMaker Studio
      2. Amazon SageMaker JumpStart
      3. Amazon SageMaker Estimator for Hugging Face
    4. Evaluation
      1. Evaluation Metrics
      2. Benchmarks and Datasets
    5. Summary
  7. 6. Parameter-Efficient Fine-Tuning
    1. Full Fine-Tuning Versus PEFT
    2. LoRA and QLoRA
      1. LoRA Fundamentals
      2. Rank
      3. Target Modules and Layers
      4. Applying LoRA
      5. Merging LoRA Adapter with Original Model
      6. Maintaining Separate LoRA Adapters
      7. Full Fine-Tuning Versus LoRA Performance
      8. QLoRA
    3. Prompt Tuning and Soft Prompts
    4. Summary
  8. 7. Fine-Tuning with Reinforcement Learning from Human Feedback
    1. Human Alignment: Helpful, Honest, and Harmless
    2. Reinforcement Learning Overview
    3. Train a Custom Reward Model
      1. Collect Training Dataset with Human-in-the-Loop
      2. Sample Instructions for Human Labelers
      3. Using Amazon SageMaker Ground Truth for Human Annotations
      4. Prepare Ranking Data to Train a Reward Model
      5. Train the Reward Model
    4. Existing Reward Model: Toxicity Detector by Meta
    5. Fine-Tune with Reinforcement Learning from Human Feedback
      1. Using the Reward Model with RLHF
      2. Proximal Policy Optimization RL Algorithm
      3. Perform RLHF Fine-Tuning with PPO
      4. Mitigate Reward Hacking
      5. Using Parameter-Efficient Fine-Tuning with RLHF
    6. Evaluate RLHF Fine-Tuned Model
      1. Qualitative Evaluation
      2. Quantitative Evaluation
      3. Load Evaluation Model
      4. Define Evaluation-Metric Aggregation Function
      5. Compare Evaluation Metrics Before and After
    7. Summary
  9. 8. Model Deployment Optimizations
    1. Model Optimizations for Inference
      1. Pruning
      2. Post-Training Quantization with GPTQ
      3. Distillation
    2. Large Model Inference Container
    3. AWS Inferentia: Purpose-Built Hardware for Inference
    4. Model Update and Deployment Strategies
      1. A/B Testing
      2. Shadow Deployment
    5. Metrics and Monitoring
    6. Autoscaling
      1. Autoscaling Policies
      2. Define an Autoscaling Policy
    7. Summary
  10. 9. Context-Aware Reasoning Applications Using RAG and Agents
    1. Large Language Model Limitations
      1. Hallucination
      2. Knowledge Cutoff
    2. Retrieval-Augmented Generation
      1. External Sources of Knowledge
      2. RAG Workflow
      3. Document Loading
      4. Chunking
      5. Document Retrieval and Reranking
      6. Prompt Augmentation
    3. RAG Orchestration and Implementation
      1. Document Loading and Chunking
      2. Embedding Vector Store and Retrieval
      3. Retrieval Chains
      4. Reranking with Maximum Marginal Relevance
    4. Agents
      1. ReAct Framework
      2. Program-Aided Language Framework
    5. Generative AI Applications
    6. FMOps: Operationalizing the Generative AI Project Life Cycle
      1. Experimentation Considerations
      2. Development Considerations
      3. Production Deployment Considerations
    7. Summary
  11. 10. Multimodal Foundation Models
    1. Use Cases
    2. Multimodal Prompt Engineering Best Practices
    3. Image Generation and Enhancement
      1. Image Generation
      2. Image Editing and Enhancement
    4. Inpainting, Outpainting, Depth-to-Image
      1. Inpainting
      2. Outpainting
      3. Depth-to-Image
    5. Image Captioning and Visual Question Answering
      1. Image Captioning
      2. Content Moderation
      3. Visual Question Answering
    6. Model Evaluation
      1. Text-to-Image Generative Tasks
      2. Nonverbal Reasoning
    7. Diffusion Architecture Fundamentals
      1. Forward Diffusion
      2. Reverse Diffusion
      3. U-Net
    8. Stable Diffusion 2 Architecture
      1. Text Encoder
      2. U-Net and Diffusion Process
      3. Text Conditioning
      4. Cross-Attention
      5. Scheduler
      6. Image Decoder
    9. Stable Diffusion XL Architecture
      1. U-Net and Cross-Attention
      2. Refiner
      3. Conditioning
    10. Summary
  12. 11. Controlled Generation and Fine-Tuning with Stable Diffusion
    1. ControlNet
    2. Fine-Tuning
      1. DreamBooth
      2. DreamBooth and PEFT-LoRA
      3. Textual Inversion
    3. Human Alignment with Reinforcement Learning from Human Feedback
    4. Summary
  13. 12. Amazon Bedrock: Managed Service for Generative AI
    1. Bedrock Foundation Models
      1. Amazon Titan Foundation Models
      2. Stable Diffusion Foundation Models from Stability AI
    2. Bedrock Inference APIs
    3. Large Language Models
      1. Generate SQL Code
      2. Summarize Text
      3. Embeddings
    4. Fine-Tuning
    5. Agents
    6. Multimodal Models
      1. Create Images from Text
      2. Create Images from Images
    7. Data Privacy and Network Security
    8. Governance and Monitoring
    9. Summary
  14. Index
  15. About the Authors

Product information

  • Title: Generative AI on AWS
  • Author(s): Chris Fregly, Antje Barth, Shelbee Eigenbrode
  • Release date: November 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098159221