Chapter 4. Memory and Compute Optimizations
In Chapter 3, you explored best practices for experimenting with and selecting a foundation model for your use case. The next step is usually to customize the model to your specific needs and datasets. One common approach is fine-tuning, which adapts the model to your own data; you will explore fine-tuning in more detail in Chapter 5. When training or fine-tuning large foundation models, you often face compute challenges, in particular how to fit large models into GPU memory.
In this chapter, you will explore techniques that help overcome these memory limitations. You will learn how to apply quantization to reduce the GPU RAM needed to load and train a model, and how to use distributed training to scale model training horizontally across multiple GPUs for larger models.
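To build some intuition for why quantization matters, consider the memory needed just to hold a model's weights at different numeric precisions. The sketch below uses an illustrative 40-billion-parameter count and the standard byte widths for each data type; gradients, optimizer states, and activations would add significantly more on top of these figures.

    # Rough memory needed to store only the model weights at different precisions.
    # Gradients, optimizer states, and activations are not included here.
    BYTES_PER_PARAMETER = {
        "fp32": 4,
        "fp16/bf16": 2,
        "int8": 1,
    }

    num_parameters = 40_000_000_000  # illustrative 40-billion-parameter model

    for precision, num_bytes in BYTES_PER_PARAMETER.items():
        weights_gb = num_parameters * num_bytes / 1e9
        print(f"{precision}: ~{weights_gb:,.0f} GB for weights alone")

    # fp32: ~160 GB, fp16/bf16: ~80 GB, int8: ~40 GB

Halving the numeric precision roughly halves the memory footprint of the weights, which is the core idea behind the quantization techniques discussed in this chapter.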
For example, the original 40 billion-parameter Falcon model was trained on a cluster of 48 ml.p4d.24xlarge Amazon SageMaker instances consisting of 384 NVIDIA A100 GPUs, 15 TB of GPU RAM, and 55 TB of CPU RAM. A more recent version of Falcon was trained on a cluster of 392 ml.p4d.24xlarge SageMaker instances consisting of 3,136 NVIDIA A100 GPUs, 125 TB of GPU RAM, and 450 TB of CPU RAM. A model of Falcon's size and complexity requires a cluster of GPUs to train, but it also benefits from quantization, as you will see next.
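As a quick sanity check, those cluster totals follow directly from the ml.p4d.24xlarge instance configuration, assumed here from its public specification of 8 NVIDIA A100 GPUs with 40 GB of GPU RAM each and 1,152 GB of CPU RAM per instance:

    # Back-of-the-envelope cluster sizing for the Falcon training runs,
    # assuming 8x NVIDIA A100 (40 GB) GPUs and 1,152 GB of CPU RAM
    # per ml.p4d.24xlarge instance.
    GPUS_PER_INSTANCE = 8
    GPU_RAM_GB_PER_GPU = 40
    CPU_RAM_GB_PER_INSTANCE = 1_152

    def cluster_totals(num_instances):
        """Return total GPU count, GPU RAM (TB), and CPU RAM (TB) for a cluster."""
        total_gpus = num_instances * GPUS_PER_INSTANCE
        return {
            "gpus": total_gpus,
            "gpu_ram_tb": total_gpus * GPU_RAM_GB_PER_GPU / 1_000,
            "cpu_ram_tb": num_instances * CPU_RAM_GB_PER_INSTANCE / 1_000,
        }

    print(cluster_totals(48))   # ~384 GPUs, ~15 TB GPU RAM, ~55 TB CPU RAM
    print(cluster_totals(392))  # ~3,136 GPUs, ~125 TB GPU RAM, ~450 TB CPU RAM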
Memory Challenges
One of the most common issues you’ll encounter when you try to train or fine-tune foundation models is running out of memory. If you’ve ever tried training or even ...