Generative AI on AWS

by Chris Fregly, Antje Barth, Shelbee Eigenbrode

November 2023

Intermediate to advanced

312 pages

8h 15m

English

O'Reilly Media, Inc.

Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments; Chris; Antje; Shelbee
Use Cases and Tasks; Foundation Models and Model Hubs; Generative AI Project Life Cycle; Generative AI on AWS; Why Generative AI on AWS?; Building Generative AI Applications on AWS; Summary
Prompts and Completions; Tokens; Prompt Engineering; Prompt Structure; Instruction; Context; In-Context Learning with Few-Shot Inference; Zero-Shot Inference; One-Shot Inference; Few-Shot Inference; In-Context Learning Gone Wrong; In-Context Learning Best Practices; Prompt-Engineering Best Practices; Inference Configuration Parameters; Summary
Large-Language Foundation Models; Tokenizers; Embedding Vectors; Transformer Architecture; Inputs and Context Window; Embedding Layer; Encoder; Self-Attention; Decoder; Softmax Output; Types of Transformer-Based Foundation Models; Pretraining Datasets; Scaling Laws; Compute-Optimal Models; Summary
Memory Challenges; Data Types and Numerical Precision; Quantization; fp16; bfloat16; fp8; int8; Optimizing the Self-Attention Layers; FlashAttention; Grouped-Query Attention; Distributed Computing; Distributed Data Parallel; Fully Sharded Data Parallel; Performance Comparison of FSDP over DDP; Distributed Computing on AWS; Fully Sharded Data Parallel with Amazon SageMaker; AWS Neuron SDK and AWS Trainium; Summary
Instruction Fine-Tuning; Llama 2-Chat; Falcon-Chat; FLAN-T5; Instruction Dataset; Multitask Instruction Dataset; FLAN: Example Multitask Instruction Dataset; Prompt Template; Convert a Custom Dataset into an Instruction Dataset; Instruction Fine-Tuning; Amazon SageMaker Studio; Amazon SageMaker JumpStart; Amazon SageMaker Estimator for Hugging Face; Evaluation; Evaluation Metrics; Benchmarks and Datasets; Summary
Full Fine-Tuning Versus PEFT; LoRA and QLoRA; LoRA Fundamentals; Rank; Target Modules and Layers; Applying LoRA; Merging LoRA Adapter with Original Model; Maintaining Separate LoRA Adapters; Full-Fine Tuning Versus LoRA Performance; QLoRA; Prompt Tuning and Soft Prompts; Summary
Human Alignment: Helpful, Honest, and Harmless; Reinforcement Learning Overview; Train a Custom Reward Model; Collect Training Dataset with Human-in-the-Loop; Sample Instructions for Human Labelers; Using Amazon SageMaker Ground Truth for Human Annotations; Prepare Ranking Data to Train a Reward Model; Train the Reward Model; Existing Reward Model: Toxicity Detector by Meta; Fine-Tune with Reinforcement Learning from Human Feedback; Using the Reward Model with RLHF; Proximal Policy Optimization RL Algorithm; Perform RLHF Fine-Tuning with PPO; Mitigate Reward Hacking; Using Parameter-Efficient Fine-Tuning with RLHF; Evaluate RLHF Fine-Tuned Model; Qualitative Evaluation; Quantitative Evaluation; Load Evaluation Model; Define Evaluation-Metric Aggregation Function; Compare Evaluation Metrics Before and After; Summary
Model Optimizations for Inference; Pruning; Post-Training Quantization with GPTQ; Distillation; Large Model Inference Container; AWS Inferentia: Purpose-Built Hardware for Inference; Model Update and Deployment Strategies; A/B Testing; Shadow Deployment; Metrics and Monitoring; Autoscaling; Autoscaling Policies; Define an Autoscaling Policy; Summary
Large Language Model Limitations; Hallucination; Knowledge Cutoff; Retrieval-Augmented Generation; External Sources of Knowledge; RAG Workflow; Document Loading; Chunking; Document Retrieval and Reranking; Prompt Augmentation; RAG Orchestration and Implementation; Document Loading and Chunking; Embedding Vector Store and Retrieval; Retrieval Chains; Reranking with Maximum Marginal Relevance; Agents; ReAct Framework; Program-Aided Language Framework; Generative AI Applications; FMOps: Operationalizing the Generative AI Project Life Cycle; Experimentation Considerations; Development Considerations; Production Deployment Considerations; Summary

Use Cases; Multimodal Prompt Engineering Best Practices; Image Generation and Enhancement; Image Generation; Image Editing and Enhancement; Inpainting, Outpainting, Depth-to-Image; Inpainting; Outpainting; Depth-to-Image; Image Captioning and Visual Question Answering; Image Captioning; Content Moderation; Visual Question Answering; Model Evaluation; Text-to-Image Generative Tasks; Forward Diffusion; Nonverbal Reasoning; Diffusion Architecture Fundamentals; Forward Diffusion; Reverse Diffusion; U-Net; Stable Diffusion 2 Architecture; Text Encoder; U-Net and Diffusion Process; Text Conditioning; Cross-Attention; Scheduler; Image Decoder; Stable Diffusion XL Architecture; U-Net and Cross-Attention; Refiner; Conditioning; Summary
ControlNet; Fine-Tuning; DreamBooth; DreamBooth and PEFT-LoRA; Textual Inversion; Human Alignment with Reinforcement Learning from Human Feedback; Summary
Bedrock Foundation Models; Amazon Titan Foundation Models; Stable Diffusion Foundation Models from Stability AI; Bedrock Inference APIs; Large Language Models; Generate SQL Code; Summarize Text; Embeddings; Fine-Tuning; Agents; Multimodal Models; Create Images from Text; Create Images from Images; Data Privacy and Network Security; Governance and Monitoring; Summary

Content preview from Generative AI on AWS

Chapter 7. Fine-Tuning with Reinforcement Learning from Human Feedback

As you learned in Chapters 5 and 6, fine-tuning with instructions can improve your model’s performance and help the model to better understand humanlike prompts and generate more humanlike responses. However, it doesn’t prevent the model from generating undesired, false, and sometimes even harmful completions.

Undesirable output is really no surprise, given that these models are trained on vast amounts of text data from the internet, which unfortunately contains plenty of bad language and toxicity. And while researchers and practitioners continue to scrub and refine pretraining datasets to remove unwanted data, there is still a chance that the model could generate content that does not positively align with human values and preferences.

Reinforcement learning from human feedback (RLHF) is a fine-tuning mechanism that uses human annotation—also called human feedback—to help the model adapt to human values and preferences. RLHF is most commonly applied after other forms of fine-tuning, including instruction fine-tuning.
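
The preview does not show how the book implements this, so the following is only a minimal sketch of one common way human feedback becomes a training signal. It assumes that labelers rank several completions for the same prompt, that each ranking is expanded into (chosen, rejected) pairs, and that a reward model is scored with a pairwise (Bradley-Terry style) loss. The names `ranking_to_pairs`, `pairwise_preference_loss`, `toy_reward`, and `FLAGGED_WORDS` are hypothetical, and `toy_reward` is a stand-in for a trained reward model, not a real one.

```python
import math
from itertools import combinations

# Hypothetical blocklist used only by the toy reward function below.
FLAGGED_WORDS = {"stupid", "hate"}


def ranking_to_pairs(ranked_completions):
    """Expand one human ranking (best first) into (chosen, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the labeler ranked higher.
    return list(combinations(ranked_completions, 2))


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Computed as a numerically stable softplus of the negative margin.
    """
    x = reward_rejected - reward_chosen
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def toy_reward(text):
    """Stand-in for a learned reward model: penalize flagged words."""
    words = (word.strip(".,!?") for word in text.lower().split())
    return -float(sum(word in FLAGGED_WORDS for word in words))


# One labeler's ranking for a single prompt, best completion first.
ranked = [
    "Happy to help you with that.",
    "Figure it out yourself.",
    "That is a stupid question and I hate it.",
]

pairs = ranking_to_pairs(ranked)
losses = [
    pairwise_preference_loss(toy_reward(chosen), toy_reward(rejected))
    for chosen, rejected in pairs
]
mean_loss = sum(losses) / len(losses)
print(f"{len(pairs)} preference pairs, mean loss {mean_loss:.4f}")
```

The loss is small when the reward model scores the human-preferred completion higher and grows when it scores the rejected one higher, which is what lets ranking data steer the reward model toward human preferences.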

While RLHF is typically used to help a model generate more humanlike and human-aligned outputs, you could also use RLHF to fine-tune highly personalized models. For example, you could fine-tune a chat assistant specific to each user of your application. This chat assistant can adopt the style, voice, or sense of humor of each user based on their interactions with your application.

In this chapter, ...

Publisher Resources

ISBN: 9781098159214
