AI Engineering

Name: AI Engineering
Author: Chip Huyen
ISBN: 9781098166304

December 2024

Intermediate to advanced

534 pages

15h 52m

English

O'Reilly Media, Inc.

Preface
What This Book Is About; What This Book Is Not; Who This Book Is For; Navigating This Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Introduction to Building AI Applications with Foundation Models
The Rise of AI Engineering; From Language Models to Large Language Models; From Large Language Models to Foundation Models; From Foundation Models to AI Engineering; Foundation Model Use Cases; Coding; Image and Video Production; Writing; Education; Conversational Bots; Information Aggregation; Data Organization; Workflow Automation; Planning AI Applications; Use Case Evaluation; Setting Expectations; Milestone Planning; Maintenance; The AI Engineering Stack; Three Layers of the AI Stack; AI Engineering Versus ML Engineering; AI Engineering Versus Full-Stack Engineering; Summary
2. Understanding Foundation Models
Training Data; Multilingual Models; Domain-Specific Models; Modeling; Model Architecture; Model Size; Post-Training; Supervised Finetuning; Preference Finetuning; Sampling; Sampling Fundamentals; Sampling Strategies; Test Time Compute; Structured Outputs; The Probabilistic Nature of AI; Summary
3. Evaluation Methodology
Challenges of Evaluating Foundation Models; Understanding Language Modeling Metrics; Entropy; Cross Entropy; Bits-per-Character and Bits-per-Byte; Perplexity; Perplexity Interpretation and Use Cases; Exact Evaluation; Functional Correctness; Similarity Measurements Against Reference Data; Introduction to Embedding; AI as a Judge; Why AI as a Judge?; How to Use AI as a Judge; Limitations of AI as a Judge; What Models Can Act as Judges?; Ranking Models with Comparative Evaluation; Challenges of Comparative Evaluation; The Future of Comparative Evaluation; Summary
4. Evaluate AI Systems
Evaluation Criteria; Domain-Specific Capability; Generation Capability; Instruction-Following Capability; Cost and Latency; Model Selection; Model Selection Workflow; Model Build Versus Buy; Navigate Public Benchmarks; Design Your Evaluation Pipeline; Step 1. Evaluate All Components in a System; Step 2. Create an Evaluation Guideline; Step 3. Define Evaluation Methods and Data; Summary
5. Prompt Engineering
Introduction to Prompting; In-Context Learning: Zero-Shot and Few-Shot; System Prompt and User Prompt; Context Length and Context Efficiency; Prompt Engineering Best Practices; Write Clear and Explicit Instructions; Provide Sufficient Context; Break Complex Tasks into Simpler Subtasks; Give the Model Time to Think; Iterate on Your Prompts; Evaluate Prompt Engineering Tools; Organize and Version Prompts; Defensive Prompt Engineering; Proprietary Prompts and Reverse Prompt Engineering; Jailbreaking and Prompt Injection; Information Extraction; Defenses Against Prompt Attacks; Summary
6. RAG and Agents
RAG; RAG Architecture; Retrieval Algorithms; Retrieval Optimization; RAG Beyond Texts; Agents; Agent Overview; Tools; Planning; Agent Failure Modes and Evaluation; Memory; Summary
7. Finetuning
Finetuning Overview; When to Finetune; Reasons to Finetune; Reasons Not to Finetune; Finetuning and RAG; Memory Bottlenecks; Backpropagation and Trainable Parameters; Memory Math; Numerical Representations; Quantization; Finetuning Techniques; Parameter-Efficient Finetuning; Model Merging and Multi-Task Finetuning; Finetuning Tactics; Summary
8. Dataset Engineering
Data Curation; Data Quality; Data Coverage; Data Quantity; Data Acquisition and Annotation; Data Augmentation and Synthesis; Why Data Synthesis; Traditional Data Synthesis Techniques; AI-Powered Data Synthesis; Model Distillation; Data Processing; Inspect Data; Deduplicate Data; Clean and Filter Data; Format Data; Summary
9. Inference Optimization
Understanding Inference Optimization; Inference Overview; Inference Performance Metrics; AI Accelerators; Inference Optimization; Model Optimization; Inference Service Optimization; Summary

10. AI Engineering Architecture and User Feedback
AI Engineering Architecture; Step 1. Enhance Context; Step 2. Put in Guardrails; Step 3. Add Model Router and Gateway; Step 4. Reduce Latency with Caches; Step 5. Add Agent Patterns; Monitoring and Observability; AI Pipeline Orchestration; User Feedback; Extracting Conversational Feedback; Feedback Design; Feedback Limitations; Summary
Epilogue
Index
About the Author

Content preview from AI Engineering

Preface

When ChatGPT came out, like many of my colleagues, I was disoriented. What surprised me wasn’t the model’s size or capabilities. For over a decade, the AI community has known that scaling up a model improves it. In 2012, the AlexNet authors noted in their landmark paper that: “All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.”¹,²

What surprised me was the sheer number of applications this capability boost unlocked. I thought a small increase in model quality metrics might result in a modest increase in applications. Instead, it resulted in an explosion of new possibilities.

Not only have these new AI capabilities increased the demand for AI applications, but they have also lowered the entry barrier for developers. It’s become so easy to get started with building AI applications. It’s even possible to build an application without writing a single line of code. This shift has transformed AI from a specialized discipline into a powerful development tool everyone can use.

Even though AI adoption today seems new, it’s built upon techniques that have been around for a while. Papers about language modeling came out as early as the 1950s. Retrieval-augmented generation (RAG) applications are built upon retrieval technology that has powered search and recommender systems since long before the term RAG was coined. The best practices for deploying traditional machine learning applications—systematic ...

Publisher Resources

ISBN: 9781098166298