book

AI Engineering

Name: AI Engineering
Author: Chip Huyen
ISBN: 9781098166304

by Chip Huyen

December 2024

Intermediate to advanced

534 pages

15h 52m

English

O'Reilly Media, Inc.

Audio summary available

Read now

Unlock full access

Includes

Quizzes

Preface
What This Book Is AboutWhat This Book Is NotWho This Book Is ForNavigating This BookConventions Used in This BookUsing Code ExamplesO’Reilly Online LearningHow to Contact UsAcknowledgments
1. Introduction to Building AI Applications with Foundation Models
The Rise of AI EngineeringFrom Language Models to Large Language ModelsFrom Large Language Models to Foundation ModelsFrom Foundation Models to AI EngineeringFoundation Model Use CasesCodingImage and Video ProductionWritingEducationConversational BotsInformation AggregationData OrganizationWorkflow AutomationPlanning AI ApplicationsUse Case EvaluationSetting ExpectationsMilestone PlanningMaintenanceThe AI Engineering StackThree Layers of the AI StackAI Engineering Versus ML EngineeringAI Engineering Versus Full-Stack EngineeringSummary
2. Understanding Foundation Models
Training DataMultilingual ModelsDomain-Specific ModelsModelingModel ArchitectureModel SizePost-TrainingSupervised FinetuningPreference FinetuningSamplingSampling FundamentalsSampling StrategiesTest Time ComputeStructured OutputsThe Probabilistic Nature of AISummary
3. Evaluation Methodology
Challenges of Evaluating Foundation ModelsUnderstanding Language Modeling MetricsEntropyCross EntropyBits-per-Character and Bits-per-BytePerplexityPerplexity Interpretation and Use CasesExact EvaluationFunctional CorrectnessSimilarity Measurements Against Reference DataIntroduction to EmbeddingAI as a JudgeWhy AI as a Judge?How to Use AI as a JudgeLimitations of AI as a JudgeWhat Models Can Act as Judges?Ranking Models with Comparative EvaluationChallenges of Comparative EvaluationThe Future of Comparative EvaluationSummary
4. Evaluate AI Systems
Evaluation CriteriaDomain-Specific CapabilityGeneration CapabilityInstruction-Following CapabilityCost and LatencyModel SelectionModel Selection WorkflowModel Build Versus BuyNavigate Public BenchmarksDesign Your Evaluation PipelineStep 1. Evaluate All Components in a SystemStep 2. Create an Evaluation Guideline Step 3. Define Evaluation Methods and DataSummary
5. Prompt Engineering
Introduction to PromptingIn-Context Learning: Zero-Shot and Few-ShotSystem Prompt and User PromptContext Length and Context EfficiencyPrompt Engineering Best PracticesWrite Clear and Explicit InstructionsProvide Sufficient ContextBreak Complex Tasks into Simpler SubtasksGive the Model Time to ThinkIterate on Your PromptsEvaluate Prompt Engineering ToolsOrganize and Version PromptsDefensive Prompt EngineeringProprietary Prompts and Reverse Prompt EngineeringJailbreaking and Prompt InjectionInformation ExtractionDefenses Against Prompt AttacksSummary
6. RAG and Agents
RAGRAG ArchitectureRetrieval AlgorithmsRetrieval OptimizationRAG Beyond TextsAgentsAgent OverviewToolsPlanningAgent Failure Modes and EvaluationMemorySummary
7. Finetuning
Finetuning OverviewWhen to FinetuneReasons to FinetuneReasons Not to FinetuneFinetuning and RAGMemory BottlenecksBackpropagation and Trainable ParametersMemory MathNumerical RepresentationsQuantizationFinetuning TechniquesParameter-Efficient FinetuningModel Merging and Multi-Task FinetuningFinetuning TacticsSummary
8. Dataset Engineering
Data CurationData QualityData CoverageData QuantityData Acquisition and AnnotationData Augmentation and SynthesisWhy Data SynthesisTraditional Data Synthesis TechniquesAI-Powered Data SynthesisModel DistillationData ProcessingInspect DataDeduplicate DataClean and Filter DataFormat DataSummary
9. Inference Optimization
Understanding Inference OptimizationInference OverviewInference Performance MetricsAI AcceleratorsInference Optimization Model OptimizationInference Service OptimizationSummary

10. AI Engineering Architecture and User Feedback
AI Engineering ArchitectureStep 1. Enhance ContextStep 2. Put in GuardrailsStep 3. Add Model Router and GatewayStep 4. Reduce Latency with CachesStep 5. Add Agent PatternsMonitoring and ObservabilityAI Pipeline OrchestrationUser FeedbackExtracting Conversational FeedbackFeedback DesignFeedback LimitationsSummary
Epilogue
Index
About the Author

Content preview from AI Engineering

Chapter 4. Evaluate AI Systems

A model is only useful if it works for its intended purposes. You need to evaluate models in the context of your application. Chapter 3 discusses different approaches to automatic evaluation. This chapter discusses how to use these approaches to evaluate models for your applications.

This chapter contains three parts. It starts with a discussion of the criteria you might use to evaluate your applications and how these criteria are defined and calculated. For example, many people worry about AI making up facts—how is factual consistency detected? How are domain-specific capabilities like math, science, reasoning, and summarization measured?

The second part focuses on model selection. Given an increasing number of foundation models to choose from, it can feel overwhelming to choose the right model for your application. Thousands of benchmarks have been introduced to evaluate these models along different criteria. Can these benchmarks be trusted? How do you select what benchmarks to use? How about public leaderboards that aggregate multiple benchmarks?

The model landscape is teeming with proprietary models and open source models. A question many teams will need to visit over and over again is whether to host their own models or to use a model API. This question has become more nuanced with the introduction of model API services built on top of open source models.

The last part discusses developing an evaluation pipeline that can guide the development ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Publisher Resources

ISBN: 9781098166298Errata Page

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design

Business

Soft Skills

AI Engineering

by Chip Huyen

Chapter 4. Evaluate AI Systems

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.