Chapter 6. Tuning and Infrastructure
You built a capable customer service agent in Chapters 3 and 4. It handles requests across text, images, and video, and routes complex queries to specialist agents when needed. You followed Chapter 5’s evaluation framework, measuring performance, iterating on prompts, and refining coordination patterns. The system works.
But as you prepare for production, new questions emerge. What if response times need to drop by 50%? What if the request volume scales to millions per day? What if your domain vocabulary—the specific terminology and patterns unique to your business—proves too specialized for the base model to handle reliably? When prompt engineering and agent design reach their limits, what comes next?
This chapter explores the deeper interventions that become necessary at scale. We’ll examine when fine-tuning justifies its costs, how to implement it efficiently, and how to build inference infrastructure that balances performance, cost, and operational complexity.
The Tuning Decision
These questions about latency, scale, and domain specialization don’t all have the same answer. Some point toward fine-tuning. Others don’t. To see why, consider a financial services client whose agent struggled with two problems. First, it missed fraud signals, failing to recognize when routine-sounding inquiries were actually red flags for account takeover. The client had tried detailed prompts describing fraud patterns, but even with regular updates, the model ...