Overview
As users increasingly expect AI to interact as naturally as humans do, engineers must move beyond static prompts and build systems that perceive, reason, and act in real time. Multimodal, Real-Time AI Agent Systems takes you from agent fundamentals to architecting advanced, bidirectional, multimodal, multi-agent systems, focusing on the difficult leap from proof of concept to production. You'll start by building agents directly with foundation models to understand their core components, and then master modern agent frameworks that simplify and scale implementation.
Written by industry practitioners, this book connects agentic concepts with cutting-edge interoperability standards and protocols. You'll learn to build enterprise-grade agent platforms that provide scalable AgentOps, rigorous evaluation, and the strict security measures required for live streaming interactions.
Through practical multimodal agent examples, you'll understand how to:
- Design scalable, low-latency architectures for single and multi-agent systems
- Engineer the complete lifecycle of live multimodal streaming, using backend-centric architectures to minimize frontend complexity
- Implement unified protocols like the Model Context Protocol and Agent-to-Agent for tool use and agent discovery
- Apply operational best practices for security, testing, and scaling to support thousands of concurrent live agents