Chapter 3. Building a Multimodal Agent with the Agent Development Kit (ADK)
In Chapter 1, we explored what makes AI agents compelling: their ability to perceive, reason, and act autonomously across complex tasks. In Chapter 2, we laid the data foundations these agents require to function reliably. Now comes the crucial question: how do we actually build them?
Not every problem requires an agent. If your use case needs simple tool selection based on user queries, or deterministic RAG retrieval, you don’t need the complexity of agents. These linear, stateless patterns work well for many applications.
But when your system needs to maintain context across interactions, reason about multistep solutions, self-correct when approaches fail, or proactively pursue goals, you need true agents—systems that work through problems step by step, adapting their approach based on what they learn along the way.
Building such agents well—making them reliable enough to handle production workloads, trustworthy enough for sensitive operations, and functional enough to solve real problems—can be surprisingly difficult.
The root challenge of agent development is maintaining coherence across the entire perception-reasoning-action loop (Figure 1-3). Context, state, and intent need to flow naturally from each interaction to the next. Yet in practice, information gets lost between tool calls. Errors cascade through conversations. State vanishes when sessions restart. Many frameworks leave you to figure this out on your own.
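To make the coherence problem concrete, here is a minimal sketch of the perceive-reason-act loop with session state that outlives a process restart. The class and function names (`SessionStore`, `run_turn`) are illustrative, not part of any framework's API, and the "reasoning" step is a placeholder where a real agent would call a model with the accumulated history:

```python
import json
from pathlib import Path


class SessionStore:
    """Persists per-session state to disk so context survives restarts."""

    def __init__(self, path: str = "sessions.json"):
        self.path = Path(path)
        self.sessions = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def get(self, session_id: str) -> dict:
        # Each session carries its own conversation history.
        return self.sessions.setdefault(session_id, {"history": []})

    def save(self) -> None:
        self.path.write_text(json.dumps(self.sessions))


def run_turn(store: SessionStore, session_id: str, user_input: str) -> str:
    """One pass through the perceive-reason-act loop."""
    session = store.get(session_id)  # perceive: load prior context
    session["history"].append({"role": "user", "content": user_input})
    # reason: a real agent would send the full history to an LLM here;
    # this placeholder just makes the state flow visible.
    reply = f"(turn {len(session['history'])}) acknowledged: {user_input}"
    session["history"].append({"role": "agent", "content": reply})  # act
    store.save()  # persist so a restarted session resumes where it left off
    return reply
```

The point of the sketch is the shape, not the logic: because state is loaded at the top of every turn and written back at the bottom, a second process constructing a `SessionStore` against the same file picks up the conversation exactly where the first left off. This is the plumbing that frameworks like ADK handle for you.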