10

Building Multimodal Applications with LLMs

In this chapter, we are going beyond LLMs, to introduce the concept of multimodality while building agents. We will see the logic behind the combination of foundation models in different AI domains – language, images, and audio – into one single agent that can adapt to a variety of tasks. By the end of this chapter, you will be able to build your own multimodal agent, providing it with the tools and LLMs needed to perform various AI tasks.

Throughout this chapter, we will cover the following topics:

  • Introduction to multimodality and large multimodal models (LMMs)
  • Examples of emerging LMMs
  • How to build a multimodal agent with single-modal LLMs using LangChain

Technical requirements

To complete the ...

Get Building LLM Powered Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.