May 2024
Intermediate to advanced
342 pages
8h 45m
English
In this chapter, we are going beyond LLMs, to introduce the concept of multimodality while building agents. We will see the logic behind the combination of foundation models in different AI domains – language, images, and audio – into one single agent that can adapt to a variety of tasks. By the end of this chapter, you will be able to build your own multimodal agent, providing it with the tools and LLMs needed to perform various AI tasks.
Throughout this chapter, we will cover the following topics:
To complete the ...