April 2025
Beginner to intermediate
396 pages
7h 54m
English
In this chapter, we focus on the visual capabilities of ChatGPT, ranging from traditional image generation with DALL-E to the more complex design and formatting tasks built into the model.
ChatGPT's visual capabilities have improved dramatically over the past few months as we enter the era of multimodality. ChatGPT can now not only generate images from natural language descriptions but also reason over multimodal data to answer complex queries. This multimodal reasoning brings ChatGPT closer to the way our brains process the world around us, which we perceive largely through visual input.
Throughout this chapter, we will cover the following topics: