Chapter 19. Using Generative Models with Hugging Face Diffusers
Over the last few chapters, we have been looking at inference on generative models, primarily using LLMs (aka text-to-text models) to explore different scenarios. However, generative AI isn’t limited to text-based models, and another important innovation is, of course, image generation (aka text-to-image). Most image generation models today are based on a process called diffusion, which is where the Hugging Face diffusers library, used to create images from text prompts, gets its name. In this chapter, we’ll explore how diffusion models work and how to get up and running with your own apps that generate images from prompts.
What Are Diffusion Models?
By now, most of us have seen AI-created images, and we’ve likely been amazed at how quickly they have progressed from rough, abstract representations to near-photorealistic renderings of what we asked for in a prompt. As models have come to accept longer, more detailed prompts, and as their training sets have grown, we’ve seen a near-endless stream of improvements in what AI image generation can do.
But how does all of this work? It starts with the idea of diffusion.
You can start this process by creating a dataset of images and their associated noise: copies of each image with random noise added in increasing amounts. Consider Figure 19-1.
Figure 19-1. Noising an image
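The noising process referred to in Figure 19-1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the diffusers API, and it rests on several assumptions. It uses the linear variance schedule from the original DDPM work, with 1,000 steps and per-step variances from 0.0001 to 0.02. It also uses the closed-form shortcut x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which jumps straight to step t rather than adding noise one step at a time. The function names and the four-pixel "image" are hypothetical. In diffusers itself, scheduler classes such as DDPMScheduler perform this computation on real image tensors through their add_noise method.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: per-step noise variances spaced linearly (DDPM-style assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: the share of original signal left at step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_image(pixels, t, alpha_bars, rng):
    """Hypothetical helper: jump straight to step t using the closed-form forward process.

    Returns the noisy pixels together with the Gaussian noise that was mixed in,
    so each noisy image can be paired with its associated noise.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(pixels, eps)]
    return noisy, eps

# Usage: noise a tiny hypothetical "image" (pixel values scaled to [-1, 1]) at a few steps.
rng = random.Random(0)
image = [0.5, -0.2, 0.9, -1.0]
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
for t in (0, 250, 500, 999):
    noisy, eps = noise_image(image, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

As the loop runs, alpha_bar falls from nearly 1 at step 0 to nearly 0 at step 999. The early samples are therefore almost the original pixels, and the last is almost pure noise. The closed form replaces a step-by-step loop because, with Gaussian noise, sampling x_t directly gives the same distribution at far lower cost.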
Then, once you have a set of images you’ve ...