Chapter 8. Creative Applications of Text-to-Image Models

This chapter presents creative applications that leverage text-to-image models and extend their capabilities beyond using text alone to control generation. We will start with the most basic applications and then move on to more advanced ones.

Image to Image

As you learned in Chapters 4 and 5, generative text-to-image diffusion models like Stable Diffusion produce images by denoising a fully noised starting point, guided by a text prompt. It is also possible to start from an already existing image instead: we add some noise to an initial image and have the model modify it by denoising it only partially. This process is called image to image, because one image is transformed into another based on how much noise is added and on the text prompt.
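The "add some noise" step above is the DDPM forward process from Chapter 4. As a minimal sketch (real pipelines use the scheduler's `add_noise()` method instead; `partially_noise` and the NumPy setup here are illustrative, not part of diffusers):

```python
import numpy as np

rng = np.random.default_rng(0)

def partially_noise(x0, alpha_bar_t, rng=rng):
    """Noise an existing image x0 to level alpha_bar_t (DDPM forward process).

    alpha_bar_t close to 1.0 keeps the image almost intact; close to 0.0
    replaces it with nearly pure noise, as in text-to-image generation.
    """
    eps = rng.standard_normal(x0.shape)  # Gaussian noise, same shape as x0
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
```

Image to image then runs the usual denoising loop, but starting from this partially noised image rather than from pure noise.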

With the diffusers library, we can load a dedicated image-to-image pipeline class. As an example, let’s explore how to use SDXL for this task. Compared with text-to-image generation, here are the main differences:

  • We use the StableDiffusionXLImg2ImgPipeline rather than the usual StableDiffusionXLPipeline.

  • We pass both a prompt and an initial image to the pipeline.

We can use either the stabilityai/stable-diffusion-xl-base-1.0 or the stabilityai/stable-diffusion-xl-refiner-1.0 model for image-to-image generation. The base model is recommended when you want to stylize your image or create new context from what is already there. The refiner model, which specializes in working out fine details for the ...
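Putting the two differences together, a minimal sketch looks like the following (it assumes torch, diffusers, and a CUDA GPU are available; downloading the base model's weights takes several gigabytes). The amount of noise added is controlled by the pipeline's `strength` argument, and in diffusers-style image-to-image pipelines the number of denoising steps actually run is derived from it roughly as the helper below shows:

```python
def denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of scheduler steps an img2img pipeline runs.

    strength=1.0 fully noises the initial image (equivalent to plain
    text-to-image); small values keep the image mostly intact and run
    only the last few denoising steps.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


def run_img2img(init_image, prompt: str, strength: float = 0.6):
    # Imports kept local so denoising_steps() works without a GPU setup.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline

    # Difference 1: the Img2Img pipeline class instead of the usual one.
    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")
    # Difference 2: we pass both a prompt and an initial image.
    return pipe(prompt=prompt, image=init_image, strength=strength).images[0]
```

For example, with `strength=0.6` and the default 50 inference steps, only about 30 denoising steps are run on the partially noised image, so the output stays recognizably close to the input.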
