Build a Text-to-Image Generator (from Scratch)

by Mark Liu
December 2025
Beginner to intermediate
360 pages
10h 48m
English
Manning Publications

Overview

Build your own vision transformer and diffusion models for text-to-image generation, from scratch!

Build a Text-to-Image Generator (from Scratch) takes you step-by-step through creating your own AI models that can generate images from text. You’ll explore two methods of image generation—vision transformers and diffusion models—and learn vital AI development techniques as you go.

Build a Text-to-Image Generator (from Scratch) teaches you how to:

  • Build and train models to generate high resolution images based on text descriptions
  • Edit an existing image based on text prompts
  • Build and train a model to add captions to images
  • Build and train a vision transformer to classify images
  • Fine-tune LLMs for downstream tasks such as classification and text or image generation
  • Better differentiate real images from deepfakes

Build a Text-to-Image Generator (from Scratch) dives into the powerful models behind AI image generators. The best way to learn is to build something from scratch, and in this book you’ll build your very own diffusion model and vision transformer. As you work through each stage of development, you’ll develop an understanding of how these models can be customized, applied, and integrated for impressive multimodal AI.

About the Technology
AI-generated images appear everywhere from high-end advertising to casual social media feeds. Text-to-image tools like DALL-E, Midjourney, and Flux make it easy to create AI art, but how do they work? In this book, you’ll find out by building your own text-to-image generator!

About the Book
Build a Text-to-Image Generator (from Scratch) explores both transformer-based image generation and diffusion models. You’ll work hands-on to build a pair of simple generation models that can classify images, automatically add captions, reconstruct images, and enhance existing graphics. Author Mark Liu guides you every step of the way with clear explanations, informative diagrams, and eye-opening examples you can build on your own laptop.

What's Inside
  • Build a vision transformer to classify images
  • Edit images using text prompts
  • Fine-tune image models


About the Reader
Requires basic knowledge of generative AI models and intermediate Python skills.

About the Author
Mark Liu is the founding director of the Master of Science in Finance program at the University of Kentucky. He is also the author of Learn Generative AI with PyTorch.

Quotes
A practical and readable introduction with working code and clear explanations.
- Andrey Lukyanenko, Meta

Empowers you to unlock creativity at the intersection of text and imagery.
- Bojan Tunguz, Tabul.AI

Amazingly comprehensive, hype-free, hands-on, and code-rich guidebook.
- Kirk Borne, Data Leadership Group

Successfully brings together the theoretical foundations and practical applications, from transformers to diffusion models.
- Raymond Cheung, Parity Technologies

Publisher Resources

ISBN: 9781633435421