Chapter 4. Training Video Generation Models
Learning objective: In this chapter, you’ll build on your implementation of the Latte architecture and learn how to create progressive training pipelines that transform your model into a production-ready video generation system.
In Chapter 3, you implemented the complete Latte architecture, from VAE compression and patch embedding through spatial and temporal attention blocks to the diffusion process and text conditioning. Now we’ll transform that architecture into a functioning video generation model through a systematic training loop.
Training a video generation model takes more than feeding data through your architecture. As mentioned in previous chapters, the temporal dimension introduces unique challenges: datasets are massive, memory constraints are severe, and optimization is delicate. Using Latte’s training approach as our guide, we’ll build a progressive pipeline that addresses each of these challenges. ...