Chapter 5. Fine-Tuning for Specific Video Tasks
Learning objective: In this chapter, you’ll explore fine-tuning methods using the CogVideo model from Chapter 1. CogVideo provides an accessible, comprehensive fine-tuning framework, developed by Tsinghua University, that supports both LoRA and supervised fine-tuning (SFT) and includes the tooling needed to specialize your model.
In Chapter 4, you built intuition by training the Latte model: you configured experiments, implemented training loops, and distributed training across GPUs. This chapter bridges theory and practice. You’ll explore low-rank adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method that nominally requires 16–24 GB of VRAM, and supervised fine-tuning (SFT) with DeepSpeed integration. For each technique, the chapter discusses environment requirements, optimization strategies, and issues you may encounter.
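To make the "parameter-efficient" claim concrete, here is a minimal sketch of the general LoRA idea in plain Python. It is not CogVideo's implementation: the function names, the layer size (4096 × 4096), and the rank (16) are hypothetical values chosen for illustration. LoRA keeps the pretrained weight matrix W frozen and trains only two small matrices, A and B, whose product forms a low-rank update.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # full fine-tuning trains every entry of W
    lora = rank * (d_in + d_out)   # LoRA trains A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x), with W left frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_trainable_params(d_in=4096, d_out=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes, the adapter trains roughly 0.8% of the parameters that full fine-tuning of the same layer would. Because B is conventionally initialized to zero, the adapted layer starts out producing exactly the output of the frozen base layer.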
Note
Fine-tuning requirements vary dramatically by method and model type:
- LoRA: 16 GB (RTX ...