Chapter 5. Model Training and Optimization
In the last chapter, we explored the world of data like a child running through a field of fallen leaves. We were delighted, but did not quite know why. Now, let us find some deeper meaning. Let us grab all that data knowledge and use it for good. Let us train some models.
In this chapter, we will demystify the process of model training and walk step by step through how a vision language model actually learns.
By the end, you will be able to:
- train a baby model using single-sample updates
- scale it up to batches and see how the training dynamics change
- understand how images and instructions are packed together for efficient large-scale training
We will touch the full stack: VLM architecture, pre-training, data packing, inference, and a handful of practical training tricks. Buckle up: you are about to start a lifelong obsession with lines going down.
Learning Objective:
Develop best practices for designing and training vision language models. ...