Chapter 6. Fine-Tuning Language Models

In Chapter 2, we explored how language models work and how to use them for different tasks, such as text generation and sequence classification. We saw that language models can be helpful for many tasks without further training, thanks to proper prompting and the zero-shot capabilities of these models. We also explored some of the hundreds of thousands of pre-trained models shared by the community. In this chapter, we’ll discuss how we can improve the performance of language models on specific tasks by fine-tuning them on our own data.

While pre-trained models showcase remarkable capabilities, their general-purpose training may not be suited for certain tasks or domains. Fine-tuning is frequently used to tailor a model’s understanding to the nuances of a specific dataset or task. For instance, in the field of medical research, a language model pre-trained on general web text will not perform well out of the box.
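To make the core idea concrete before diving into the details, here is a minimal, purely illustrative sketch of the fine-tuning pattern: start from pre-trained weights, freeze them, and train only a new task-specific head on labeled data. The tiny MLP backbone and random tensors below are stand-ins, not the book's actual models or datasets; in practice the backbone would be a Transformer loaded from a checkpoint.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pre-trained backbone (illustrative only; in practice
# this would be a Transformer loaded from a pre-trained checkpoint).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Freeze the pre-trained weights and attach a fresh classification head.
for p in backbone.parameters():
    p.requires_grad = False
head = nn.Linear(32, 2)
model = nn.Sequential(backbone, head)

# Toy labeled tensors standing in for a task-specific dataset.
x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))

# Optimize only the parameters that are still trainable (the head).
opt = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2
)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(50):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After the loop, the head has adapted to the toy task while the frozen backbone is untouched. Later sections cover full fine-tuning, where all weights are updated rather than just a new head.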
