Chapter 6. Parameter-Efficient Fine-Tuning

As we discussed in previous chapters, training generative models is computationally expensive. Adapting a model to your domain through full fine-tuning requires memory not just for the model weights themselves, but also for the gradients, optimizer states, and activations needed during training. In contrast to full fine-tuning, parameter-efficient fine-tuning (PEFT) provides a set of techniques that let you fine-tune your models using far fewer compute and memory resources.
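To make that cost concrete, here is a back-of-the-envelope estimate; this is a sketch, not an exact accounting, since real usage depends on precision, optimizer choice, batch size, and activation memory, which it ignores. With the Adam optimizer in 32-bit precision, each trainable parameter typically requires roughly 4 bytes for the weight, 4 bytes for its gradient, and 8 bytes for the two optimizer moments:

```python
# Rough memory estimate for full fine-tuning with Adam in fp32.
# A simplification: ignores activation memory and assumes 32-bit
# precision throughout.

def full_finetune_gb(num_params: float) -> float:
    bytes_per_param = (
        4    # model weights (fp32)
        + 4  # gradients (fp32)
        + 8  # Adam first and second moments (fp32)
    )
    return num_params * bytes_per_param / 1e9

# A 1-billion-parameter model needs on the order of 16 GB
# before any activation memory is counted.
print(f"{full_finetune_gb(1e9):.0f} GB")
```

PEFT sidesteps most of this overhead because gradients and optimizer states need to be kept only for the small set of trainable parameters.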

A variety of PEFT techniques and categories are explored in a paper on scaling.1 The techniques vary in implementation, but in general each freezes all or most of the model’s original parameters and extends or replaces model layers by training an additional, much smaller set of parameters, as the sketch below illustrates. The most commonly used techniques fall into the additive and reparameterization categories.
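Here is a minimal PyTorch sketch of that shared pattern; the function names are illustrative, not from any particular library. The base model is frozen, so only parameters a PEFT method subsequently adds would receive gradients:

```python
import torch.nn as nn

def freeze_base_model(model: nn.Module) -> None:
    """Freeze all original parameters; a PEFT method then adds a much
    smaller set of new, trainable parameters on top."""
    for param in model.parameters():
        param.requires_grad = False

def count_parameters(model: nn.Module) -> None:
    """Report how many parameters will actually be updated by training."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} of {total:,} "
          f"({100 * trainable / total:.2f}%)")
```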

Additive techniques, such as prompt tuning, augment the pretrained model by adding and training extra parameters or layers. Reparameterization techniques, such as Low-Rank Adaptation (LoRA), adapt the model using low-rank representations of the weight updates, reducing the number of trainable parameters and the compute resources required to fine-tune.
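To build intuition for the reparameterization idea, here is a minimal, self-contained sketch of a LoRA-style linear layer (not the implementation used by any particular library): the frozen base weight W is left untouched, and the update is learned as the product of two low-rank matrices B and A, so the layer computes y = Wx + (α/r)·BAx.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: y = W x + (alpha / r) * B A x.
    Only the low-rank matrices A and B are trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze pretrained W (and bias)
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_A.weight, std=0.01)
        nn.init.zeros_(self.lora_B.weight)     # update starts at zero, so
                                               # training begins from W exactly
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```

With a rank r much smaller than the layer's dimensions, the number of trainable parameters drops from in_features × out_features to r × (in_features + out_features).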

In this chapter, you’ll learn about a few specific PEFT techniques that can be applied to generative models, including prompt tuning, LoRA, and QLoRA. This chapter focuses on key concepts illustrated through large language model (LLM) examples; ...
