Chapter 6. Model Resource Management Techniques

The compute, storage, and I/O resources that your model requires determine how much it will cost to put the model into production and maintain it over its lifetime. In this chapter, we’ll take a look at some important techniques for managing those resource requirements. We’ll focus on three key areas that are the primary ways to optimize models in both traditional ML and generative AI (GenAI):

  • Dimensionality reduction

  • Quantizing model parameters and pruning model graphs (a brief code sketch follows this list)

  • Knowledge distillation to transfer the knowledge contained in large models to smaller ones
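
We’ll cover each of these in depth later in the chapter, but as a quick preview of what one of them looks like in code, here is a minimal sketch of post-training dynamic quantization using PyTorch. The model architecture, layer sizes, and input shape are illustrative assumptions, not something specific to this chapter’s examples.

    import torch
    import torch.nn as nn

    # A small stand-in model; in practice this would be your trained model.
    model = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )

    # Post-training dynamic quantization: weights of the listed layer types
    # are stored as 8-bit integers, shrinking the model and typically
    # speeding up CPU inference with only a small accuracy cost.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # The quantized model is called exactly like the original.
    example_input = torch.randn(1, 512)
    with torch.no_grad():
        output = quantized_model(example_input)
    print(output.shape)  # torch.Size([1, 10])
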

Dimensionality Reduction

Dimensionality Effect on Performance

We’ll begin by discussing dimensionality and how it affects our model’s performance and resource requirements.

In the not-so-distant past, data generation and, to some extent, data storage were a lot more costly than they are today. Back then, domain experts would carefully consider which features or variables to measure before designing their experiments and feature transforms. As a consequence, datasets were expected to be well designed and to contain only a relatively small number of relevant features.

Today, data science tends to be more about integrating everything end to end. Generating and storing data keeps getting faster, easier, and less expensive, so there’s a tendency to measure everything we can and to include ever more complex feature transformations. As a result, datasets often contain far more features than the problem actually needs, many of them redundant or only weakly relevant, and every extra dimension adds to the compute, storage, and I/O cost of training and serving the model.
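
To make this concrete, here is a minimal sketch of dimensionality reduction using scikit-learn’s PCA on synthetic data. The dataset sizes, the number of underlying signals, and the 95% variance threshold are all illustrative assumptions, not figures from this chapter.

    import numpy as np
    from sklearn.decomposition import PCA

    # A synthetic "measure everything" dataset: 10,000 rows and 1,000 features,
    # but only 10 underlying signals plus noise.
    rng = np.random.default_rng(42)
    latent = rng.normal(size=(10_000, 10))        # the true low-dimensional signal
    mixing = rng.normal(size=(10, 1_000))         # spread across 1,000 measured features
    X = latent @ mixing + 0.1 * rng.normal(size=(10_000, 1_000))

    # Keep just enough principal components to explain 95% of the variance.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)

    print(X.shape)          # (10000, 1000)
    print(X_reduced.shape)  # roughly (10000, 10): far fewer columns to store and compute on

Because most of the measured features are redundant combinations of a few underlying signals, nearly all of the useful variance survives the projection, while the downstream compute, storage, and I/O footprint shrinks by roughly two orders of magnitude.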
