Chapter 7. High-Performance Modeling
In production scenarios, getting the best possible performance from your model is important for delivering fast response times at low cost and with modest resource requirements. High-performance modeling matters most when compute requirements are large, such as when working with large models or datasets, and when inference latency or cost targets are demanding.
In this chapter, we’ll discuss how models can be accelerated using data and model parallelism. We’ll also look at high-performance modeling techniques such as distribution strategies, and high-performance ingestion pipelines built with tf.data. Finally, we’ll consider the rise of giant neural networks and approaches to building the efficient, scalable infrastructure they require.
Distributed Training
When you start prototyping, training your model might be a fast and simple task, especially if you’re working with a small dataset. Fully training a model, however, can become very time-consuming. Datasets and model architectures in many domains keep growing, and as they grow, models take longer and longer to train. It’s not just the training time per epoch; often the number of epochs a model needs also increases. Solving this kind of problem usually requires distributed training, which lets us train huge models while speeding up training by dividing the work across multiple devices or machines.
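To make this concrete, here is a minimal sketch of synchronous data parallelism using TensorFlow’s tf.distribute.MirroredStrategy, one of the distribution strategies discussed later in this chapter. The tiny model and randomly generated data are illustrative placeholders, not an example from a real workload:

import tensorflow as tf

# MirroredStrategy replicates the model on every available GPU on one machine
# (or runs a single replica on CPU) and synchronizes gradients after each step.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas in sync: {strategy.num_replicas_in_sync}")

# Variables must be created inside the strategy scope so they are mirrored
# across all replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# A toy tf.data input pipeline; in practice this would read from files.
features = tf.random.normal((1024, 32))
labels = tf.random.uniform((1024,), maxval=10, dtype=tf.int64)
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(64)                   # global batch size, split across replicas
    .prefetch(tf.data.AUTOTUNE)  # overlap input preprocessing with training
)

# model.fit transparently distributes each batch across the replicas.
model.fit(dataset, epochs=2)

Because each replica processes a slice of every batch, adding GPUs increases the effective throughput per step; the trade-off is the communication cost of synchronizing gradients, which we return to when comparing strategies.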