Chapter 7. High-Performance Modeling
In production scenarios, getting the best possible performance from your model is important for delivering fast response times at low cost and with modest resource requirements. High-performance modeling matters most when compute requirements are large, such as when working with large models or datasets, and when inference latency or cost targets are demanding.
In this chapter, we’ll discuss how models can be accelerated using data and model parallelism. We’ll also look at high-performance modeling techniques such as distribution strategies, and high-performance ingestion pipelines built with tf.data. Finally, we’ll consider the rise of giant neural networks and approaches to building the efficient, scalable infrastructure they require.
Distributed Training
When you start prototyping, training your model might be a fast and simple task, especially if you’re working with a small dataset. Fully training a model, however, can become very time-consuming. Datasets and model architectures in many domains keep growing, and as they grow, models take longer and longer to train. It’s not just the training time per epoch; often the number of epochs a model needs also increases. Solving this kind of problem usually requires distributed training, which lets us train huge models while speeding up training by dividing the work across multiple devices or machines.
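To make this concrete, here is a minimal sketch of synchronous data parallelism using TensorFlow’s tf.distribute.MirroredStrategy, one of the distribution strategies discussed later in this chapter. The tiny model and randomly generated data are illustrative placeholders, not an example from a real workload:

import tensorflow as tf

# MirroredStrategy replicates the model on every available GPU on one machine
# (or runs a single replica on CPU) and synchronizes gradients after each step.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas in sync: {strategy.num_replicas_in_sync}")

# Variables must be created inside the strategy scope so they are mirrored
# across all replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# A toy tf.data input pipeline; in practice this would read from files.
features = tf.random.normal((1024, 32))
labels = tf.random.uniform((1024,), maxval=10, dtype=tf.int64)
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(64)                   # global batch size, split across replicas
    .prefetch(tf.data.AUTOTUNE)  # overlap input preprocessing with training
)

# model.fit transparently distributes each batch across the replicas.
model.fit(dataset, epochs=2)

Because each replica processes a slice of every batch, adding GPUs increases the effective throughput per step; the trade-off is the communication cost of synchronizing gradients, which we return to when comparing strategies.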