Chapter 6: Training and Tuning at Scale

Machine learning (ML) practitioners face multiple challenges when training and tuning models at scale. These challenges come in the form of high volumes of training data, larger models, and more complex model architectures. Additional challenges come from running a large number of tuning jobs to identify the right set of hyperparameters, and from keeping track of the many experiments conducted with different algorithms for a single ML objective. Together, these scale challenges lead to long training times, resource constraints, and increased costs, which can reduce team productivity and create bottlenecks for ML projects.

Amazon SageMaker provides managed distributed training and tuning capabilities ...
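As a minimal sketch of what these managed capabilities look like in practice (assuming the SageMaker Python SDK v2, a hypothetical training script train.py, a placeholder S3 path, and an execution role you supply), the following launches a distributed PyTorch training job and wraps it in a managed hyperparameter tuning job:

# Minimal sketch using the SageMaker Python SDK (v2). The entry point script,
# S3 URI, and IAM role below are placeholders, not values from this chapter.
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Distributed data-parallel training across multiple multi-GPU instances.
estimator = PyTorch(
    entry_point="train.py",          # hypothetical training script
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=2,
    instance_type="ml.p3.16xlarge",  # SageMaker data parallelism requires multi-GPU instances
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"epochs": 10},
)

# Managed hyperparameter tuning: SageMaker launches, schedules, and tracks the trials.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    metric_definitions=[
        {"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "learning-rate": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(32, 256),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({"training": "s3://my-bucket/training-data/"})  # placeholder S3 path

In this sketch, SageMaker handles cluster provisioning, data distribution, trial scheduling, and metric tracking, which is what offloads the scale challenges described above from the practitioner to the managed service.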
