Chapter 6: Training and Tuning at Scale

Machine learning (ML) practitioners face multiple challenges when training and tuning models at scale. These challenges arise from high volumes of training data, growing model sizes, and increasingly complex model architectures. Further challenges come from running large numbers of tuning jobs to identify the right set of hyperparameters, and from keeping track of the many experiments conducted with varying algorithms for a specific ML objective. Together, these scale challenges lead to long training times, resource constraints, and increased costs, which can reduce team productivity and create bottlenecks for ML projects.

Amazon SageMaker provides managed distributed training and tuning capabilities ...
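As a minimal sketch of what these managed capabilities look like in practice, the following example uses the SageMaker Python SDK to configure a distributed PyTorch training job and wrap it in a managed hyperparameter tuning job. The entry-point script, S3 paths, instance types, metric regex, and hyperparameter ranges are illustrative assumptions, not values from the text.

```python
# Sketch of managed distributed training and tuning with the SageMaker Python SDK.
# Script name, S3 URIs, instance types, and ranges are illustrative assumptions.
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

role = sagemaker.get_execution_role()  # IAM role used by the SageMaker jobs

# Distributed training: data parallelism across multiple GPU instances.
estimator = PyTorch(
    entry_point="train.py",              # hypothetical training script
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_count=2,                    # scale out across instances
    instance_type="ml.p3.16xlarge",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"epochs": 10},
)

# Managed tuning: SageMaker launches and tracks many training jobs in
# parallel, searching the ranges below for the best objective value.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    metric_definitions=[
        {"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-2),
        "batch_size": IntegerParameter(32, 256),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({
    "train": "s3://my-bucket/train",          # hypothetical data locations
    "validation": "s3://my-bucket/validation",
})
```

Because the tuner manages job orchestration, parallelism, and metric tracking, the practitioner only specifies the search space and budget (`max_jobs`, `max_parallel_jobs`); SageMaker handles provisioning and teardown of the underlying training clusters.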
