Chapter 9: Scaling Your Training Jobs
In the four previous chapters, you learned how to train models with built-in algorithms, frameworks, or your own code.
In this chapter, you'll learn how to scale training jobs so that they can train on larger datasets while keeping training time and cost under control. We'll start by discussing when and how to make scaling decisions, based on monitoring information and simple guidelines. Then, we'll look at pipe mode and distributed training, two key techniques for scaling. We'll also discuss storage alternatives to S3 for large-scale training. Finally, we'll launch a large training job on the ImageNet dataset.
We'll cover the following topics:
- Understanding when and how to scale
- Streaming datasets ...