Chapter 9: Scaling Your Training Jobs

In the four previous chapters, you learned how to train models with built-in algorithms, frameworks, or your own code.

In this chapter, you'll learn how to scale training jobs so that they can train on larger datasets while keeping training time and cost under control. We'll start by discussing when and how to make scaling decisions, using monitoring information and simple guidelines. Then, we'll look at pipe mode and distributed training, two key techniques for scaling. We'll also discuss storage alternatives to S3 for large-scale training. Finally, we'll launch a large training job on the ImageNet dataset.

We'll cover the following topics:

Understanding when and how to scale
Streaming datasets ...

Learn Amazon SageMaker by Julien Simon
