Learn Amazon SageMaker

Book description

Quickly build and deploy machine learning models without managing infrastructure, and improve productivity using Amazon SageMaker’s capabilities such as Amazon SageMaker Studio, Autopilot, Experiments, Debugger, and Model Monitor

Key Features

  • Build, train, and deploy machine learning models quickly using Amazon SageMaker
  • Analyze, detect, and receive alerts relating to various business problems using machine learning algorithms and techniques
  • Improve productivity by training and fine-tuning machine learning models in production

Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker.

You’ll understand how to use the various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you’ll cover features such as AutoML, built-in algorithms and frameworks, and the option of writing your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries, such as TensorFlow and PyTorch, to extend the capabilities of existing models. You’ll also learn how to get your models to production faster with minimal effort and at a lower cost. Finally, you’ll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems so that you can understand the current state of your models and improve their accuracy.

By the end of this book, you’ll be able to use Amazon SageMaker across the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

What you will learn

  • Create and automate end-to-end machine learning workflows on Amazon Web Services (AWS)
  • Become well-versed in data annotation and preparation techniques
  • Use AutoML features to build and train machine learning models with SageMaker Autopilot
  • Create models using built-in algorithms and frameworks and your own code
  • Train computer vision and NLP models using real-world examples
  • Explore training techniques for scaling, model optimization, model debugging, and cost optimization
  • Automate deployment tasks in a variety of configurations using the SageMaker SDK and several automation tools
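The bullets above map onto SageMaker's basic configure-train-deploy workflow. As a purely illustrative sketch of what sits underneath it, the following builds a minimal request body in the shape expected by SageMaker's low-level `CreateTrainingJob` API; all names (bucket, role ARN, container image URI) are hypothetical placeholders, and no AWS call is made:

```python
# Sketch of a minimal CreateTrainingJob request body. The bucket, role ARN,
# and image URI below are hypothetical placeholders; in practice, the
# SageMaker Python SDK assembles this structure for you.

def build_training_job_request(job_name, image_uri, role_arn, bucket):
    """Assemble a minimal CreateTrainingJob request body."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # container holding the algorithm
            "TrainingInputMode": "File",     # copy data in before training
        },
        "RoleArn": role_arn,                 # IAM role SageMaker assumes
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/input/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {                  # fully managed infrastructure
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request(
    "demo-xgboost-job",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "my-example-bucket",
)
print(request["OutputDataConfig"]["S3OutputPath"])
```

In practice you would rarely write this dictionary by hand: the book shows how the SageMaker Python SDK's `Estimator.fit()` generates and submits the equivalent request for you, whether you use a built-in algorithm, a framework container, or your own code.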

Who this book is for

This book is for software engineers, machine learning developers, data scientists, and AWS users who are new to using Amazon SageMaker and want to build high-quality machine learning models without worrying about infrastructure. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. Some understanding of machine learning concepts and the Python programming language will also be beneficial.

Table of contents

  1. Learn Amazon SageMaker
  2. Why subscribe?
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Packt is searching for authors like you
  7. Foreword
  8. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  9. Section 1: Introduction to Amazon SageMaker
  10. Chapter 1: Introduction to Amazon SageMaker
    1. Technical requirements
    2. Exploring the capabilities of Amazon SageMaker
      1. The main capabilities of Amazon SageMaker
      2. The Amazon SageMaker API
    3. Demonstrating the strengths of Amazon SageMaker
      1. Solving Alice's problems
      2. Solving Bob's problems
    4. Setting up Amazon SageMaker on your local machine
      1. Installing the SageMaker SDK with virtualenv
      2. Installing the SageMaker SDK with Anaconda
      3. A word about AWS permissions
    5. Setting up an Amazon SageMaker notebook instance
    6. Setting up Amazon SageMaker Studio
      1. Onboarding to Amazon SageMaker Studio
      2. Onboarding with the quick start procedure
    7. Summary
  11. Chapter 2: Handling Data Preparation Techniques
    1. Technical requirements
    2. Discovering Amazon SageMaker Ground Truth
      1. Using workforces
      2. Creating a private workforce
      3. Uploading data for labeling
      4. Creating a labeling job
      5. Labeling images
      6. Labeling text
    3. Exploring Amazon SageMaker Processing
      1. Discovering the Amazon SageMaker Processing API
      2. Processing a dataset with scikit-learn
      3. Processing a dataset with your own code
    4. Processing data with other AWS services
      1. Amazon EMR (Elastic MapReduce)
      2. AWS Glue
      3. Amazon Athena
    5. Summary
  12. Section 2: Building and Training Models
  13. Chapter 3: AutoML with Amazon SageMaker Autopilot
    1. Technical requirements
    2. Discovering Amazon SageMaker Autopilot
      1. Analyzing data
      2. Feature engineering
      3. Model tuning
    3. Using SageMaker Autopilot in SageMaker Studio
      1. Launching a job
      2. Monitoring a job
      3. Comparing jobs
      4. Deploying and invoking a model
    4. Using the SageMaker Autopilot SDK
      1. Launching a job
      2. Monitoring a job
      3. Cleaning up
    5. Diving deep on SageMaker Autopilot
      1. The job artifacts
      2. The Data Exploration notebook
      3. The Candidate Generation notebook
    6. Summary
  14. Chapter 4: Training Machine Learning Models
    1. Technical requirements
    2. Discovering the built-in algorithms in Amazon SageMaker
      1. Supervised learning
      2. Unsupervised learning
      3. A word about scalability
    3. Training and deploying models with built-in algorithms
      1. Understanding the end-to-end workflow
      2. Using alternative workflows
      3. Using fully managed infrastructure
    4. Using the SageMaker SDK with built-in algorithms
      1. Preparing data
      2. Configuring a training job
      3. Launching a training job
      4. Deploying a model
      5. Cleaning up
    5. Working with more built-in algorithms
      1. Classification with XGBoost
      2. Recommendation with Factorization Machines
      3. Using Principal Component Analysis
      4. Detecting anomalies with Random Cut Forest
    6. Summary
  15. Chapter 5: Training Computer Vision Models
    1. Technical requirements
    2. Discovering the CV built-in algorithms in Amazon SageMaker
      1. Discovering the image classification algorithm
      2. Discovering the object detection algorithm
      3. Discovering the semantic segmentation algorithm
      4. Training with CV algorithms
    3. Preparing image datasets
      1. Working with image files
      2. Working with RecordIO files
      3. Working with SageMaker Ground Truth files
    4. Using the built-in CV algorithms
      1. Training an image classification model
      2. Fine-tuning an image classification model
      3. Training an object detection model
      4. Training a semantic segmentation model
    5. Summary
  16. Chapter 6: Training Natural Language Processing Models
    1. Technical requirements
    2. Discovering the NLP built-in algorithms in Amazon SageMaker
      1. Discovering the BlazingText algorithm
      2. Discovering the LDA algorithm
      3. Discovering the NTM algorithm
      4. Discovering the seq2seq algorithm
      5. Training with NLP algorithms
    3. Preparing natural language datasets
      1. Preparing data for classification with BlazingText
      2. Preparing data for classification with BlazingText, version 2
      3. Preparing data for word vectors with BlazingText
      4. Preparing data for topic modeling with LDA and NTM
      5. Using datasets labeled with SageMaker Ground Truth
    4. Using the built-in algorithms for NLP
      1. Classifying text with BlazingText
      2. Computing word vectors with BlazingText
      3. Using BlazingText models with FastText
      4. Modeling topics with LDA
      5. Modeling topics with NTM
    5. Summary
  17. Chapter 7: Extending Machine Learning Services Using Built-In Frameworks
    1. Technical requirements
    2. Discovering the built-in frameworks in Amazon SageMaker
      1. Running a first example
      2. Working with framework containers
      3. Training and deploying locally
      4. Training with script mode
      5. Understanding model deployment
      6. Managing dependencies
      7. Putting it all together
    3. Running your framework code on Amazon SageMaker
    4. Using the built-in frameworks
      1. Working with TensorFlow and Keras
      2. Working with PyTorch
      3. Working with Apache Spark
    5. Summary
  18. Chapter 8: Using Your Algorithms and Code
    1. Technical requirements
    2. Understanding how SageMaker invokes your code
    3. Using the SageMaker training toolkit with scikit-learn
    4. Building a fully custom container for scikit-learn
      1. Training with a fully custom container
      2. Deploying a fully custom container
    5. Building a fully custom container for R
      1. Coding with R and Plumber
      2. Building a custom container
      3. Training and deploying a custom container on SageMaker
    6. Training and deploying with XGBoost and MLflow
      1. Installing MLflow
      2. Training a model with MLflow
      3. Building a SageMaker container with MLflow
    7. Training and deploying with XGBoost and Sagify
      1. Installing Sagify
      2. Coding our model with Sagify
      3. Deploying a model locally with Sagify
      4. Deploying a model on SageMaker with Sagify
    8. Summary
  19. Section 3: Diving Deeper on Training
  20. Chapter 9: Scaling Your Training Jobs
    1. Technical requirements
    2. Understanding when and how to scale
      1. Understanding what scaling means
      2. Adapting training time to business requirements
      3. Right-sizing training infrastructure
      4. Deciding when to scale
      5. Deciding how to scale
      6. Scaling a BlazingText training job
      7. Scaling a Semantic Segmentation training job
      8. Solving training challenges
    3. Streaming datasets with pipe mode
      1. Using pipe mode with built-in algorithms
      2. Using pipe mode with other algorithms
      3. Training factorization machines with pipe mode
      4. Training Object Detection with pipe mode
    4. Using other storage services
      1. Working with SageMaker and Amazon EFS
      2. Working with SageMaker and Amazon FSx for Lustre
    5. Distributing training jobs
      1. Distributing training for built-in algorithms
      2. Distributing training for built-in frameworks
      3. Distributing training for custom containers
      4. Distributing training for Object Detection
    6. Training an Image Classification model on ImageNet
      1. Preparing the ImageNet dataset
      2. Defining our training job
      3. Training on ImageNet
      4. Examining results
    7. Summary
  21. Chapter 10: Advanced Training Techniques
    1. Technical requirements
    2. Optimizing training costs with Managed Spot Training
      1. Comparing costs
      2. Understanding spot instances
      3. Understanding Managed Spot Training
      4. Using Managed Spot Training with Object Detection
      5. Using Managed Spot Training and checkpointing with Keras
    3. Optimizing hyperparameters with Automatic Model Tuning
      1. Understanding Automatic Model Tuning
      2. Using Automatic Model Tuning with Object Detection
      3. Using Automatic Model Tuning with Keras
      4. Using Automatic Model Tuning for architecture search
      5. Tuning multiple algorithms
    4. Exploring models with SageMaker Debugger
      1. Debugging an XGBoost job
      2. Inspecting an XGBoost job
      3. Debugging and inspecting a Keras job
    5. Summary
  22. Section 4: Managing Models in Production
  23. Chapter 11: Deploying Machine Learning Models
    1. Technical requirements
    2. Examining model artifacts
      1. Examining artifacts for built-in algorithms
      2. Examining artifacts for built-in computer vision algorithms
      3. Examining artifacts for XGBoost
    3. Managing real-time endpoints
      1. Managing endpoints with the SageMaker SDK
      2. Managing endpoints with the boto3 SDK
    4. Deploying batch transformers
    5. Deploying inference pipelines
    6. Monitoring predictions with Amazon SageMaker Model Monitor
      1. Capturing data
      2. Creating a baseline
      3. Setting up a monitoring schedule
      4. Sending bad data
      5. Examining violation reports
    7. Deploying models to container services
      1. Training on SageMaker and deploying on Amazon Fargate
    8. Summary
  24. Chapter 12: Automating Machine Learning Workflows
    1. Technical requirements
    2. Automating with AWS CloudFormation
      1. Writing a template
      2. Deploying a model to a real-time endpoint
      3. Modifying a stack with a change set
      4. Adding a second production variant to the endpoint
      5. Implementing canary deployment
      6. Implementing blue-green deployment
    3. Automating with the AWS Cloud Development Kit
      1. Installing CDK
      2. Creating a CDK application
      3. Writing a CDK application
      4. Deploying a CDK application
    4. Automating with AWS Step Functions
      1. Setting up permissions
      2. Implementing our first workflow
      3. Adding parallel execution to a workflow
      4. Adding a Lambda function to a workflow
    5. Summary
  25. Chapter 13: Optimizing Prediction Cost and Performance
    1. Technical requirements
    2. Autoscaling an endpoint
    3. Deploying a multi-model endpoint
      1. Understanding multi-model endpoints
      2. Building a multi-model endpoint with scikit-learn
    4. Deploying a model with Amazon Elastic Inference
      1. Deploying a model with AWS
    5. Compiling models with Amazon SageMaker Neo
      1. Understanding Amazon SageMaker Neo
      2. Compiling and deploying an image classification model on SageMaker
      3. Exploring models compiled with Neo
      4. Deploying an image classification model on a Raspberry Pi
      5. Deploying models on AWS Inferentia
    6. Building a cost optimization checklist
      1. Optimizing costs for data preparation
      2. Optimizing costs for experimentation
      3. Optimizing costs for model training
      4. Optimizing costs for model deployment
    7. Summary
  26. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Learn Amazon SageMaker
  • Author(s): Julien Simon
  • Release date: August 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781800208919