Learn Amazon SageMaker - Second Edition

Book description

Swiftly build and deploy machine learning models without managing infrastructure, and boost productivity using the latest Amazon SageMaker capabilities such as Studio, Autopilot, Data Wrangler, Pipelines, and Feature Store

Key Features

  • Build, train, and deploy machine learning models quickly using Amazon SageMaker
  • Optimize the accuracy, cost, and fairness of your models
  • Create and automate end-to-end machine learning workflows on Amazon Web Services (AWS)

Book Description

Amazon SageMaker enables you to quickly build, train, and deploy machine learning models at scale without managing any infrastructure. It helps you focus on the machine learning problem at hand and deploy high-quality models by eliminating the heavy lifting typically involved in each step of the ML process. This second edition will help data scientists and ML developers to explore new features such as SageMaker Data Wrangler, Pipelines, Clarify, Feature Store, and much more.

You'll start by learning how to use the various capabilities of SageMaker as a single toolset to solve ML challenges, then progress to features such as AutoML, built-in algorithms and frameworks, and writing your own code and algorithms to build ML models. The book will then show you how to integrate Amazon SageMaker with popular deep learning libraries, such as TensorFlow and PyTorch, to extend the capabilities of existing models. You'll also see how automating your workflows can help you get to production faster with minimal effort and at a lower cost. Finally, you'll explore SageMaker Debugger and SageMaker Model Monitor to detect quality issues in training and production.

By the end of this book, you'll be able to use Amazon SageMaker across the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

What you will learn

  • Become well-versed with data annotation and preparation techniques
  • Use AutoML features to build and train machine learning models with Autopilot
  • Create models using built-in algorithms and frameworks and your own code
  • Train computer vision and natural language processing (NLP) models using real-world examples
  • Cover training techniques for scaling, model optimization, model debugging, and cost optimization
  • Automate deployment tasks in a variety of configurations using the SDK and several automation tools

Who this book is for

This book is for software engineers, machine learning developers, data scientists, and AWS users who are new to using Amazon SageMaker and want to build high-quality machine learning models without worrying about infrastructure. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. A solid understanding of machine learning concepts and the Python programming language will also be beneficial.

Table of contents

  1. Learn Amazon SageMaker Second Edition
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share your thoughts
  6. Section 1: Introduction to Amazon SageMaker
  7. Chapter 1: Introducing Amazon SageMaker
    1. Technical requirements
    2. Exploring the capabilities of Amazon SageMaker
      1. The main capabilities of Amazon SageMaker
      2. The Amazon SageMaker API
    3. Setting up Amazon SageMaker on your local machine
      1. Installing the SageMaker SDK with virtualenv
      2. Installing the SageMaker SDK with Anaconda
      3. A word about AWS permissions
    4. Setting up Amazon SageMaker Studio
      1. Onboarding to Amazon SageMaker Studio
      2. Onboarding with the quick start procedure
    5. Deploying one-click solutions and models with Amazon SageMaker JumpStart
      1. Deploying a solution
      2. Deploying a model
      3. Fine-tuning a model
    6. Summary
  8. Chapter 2: Handling Data Preparation Techniques
    1. Technical requirements
    2. Labeling data with Amazon SageMaker Ground Truth
      1. Using workforces
      2. Creating a private workforce
      3. Uploading data for labeling
      4. Creating a labeling job
      5. Labeling images
      6. Labeling text
    3. Transforming data with Amazon SageMaker Data Wrangler
      1. Loading a dataset in SageMaker Data Wrangler
      2. Transforming a dataset in SageMaker Data Wrangler
      3. Exporting a SageMaker Data Wrangler pipeline
    4. Running batch jobs with Amazon SageMaker Processing
      1. Discovering the Amazon SageMaker Processing API
      2. Processing a dataset with scikit-learn
      3. Processing a dataset with your own code
    5. Summary
  9. Section 2: Building and Training Models
  10. Chapter 3: AutoML with Amazon SageMaker Autopilot
    1. Technical requirements
    2. Discovering Amazon SageMaker Autopilot
      1. Analyzing data
      2. Feature engineering
      3. Model tuning
    3. Using Amazon SageMaker Autopilot in SageMaker Studio
      1. Launching a job
      2. Monitoring a job
      3. Comparing jobs
      4. Deploying and invoking a model
    4. Using the SageMaker Autopilot SDK
      1. Launching a job
      2. Monitoring a job
      3. Cleaning up
    5. Diving deep on SageMaker Autopilot
      1. The job artifacts
      2. The data exploration notebook
      3. The candidate generation notebook
    6. Summary
  11. Chapter 4: Training Machine Learning Models
    1. Technical requirements
    2. Discovering the built-in algorithms in Amazon SageMaker
      1. Supervised learning
      2. Unsupervised learning
      3. A word about scalability
    3. Training and deploying models with built-in algorithms
      1. Understanding the end-to-end workflow
      2. Using alternative workflows
      3. Using fully managed infrastructure
    4. Using the SageMaker SDK with built-in algorithms
      1. Preparing data
      2. Configuring a training job
      3. Launching a training job
      4. Deploying a model
      5. Cleaning up
    5. Working with more built-in algorithms
      1. Regression with XGBoost
      2. Recommendation with Factorization Machines
      3. Using Principal Component Analysis
      4. Detecting anomalies with Random Cut Forest
    6. Summary
  12. Chapter 5: Training CV Models
    1. Technical requirements
    2. Discovering the CV built-in algorithms in Amazon SageMaker
      1. Discovering the image classification algorithm
      2. Discovering the object detection algorithm
      3. Discovering the semantic segmentation algorithm
      4. Training with CV algorithms
    3. Preparing image datasets
      1. Working with image files
      2. Working with RecordIO files
      3. Working with SageMaker Ground Truth files
    4. Using the built-in CV algorithms
      1. Training an image classification model
      2. Fine-tuning an image classification model
      3. Training an object detection model
      4. Training a semantic segmentation model
    5. Summary
  13. Chapter 6: Training Natural Language Processing Models
    1. Technical requirements
    2. Discovering the NLP built-in algorithms in Amazon SageMaker
      1. Discovering the BlazingText algorithm
      2. Discovering the LDA algorithm
      3. Discovering the NTM algorithm
      4. Discovering the seq2seq algorithm
      5. Training with NLP algorithms
    3. Preparing natural language datasets
      1. Preparing data for classification with BlazingText
      2. Preparing data for classification with BlazingText, version 2
      3. Preparing data for word vectors with BlazingText
      4. Preparing data for topic modeling with LDA and NTM
      5. Using datasets labeled with SageMaker Ground Truth
    4. Using the built-in algorithms for NLP
      1. Classifying text with BlazingText
      2. Computing word vectors with BlazingText
      3. Using BlazingText models with FastText
      4. Modeling topics with LDA
      5. Modeling topics with NTM
    5. Summary
  14. Chapter 7: Extending Machine Learning Services Using Built-In Frameworks
    1. Technical requirements
    2. Discovering the built-in frameworks in Amazon SageMaker
      1. Running a first example with XGBoost
      2. Working with framework containers
      3. Training and deploying locally
      4. Training with script mode
      5. Understanding model deployment
      6. Managing dependencies
      7. Putting it all together
    3. Running your framework code on Amazon SageMaker
    4. Using the built-in frameworks
      1. Working with TensorFlow and Keras
      2. Working with PyTorch
      3. Working with Hugging Face
      4. Working with Apache Spark
    5. Summary
  15. Chapter 8: Using Your Algorithms and Code
    1. Technical requirements
    2. Understanding how SageMaker invokes your code
    3. Customizing an existing framework container
      1. Setting up your build environment on EC2
      2. Building training and inference containers
    4. Using the SageMaker Training Toolkit with scikit-learn
    5. Building a fully custom container for scikit-learn
      1. Training with a fully custom container
      2. Deploying a fully custom container
    6. Building a fully custom container for R
      1. Coding with R and plumber
      2. Building a custom container
      3. Training and deploying a custom container on SageMaker
    7. Training and deploying with your own code on MLflow
      1. Installing MLflow
      2. Training a model with MLflow
      3. Building a SageMaker container with MLflow
    8. Building a fully custom container for SageMaker Processing
    9. Summary
  16. Section 3: Diving Deeper into Training
  17. Chapter 9: Scaling Your Training Jobs
    1. Technical requirements
    2. Understanding when and how to scale
      1. Understanding what scaling means
      2. Adapting training time to business requirements
      3. Right-sizing training infrastructure
      4. Deciding when to scale
      5. Deciding how to scale
      6. Scaling a BlazingText training job
    3. Monitoring and profiling training jobs with Amazon SageMaker Debugger
      1. Viewing monitoring and profiling information in SageMaker Studio
      2. Enabling profiling in SageMaker Debugger
      3. Solving training challenges
    4. Streaming datasets with pipe mode
      1. Using pipe mode with built-in algorithms
      2. Using pipe mode with other algorithms and frameworks
      3. Simplifying data loading with MLIO
      4. Training factorization machines with pipe mode
    5. Distributing training jobs
      1. Understanding data parallelism and model parallelism
      2. Distributing training for built-in algorithms
      3. Distributing training for built-in frameworks
      4. Distributing training for custom containers
    6. Scaling an image classification model on ImageNet
      1. Preparing the ImageNet dataset
      2. Defining our training job
      3. Training on ImageNet
      4. Updating batch size
      5. Adding more instances
      6. Summing things up
    7. Training with the SageMaker data and model parallel libraries
      1. Training on TensorFlow with SageMaker DDP
      2. Training on Hugging Face with SageMaker DDP
      3. Training on Hugging Face with SageMaker DMP
    8. Using other storage services
      1. Working with SageMaker and Amazon EFS
      2. Working with SageMaker and Amazon FSx for Lustre
    9. Summary
  18. Chapter 10: Advanced Training Techniques
    1. Technical requirements
    2. Optimizing training costs with managed spot training
      1. Comparing costs
      2. Understanding Amazon EC2 Spot Instances
      3. Understanding managed spot training
      4. Using managed spot training with object detection
      5. Using managed spot training and checkpointing with Keras
    3. Optimizing hyperparameters with automatic model tuning
      1. Understanding automatic model tuning
      2. Using automatic model tuning with object detection
      3. Using automatic model tuning with Keras
      4. Using automatic model tuning for architecture search
    4. Exploring models with SageMaker Debugger
      1. Debugging an XGBoost job
      2. Inspecting an XGBoost job
      3. Debugging and inspecting a Keras job
    5. Managing features and building datasets with SageMaker Feature Store
      1. Engineering features with SageMaker Processing
      2. Creating a feature group
      3. Ingesting features
      4. Querying features to build a dataset
      5. Exploring other capabilities of SageMaker Feature Store
    6. Detecting bias in datasets and explaining predictions with SageMaker Clarify
      1. Configuring a bias analysis with SageMaker Clarify
      2. Running a bias analysis
      3. Analyzing bias metrics
      4. Running an explainability analysis
      5. Mitigating bias
    7. Summary
  19. Section 4: Managing Models in Production
  20. Chapter 11: Deploying Machine Learning Models
    1. Technical requirements
    2. Examining model artifacts and exporting models
      1. Examining and exporting built-in models
      2. Examining and exporting built-in CV models
      3. Examining and exporting XGBoost models
      4. Examining and exporting scikit-learn models
      5. Examining and exporting TensorFlow models
      6. Examining and exporting Hugging Face models
    3. Deploying models on real-time endpoints
      1. Managing endpoints with the SageMaker SDK
      2. Managing endpoints with the boto3 SDK
    4. Deploying models on batch transformers
    5. Deploying models on inference pipelines
    6. Monitoring prediction quality with Amazon SageMaker Model Monitor
      1. Capturing data
      2. Creating a baseline
      3. Setting up a monitoring schedule
      4. Sending bad data
      5. Examining violation reports
    7. Deploying models to container services
      1. Training on SageMaker and deploying on Amazon Fargate
    8. Summary
  21. Chapter 12: Automating Machine Learning Workflows
    1. Technical requirements
    2. Automating with AWS CloudFormation
      1. Writing a template
      2. Deploying a model to a real-time endpoint
      3. Modifying a stack with a change set
      4. Adding a second production variant to the endpoint
      5. Implementing canary deployment
      6. Implementing blue-green deployment
    3. Automating with AWS CDK
      1. Installing the CDK
      2. Creating a CDK application
      3. Writing a CDK application
      4. Deploying a CDK application
    4. Building end-to-end workflows with AWS Step Functions
      1. Setting up permissions
      2. Implementing our first workflow
      3. Adding parallel execution to a workflow
      4. Adding a Lambda function to a workflow
    5. Building end-to-end workflows with Amazon SageMaker Pipelines
      1. Defining workflow parameters
      2. Processing the dataset with SageMaker Processing
      3. Ingesting the dataset in SageMaker Feature Store with SageMaker Processing
      4. Building a dataset with Amazon Athena and SageMaker Processing
      5. Training a model
      6. Creating and registering a model in SageMaker Pipelines
      7. Creating a pipeline
      8. Running a pipeline
      9. Deploying a model from the model registry
    6. Summary
  22. Chapter 13: Optimizing Prediction Cost and Performance
    1. Technical requirements
    2. Autoscaling an endpoint
    3. Deploying a multi-model endpoint
      1. Understanding multi-model endpoints
      2. Building a multi-model endpoint with scikit-learn
    4. Deploying a model with Amazon Elastic Inference
      1. Deploying a model with Amazon Elastic Inference
    5. Compiling models with Amazon SageMaker Neo
      1. Understanding Amazon SageMaker Neo
      2. Compiling and deploying an image classification model on SageMaker
      3. Exploring models compiled with Neo
      4. Deploying an image classification model on a Raspberry Pi
      5. Deploying models on AWS Inferentia
    6. Building a cost optimization checklist
      1. Optimizing costs for data preparation
      2. Optimizing costs for experimentation
      3. Optimizing costs for model training
      4. Optimizing costs for model deployment
    7. Summary
    8. Why subscribe?
  23. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts

Product information

  • Title: Learn Amazon SageMaker - Second Edition
  • Author(s): Julien Simon
  • Release date: November 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781801817950