Getting Started with Amazon SageMaker Studio

Book description

Build production-grade machine learning models with Amazon SageMaker Studio, the first integrated development environment (IDE) for machine learning, using real-life machine learning examples and code

Key Features

  • Understand the ML life cycle in the cloud and its development on Amazon SageMaker Studio
  • Learn to apply SageMaker features in SageMaker Studio for ML use cases
  • Scale and operationalize the ML life cycle effectively using SageMaker Studio

Book Description

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps in one environment.

In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll learn how to scale and operationalize the ML life cycle effectively using SageMaker Studio.

By the end of this book, you'll have learned ML best practices for Amazon SageMaker Studio and be able to improve productivity in the ML development life cycle and to build and deploy models easily for your ML use cases.

What you will learn

  • Explore the ML development life cycle in the cloud
  • Understand SageMaker Studio features and the user interface
  • Build a dataset with a few clicks and host a feature store for ML
  • Train ML models with ease and scale
  • Create ML models and solutions with little code
  • Host ML models in the cloud with optimal cloud resources
  • Ensure optimal model performance with model monitoring
  • Apply governance and operational excellence to ML projects

Who this book is for

This book is for data scientists and machine learning engineers who want to become well versed in Amazon SageMaker Studio and gain hands-on machine learning experience with every step of the ML life cycle, including building datasets as well as training and hosting models. Although basic knowledge of machine learning and data science is necessary, no previous knowledge of SageMaker Studio or cloud experience is required.

Table of contents

  1. Contributors
    1. About the author
    2. About the reviewers
  2. Preface
    1. Who this book is for
    2. What this book covers
    3. Download the example code files
    4. Download the color images
    5. Conventions used
    6. Get in touch
    7. Reviews
    8. Share Your Thoughts
  3. Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
  4. Chapter 1: Machine Learning and Its Life Cycle in the Cloud
    1. Technical requirements
    2. Understanding ML and its life cycle
      1. An ML life cycle
    3. Building ML in the cloud
    4. Exploring AWS essentials for ML
      1. Compute
      2. Storage
      3. Database and analytics
      4. Security
    5. Setting up an AWS environment
    6. Summary
  5. Chapter 2: Introducing Amazon SageMaker Studio
    1. Technical requirements
    2. Introducing SageMaker Studio and its components
      1. Prepare
      2. Build
      3. Training and tuning
      4. Deploy
      5. MLOps
    3. Setting up SageMaker Studio
      1. Setting up a domain
    4. Walking through the SageMaker Studio UI
      1. The main work area
      2. The sidebar
      3. "Hello world!" in SageMaker Studio
    5. Demystifying SageMaker Studio notebooks, instances, and kernels
    6. Using the SageMaker Python SDK
    7. Summary
  6. Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
  7. Chapter 3: Data Preparation with SageMaker Data Wrangler
    1. Technical requirements
    2. Getting started with SageMaker Data Wrangler for customer churn prediction
      1. Preparing the use case
      2. Launching SageMaker Data Wrangler
    3. Importing data from sources
      1. Importing from S3
      2. Importing from Athena
      3. Editing the data type
      4. Joining tables
    4. Exploring data with visualization
      1. Understanding the frequency distribution with a histogram
      2. Scatter plots
      3. Previewing ML model performance with Quick Model
      4. Revealing target leakage
      5. Creating custom visualizations
    5. Applying transformation
      1. Exploring performance while wrangling
    6. Exporting data for ML training
    7. Summary
  8. Chapter 4: Building a Feature Repository with SageMaker Feature Store
    1. Technical requirements
    2. Understanding the concept of a feature store
      1. Understanding an online store
      2. Understanding an offline store
    3. Getting started with SageMaker Feature Store
      1. Creating a feature group
      2. Ingesting data to SageMaker Feature Store
      3. Ingesting from SageMaker Data Wrangler
    4. Accessing features from SageMaker Feature Store
      1. Accessing a feature group in the Studio UI
      2. Accessing an offline store – building a dataset for analysis and training
      3. Accessing online store – low-latency feature retrieval
    5. Summary
  9. Chapter 5: Building and Training ML Models with SageMaker Studio IDE
    1. Technical requirements
    2. Training models with SageMaker's built-in algorithms
      1. Training an NLP model easily
      2. Managing training jobs with SageMaker Experiments
    3. Training with code written in popular frameworks
      1. TensorFlow
      2. PyTorch
      3. Hugging Face
      4. MXNet
      5. Scikit-learn
    4. Developing and collaborating using SageMaker Notebook
    5. Summary
  10. Chapter 6: Detecting ML Bias and Explaining Models with SageMaker Clarify
    1. Technical requirements
    2. Understanding bias, fairness in ML, and ML explainability
    3. Detecting bias in ML
      1. Detecting pretraining bias
      2. Mitigating bias and training a model
      3. Detecting post-training bias
    4. Explaining ML models using SHAP values
    5. Summary
  11. Chapter 7: Hosting ML Models in the Cloud: Best Practices
    1. Technical requirements
    2. Deploying models in the cloud after training
    3. Inferencing in batches with batch transform
    4. Hosting real-time endpoints
    5. Optimizing your model deployment
      1. Hosting multi-model endpoints to save costs
      2. Optimizing instance type and autoscaling with load testing
    6. Summary
  12. Chapter 8: Jumpstarting ML with SageMaker JumpStart and Autopilot
    1. Technical requirements
    2. Launching a SageMaker JumpStart solution
      1. Solution catalog for industries
      2. Deploying the Product Defect Detection solution
    3. SageMaker JumpStart model zoo
      1. Model collection
      2. Deploying a model
      3. Fine-tuning a model
    4. Creating a high-quality model with SageMaker Autopilot
      1. Wine quality prediction
      2. Setting up an Autopilot job
      3. Understanding an Autopilot job
      4. Evaluating Autopilot models
    5. Summary
    6. Further reading
  13. Part 3 – The Production and Operation of Machine Learning with SageMaker Studio
  14. Chapter 9: Training ML Models at Scale in SageMaker Studio
    1. Technical requirements
    2. Performing distributed training in SageMaker Studio
      1. Understanding the concept of distributed training
      2. The data parallel library with TensorFlow
      3. Model parallelism with PyTorch
    3. Monitoring model training and compute resources with SageMaker Debugger
    4. Managing long-running jobs with checkpointing and spot training
    5. Summary
  15. Chapter 10: Monitoring ML Models in Production with SageMaker Model Monitor
    1. Technical requirements
    2. Understanding drift in ML
    3. Monitoring data and performance drift in SageMaker Studio
      1. Training and hosting a model
      2. Creating inference traffic and ground truth
      3. Creating a data quality monitor
      4. Creating a model quality monitor
    4. Reviewing model monitoring results in SageMaker Studio
    5. Summary
  16. Chapter 11: Operationalize ML Projects with SageMaker Projects, Pipelines, and Model Registry
    1. Technical requirements
    2. Understanding ML operations and CI/CD
    3. Creating a SageMaker project
    4. Orchestrating an ML pipeline with SageMaker Pipelines
    5. Running CI/CD in SageMaker Studio
    6. Summary
    7. Why subscribe?
  17. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Getting Started with Amazon SageMaker Studio
  • Author(s): Michael Hsieh
  • Release date: March 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781801070157