MLOps Engineering at Scale

Book description

Dodge costly and time-consuming infrastructure tasks, and rapidly bring your machine learning models to production with MLOps and pre-built serverless tools!

In MLOps Engineering at Scale you will learn:

  • Extracting, transforming, and loading datasets
  • Querying datasets with SQL
  • Understanding automatic differentiation in PyTorch
  • Deploying model training pipelines as a service endpoint
  • Monitoring and managing your pipeline’s life cycle
  • Measuring performance improvements

MLOps Engineering at Scale shows you how to put machine learning into production efficiently by using pre-built services from AWS and other cloud vendors. You’ll learn how to rapidly create flexible and scalable machine learning systems without laboring over time-consuming operational tasks or taking on the costly overhead of physical hardware. Following a real-world use case for calculating taxi fares, you will engineer an MLOps pipeline for a PyTorch model using AWS serverless capabilities.

About the Technology
A production-ready machine learning system includes efficient data pipelines, integrated monitoring, and means to scale up and down based on demand. Using cloud-based services to implement ML infrastructure reduces development time and lowers hosting costs. Serverless MLOps eliminates the need to build and maintain custom infrastructure, so you can concentrate on your data, models, and algorithms.

About the Book
MLOps Engineering at Scale teaches you how to implement efficient machine learning systems using pre-built services from AWS and other cloud vendors. This easy-to-follow book guides you step by step as you set up your serverless ML infrastructure, even if you’ve never used a cloud platform before. You’ll also explore tools like PyTorch Lightning, Optuna, and MLflow that make it easy to build pipelines and scale your deep learning models in production.

What’s Inside
  • Reduce or eliminate ML infrastructure management
  • Learn state-of-the-art MLOps tools like PyTorch Lightning and MLflow
  • Deploy training pipelines as a service endpoint
  • Monitor and manage your pipeline’s life cycle
  • Measure performance improvements

About the Reader
Readers need to know Python, SQL, and the basics of machine learning. No cloud experience required.

About the Author
Carl Osipov implemented his first neural net in 2000 and has worked on deep learning and machine learning at Google and IBM.

Quotes
There is a dire need in the market for practical know-how on the industrialized use of machine learning in real world applications...which Carl Osipov’s book elegantly and comprehensively presents.
- Babak Hodjat, CTO-AI, Cognizant

Excellent resource for learning cloud-native end-to-end machine learning engineering.
- Manish Jain, Infosys

A very timely and necessary book for any serious data scientist.
- Tiklu Ganguly, Mazik Tech Solutions

A great guide to modern ML applications at scale in the cloud.
- Dinesh Ghanta, Oracle

Table of contents

  1. MLOps Engineering at Scale
  2. Copyright
  3. contents
  4. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A road map
      3. About the code
      4. liveBook discussion forum
    4. about the author
    5. about the cover illustration
  5. Part 1 Mastering the data set
  6. 1 Introduction to serverless machine learning
    1. 1.1 What is a machine learning platform?
    2. 1.2 Challenges when designing a machine learning platform
    3. 1.3 Public clouds for machine learning platforms
    4. 1.4 What is serverless machine learning?
    5. 1.5 Why serverless machine learning?
      1. 1.5.1 Serverless vs. IaaS and PaaS
      2. 1.5.2 Serverless machine learning life cycle
    6. 1.6 Who is this book for?
      1. 1.6.1 What you can get out of this book
    7. 1.7 How does this book teach?
    8. 1.8 When is this book not for you?
    9. 1.9 Conclusions
    10. Summary
  7. 2 Getting started with the data set
    1. 2.1 Introducing the Washington, DC taxi rides data set
      1. 2.1.1 What is the business use case?
      2. 2.1.2 What are the business rules?
      3. 2.1.3 What is the schema for the business service?
      4. 2.1.4 What are the options for implementing the business service?
      5. 2.1.5 What data assets are available for the business service?
      6. 2.1.6 Downloading and unzipping the data set
    2. 2.2 Starting with object storage for the data set
      1. 2.2.1 Understanding object storage vs. filesystems
      2. 2.2.2 Authenticating with Amazon Web Services
      3. 2.2.3 Creating a serverless object storage bucket
    3. 2.3 Discovering the schema for the data set
      1. 2.3.1 Introducing AWS Glue
      2. 2.3.2 Authorizing the crawler to access your objects
      3. 2.3.3 Using a crawler to discover the data schema
    4. 2.4 Migrating to columnar storage for more efficient analytics
      1. 2.4.1 Introducing column-oriented data formats for analytics
      2. 2.4.2 Migrating to a column-oriented data format
    5. Summary
  8. 3 Exploring and preparing the data set
    1. 3.1 Getting started with interactive querying
      1. 3.1.1 Choosing the right use case for interactive querying
      2. 3.1.2 Introducing AWS Athena
      3. 3.1.3 Preparing a sample data set
      4. 3.1.4 Interactive querying using Athena from a browser
      5. 3.1.5 Interactive querying using a sample data set
      6. 3.1.6 Querying the DC taxi data set
    2. 3.2 Getting started with data quality
      1. 3.2.1 From “garbage in, garbage out” to data quality
      2. 3.2.2 Before starting with data quality
      3. 3.2.3 Normative principles for data quality
    3. 3.3 Applying VACUUM to the DC taxi data
      1. 3.3.1 Enforcing the schema to ensure valid values
      2. 3.3.2 Cleaning up invalid fare amounts
      3. 3.3.3 Improving the accuracy
    4. 3.4 Implementing VACUUM in a PySpark job
    5. Summary
  9. 4 More exploratory data analysis and data preparation
    1. 4.1 Getting started with data sampling
      1. 4.1.1 Exploring the summary statistics of the cleaned-up data set
      2. 4.1.2 Choosing the right sample size for the test data set
      3. 4.1.3 Exploring the statistics of alternative sample sizes
      4. 4.1.4 Using a PySpark job to sample the test set
    2. Summary
  10. Part 2 PyTorch for serverless machine learning
  11. 5 Introducing PyTorch: Tensor basics
    1. 5.1 Getting started with tensors
    2. 5.2 Getting started with PyTorch tensor creation operations
    3. 5.3 Creating PyTorch tensors of pseudorandom and interval values
    4. 5.4 PyTorch tensor operations and broadcasting
    5. 5.5 PyTorch tensors vs. native Python lists
    6. Summary
  12. 6 Core PyTorch: Autograd, optimizers, and utilities
    1. 6.1 Understanding the basics of autodiff
    2. 6.2 Linear regression using PyTorch automatic differentiation
    3. 6.3 Transitioning to PyTorch optimizers for gradient descent
    4. 6.4 Getting started with data set batches for gradient descent
    5. 6.5 Data set batches with PyTorch Dataset and DataLoader
    6. 6.6 Dataset and DataLoader classes for gradient descent with batches
    7. Summary
  13. 7 Serverless machine learning at scale
    1. 7.1 What if a single node is enough for my machine learning model?
    2. 7.2 Using IterableDataset and ObjectStorageDataset
    3. 7.3 Gradient descent with out-of-memory data sets
    4. 7.4 Faster PyTorch tensor operations with GPUs
    5. 7.5 Scaling up to use GPU cores
    6. Summary
  14. 8 Scaling out with distributed training
    1. 8.1 What if the training data set does not fit in memory?
      1. 8.1.1 Illustrating gradient accumulation
      2. 8.1.2 Preparing a sample model and data set
      3. 8.1.3 Understanding gradient descent using out-of-memory data shards
    2. 8.2 Parameter server approach to gradient accumulation
    3. 8.3 Introducing logical ring-based gradient descent
    4. 8.4 Understanding ring-based distributed gradient descent
    5. 8.5 Phase 1: Reduce-scatter
    6. 8.6 Phase 2: All-gather
    7. Summary
  15. Part 3 Serverless machine learning pipeline
  16. 9 Feature selection
    1. 9.1 Guiding principles for feature selection
      1. 9.1.1 Related to the label
      2. 9.1.2 Recorded before inference time
      3. 9.1.3 Supported by abundant examples
      4. 9.1.4 Expressed as a number with a meaningful scale
      5. 9.1.5 Based on expert insights about the project
    2. 9.2 Feature selection case studies
    3. 9.3 Feature selection using guiding principles
      1. 9.3.1 Related to the label
      2. 9.3.2 Recorded before inference time
      3. 9.3.3 Supported by abundant examples
      4. 9.3.4 Numeric with meaningful magnitude
      5. 9.3.5 Bring expert insight to the problem
    4. 9.4 Selecting features for the DC taxi data set
    5. Summary
  17. 10 Adopting PyTorch Lightning
    1. 10.1 Understanding PyTorch Lightning
      1. 10.1.1 Converting PyTorch model training to PyTorch Lightning
      2. 10.1.2 Enabling test and reporting for a trained model
      3. 10.1.3 Enabling validation during model training
    2. Summary
  18. 11 Hyperparameter optimization
    1. 11.1 Hyperparameter optimization with Optuna
      1. 11.1.1 Understanding loguniform hyperparameters
      2. 11.1.2 Using categorical and log-uniform hyperparameters
    2. 11.2 Neural network layers configuration as a hyperparameter
    3. 11.3 Experimenting with the batch normalization hyperparameter
      1. 11.3.1 Using Optuna study for hyperparameter optimization
      2. 11.3.2 Visualizing an HPO study in Optuna
    4. Summary
  19. 12 Machine learning pipeline
    1. 12.1 Describing the machine learning pipeline
    2. 12.2 Enabling PyTorch-distributed training support with Kaen
      1. 12.2.1 Understanding PyTorch-distributed training settings
    3. 12.3 Unit testing model training in a local Kaen container
    4. 12.4 Hyperparameter optimization with Optuna
      1. 12.4.1 Enabling MLFlow support
      2. 12.4.2 Using HPO for DcTaxiModel in a local Kaen provider
      3. 12.4.3 Training with the Kaen AWS provider
    5. Summary
  20. Appendix A. Introduction to machine learning
    1. A.1 Why machine learning?
    2. A.2 Machine learning at first glance
    3. A.3 Machine learning with structured data sets
    4. A.4 Regression with structured data sets
    5. A.5 Classification with structured data sets
    6. A.6 Training a supervised machine learning model
  21. Appendix B. Getting started with Docker
    1. B.1 Getting started with Docker
    2. B.2 Building a custom image
    3. B.3 Sharing your custom image with the world
  22. index

Product information

  • Title: MLOps Engineering at Scale
  • Author(s): Carl Osipov
  • Release date: February 2022
  • Publisher(s): Manning Publications
  • ISBN: 9781617297762