Machine Learning Engineering with Python - Second Edition

Book description

Transform your machine learning projects into successful deployments with this practical guide on how to build and scale solutions that solve real-world problems. Includes a new chapter on generative AI and large language models (LLMs), and on building a pipeline that leverages LLMs using LangChain.

Key Features

  • This second edition delves deeper into key machine learning topics, CI/CD, and system design
  • Explore core MLOps practices, such as model management and performance monitoring
  • Build end-to-end examples of deployable ML microservices and pipelines using AWS and open-source tools

Book Description

The Second Edition of Machine Learning Engineering with Python is the practical guide that MLOps and ML engineers need to build solutions to real-world problems. It will provide you with the skills you need to stay ahead in this rapidly evolving field.

The book takes an examples-based approach to help you develop your skills and covers the technical concepts, implementation patterns, and development methodologies you need. You'll explore the key steps of the ML development lifecycle and create your own standardized "model factory" for training and retraining of models. You'll learn to employ concepts like CI/CD and how to detect different types of drift.

Get hands-on with the latest in deployment architectures and discover methods for scaling up your solutions. This edition goes deeper into all aspects of ML engineering and MLOps, with an emphasis on the latest open-source and cloud-based technologies. This includes a completely revamped approach to advanced pipelining and orchestration techniques.

With a new chapter on deep learning, generative AI, and LLMOps, you will learn to use tools like LangChain, PyTorch, and Hugging Face to leverage LLMs for supercharged analysis. You will explore AI assistants like GitHub Copilot to become more productive, then dive deep into the engineering considerations of working with deep learning.

What you will learn

  • Plan and manage end-to-end ML development projects
  • Explore deep learning, LLMs, and LLMOps to leverage generative AI
  • Use Python to package your ML tools and scale up your solutions
  • Get to grips with Apache Spark, Kubernetes, and Ray
  • Build and run ML pipelines with Apache Airflow, ZenML, and Kubeflow
  • Detect drift and build retraining mechanisms into your solutions
  • Improve error handling with control flows and vulnerability scanning
  • Host and build ML microservices and batch processes running on AWS

Who this book is for

This book is designed for MLOps and ML engineers, data scientists, and software developers who want to build robust solutions that use machine learning to solve real-world problems. If you’re not a developer but want to manage or understand the product lifecycle of these systems, you’ll also find this book useful. It assumes a basic knowledge of machine learning concepts and intermediate programming experience in Python. With its focus on practical skills and real-world examples, this book is an essential resource for anyone looking to advance their machine learning engineering career.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Introduction to ML Engineering
    1. Technical requirements
    2. Defining a taxonomy of data disciplines
      1. Data scientist
      2. ML engineer
      3. ML operations engineer
      4. Data engineer
    3. Working as an effective team
    4. ML engineering in the real world
    5. What does an ML solution look like?
      1. Why Python?
    6. High-level ML system design
      1. Example 1: Batch anomaly detection service
      2. Example 2: Forecasting API
      3. Example 3: Classification pipeline
    7. Summary
  3. The Machine Learning Development Process
    1. Technical requirements
    2. Setting up our tools
      1. Setting up an AWS account
    3. Concept to solution in four steps
      1. Comparing this to CRISP-DM
      2. Discover
        1. Using user stories
      3. Play
      4. Develop
        1. Selecting a software development methodology
        2. Package management (conda and pip)
        3. Poetry
        4. Code version control
        5. Git strategies
        6. Model version control
      5. Deploy
        1. Knowing your deployment options
        2. Understanding DevOps and MLOps
        3. Building our first CI/CD example with GitHub Actions
        4. Continuous model performance testing
        5. Continuous model training
    4. Summary
  4. From Model to Model Factory
    1. Technical requirements
    2. Defining the model factory
    3. Learning about learning
      1. Defining the target
      2. Cutting your losses
      3. Preparing the data
    4. Engineering features for machine learning
      1. Engineering categorical features
      2. Engineering numerical features
    5. Designing your training system
      1. Training system design options
      2. Train-run
      3. Train-persist
    6. Retraining required
      1. Detecting data drift
      2. Detecting concept drift
      3. Setting the limits
      4. Diagnosing the drift
      5. Remediating the drift
      6. Other tools for monitoring
      7. Automating training
      8. Hierarchies of automation
      9. Optimizing hyperparameters
        1. Hyperopt
        2. Optuna
      10. AutoML
        1. auto-sklearn
        2. AutoKeras
    7. Persisting your models
    8. Building the model factory with pipelines
      1. Scikit-learn pipelines
      2. Spark ML pipelines
    9. Summary
  5. Packaging Up
    1. Technical requirements
    2. Writing good Python
      1. Recapping the basics
      2. Tips and tricks
      3. Adhering to standards
      4. Writing good PySpark
    3. Choosing a style
      1. Object-oriented programming
      2. Functional programming
    4. Packaging your code
      1. Why package?
      2. Selecting use cases for packaging
      3. Designing your package
    5. Building your package
      1. Managing your environment with Makefiles
      2. Getting all poetic with Poetry
    6. Testing, logging, securing, and error handling
      1. Testing
      2. Securing your solutions
      3. Analyzing your own code for security issues
      4. Analyzing dependencies for security issues
      5. Logging
      6. Error handling
    7. Not reinventing the wheel
    8. Summary
  6. Deployment Patterns and Tools
    1. Technical requirements
    2. Architecting systems
      1. Building with principles
    3. Exploring some standard ML patterns
      1. Swimming in data lakes
      2. Microservices
      3. Event-based designs
      4. Batching
    4. Containerizing
    5. Hosting your own microservice on AWS
      1. Pushing to ECR
      2. Hosting on ECS
    6. Building general pipelines with Airflow
      1. Airflow
        1. Airflow on AWS
        2. Revisiting CI/CD for Airflow
    7. Building advanced ML pipelines
      1. Finding your ZenML
      2. Going with the Kubeflow
    8. Selecting your deployment strategy
    9. Summary
  7. Scaling Up
    1. Technical requirements
    2. Scaling with Spark
      1. Spark tips and tricks
      2. Spark on the cloud
        1. AWS EMR example
    3. Spinning up serverless infrastructure
    4. Containerizing at scale with Kubernetes
    5. Scaling with Ray
      1. Getting started with Ray for ML
        1. Scaling your compute for Ray
        2. Scaling your serving layer with Ray
    6. Designing systems at scale
    7. Summary
  8. Deep Learning, Generative AI, and LLMOps
    1. Going deep with deep learning
      1. Getting started with PyTorch
      2. Scaling and taking deep learning into production
      3. Fine-tuning and transfer learning
    2. Living it large with LLMs
      1. Understanding LLMs
      2. Consuming LLMs via API
      3. Coding with LLMs
    3. Building the future with LLMOps
      1. Validating LLMs
      2. PromptOps
    4. Summary
  9. Building an Example ML Microservice
    1. Technical requirements
    2. Understanding the forecasting problem
    3. Designing our forecasting service
    4. Selecting the tools
    5. Training at scale
    6. Serving the models with FastAPI
      1. Response and request schemas
      2. Managing models in your microservice
      3. Pulling it all together
    7. Containerizing and deploying to Kubernetes
      1. Containerizing the application
      2. Scaling up with Kubernetes
      3. Deployment strategies
    8. Summary
  10. Building an Extract, Transform, Machine Learning Use Case
    1. Technical requirements
    2. Understanding the batch processing problem
    3. Designing an ETML solution
    4. Selecting the tools
      1. Interfaces and storage
      2. Scaling of models
      3. Scheduling of ETML pipelines
    5. Executing the build
      1. Building an ETML pipeline with advanced Airflow features
    6. Summary
  11. Other Books You May Enjoy
  12. Index

Product information

  • Title: Machine Learning Engineering with Python - Second Edition
  • Author(s): Andrew P. McMahon
  • Release date: August 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781837631964