Book description
Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to establish and run ML reliably, effectively, and accountably within your organization. You'll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization.
By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.
You'll examine:
- What ML is: how it functions and what it relies on
- Conceptual frameworks for understanding how ML "loops" work
- How effective productionization can make your ML systems easily monitorable, deployable, and operable
- Why ML systems make production troubleshooting more difficult, and how to compensate accordingly
- How ML, product, and production teams can communicate effectively
Table of contents
- Foreword
- Preface
- 1. Introduction
- 2. Data Management Principles
- 3. Basic Introduction to Models
- 4. Feature and Training Data
- 5. Evaluating Model Validity and Quality
- 6. Fairness, Privacy, and Ethical ML Systems
- 7. Training Systems
  - Requirements
  - Basic Training System Implementation
  - General Reliability Principles
    - Most Failures Will Not Be ML Failures
    - Models Will Be Retrained
    - Models Will Have Multiple Versions (at the Same Time!)
    - Good Models Will Become Bad
    - Data Will Be Unavailable
    - Models Should Be Improvable
    - Features Will Be Added and Changed
    - Models Can Train Too Fast
    - Resource Utilization Matters
    - Utilization != Efficiency
    - Outages Include Recovery
  - Common Training Reliability Problems
  - Structural Reliability
  - Conclusion
- 8. Serving
- 9. Monitoring and Observability for Models
- 10. Continuous ML
- 11. Incident Response
- 12. How Product and ML Interact
- 13. Integrating ML into Your Organization
- 14. Practical ML Org Implementation Examples
- 15. Case Studies: MLOps in Practice
- Index
- About the Authors
Product information
- Title: Reliable Machine Learning
- Author(s): Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood
- Release date: September 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098106225