Table of Contents
Preface
Section 1: Introduction to Pachyderm and Reproducible Data Science
Chapter 1: The Problem of Data Reproducibility
Why is reproducibility important?
What is a model?
The main principles of reproducibility
The reproducibility crisis in science
Data fishing
Better reproducibility in science research guidelines
Common practices to improve reproducibility
Demystifying MLOps
Types of data science platforms
End-to-end platforms
Pluggable solutions
Data ingestion tools
Data transformation tools
Model serving tools
Data monitoring tools
Putting it all together
Explaining ethical AI
Trustworthy AI
Summary
Further reading
Chapter 2: Pachyderm Basics
Reviewing Pachyderm architecture
Why can't I use Git for my data pipelines?