Chapter 2. Machine Learning Pipelines

In 1968, Edsger Dijkstra published an influential letter in the Communications of the ACM entitled “Go To Statement Considered Harmful” to criticize the excessive use of the GOTO statement in programming languages.1 In 2024, “machine learning pipeline” is often used as a catch-all term for how ML models are productionized. However, there is currently widespread confusion about what an ML pipeline is and what it is not. What are the inputs and outputs of an ML pipeline? If somebody says they built their ML system using an ML pipeline, what information can you glean from that? As such, the term “ML pipeline,” as it is currently used, could be “considered harmful” when communicating about building ML systems. Instead, we will strive to describe ML systems in terms of the actual pipelines used to build them. We provide a rigorous definition of different ML pipelines and describe ...
