Chapter 2. Machine Learning Pipelines
In one of my favorite episodes of The Simpsons, Homer Simpson hears that bacon, ham, and pork chops all come from the same animal and can’t believe it: “Yeah, right, Lisa, a wonderful, magical animal.” I had the same reaction when I asked ChatGPT 4.1 for a definition of an ML pipeline. It told me that an ML pipeline performs data collection, feature engineering, model training, model evaluation, model deployment, model monitoring, inference, and maintenance. “Yeah, right, GPT, a wonderful, magical monolithic ML pipeline,” I thought. It even claimed its ML pipeline was modular!
It’s no wonder that when I ask 10 different data scientists for a definition of an ML pipeline, I typically get 10 different answers. There is no agreement on what its inputs and outputs are. If a developer tells you they built their AI system using an ML pipeline, what information can you glean from that? In my opinion, the term ML pipeline, as it is currently used, could be “considered harmful” when communicating about building AI systems.1 In this book, we strive to be more rigorous. We describe AI systems in terms of concrete pipelines used to build them. We reserve the use of the term ML pipeline to describe any individual pipeline or group of pipelines in an AI system.
A pipeline is a computer program that has clearly defined inputs and outputs (that is, it has a well-defined interface) and runs either on a schedule or continuously. An ML pipeline is any ...
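To make the definition of a pipeline above concrete (a program with clearly defined inputs and outputs that runs either on a schedule or continuously), here is a minimal Python sketch. It is an illustration under stated assumptions, not an implementation from any particular framework: the `Pipeline` class, the `add_amount_usd` transformation, and the input and output names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical pipeline: a program with a declared interface."""
    name: str
    inputs: Tuple[str, ...]    # names of the data sources it reads (assumed)
    outputs: Tuple[str, ...]   # names of the data sinks it writes (assumed)
    transform: Callable[[dict], dict]

    def run_batch(self, records: Iterable[dict]) -> List[dict]:
        """Scheduled mode: process a bounded set of records, then finish."""
        return [self.transform(r) for r in records]

    def run_streaming(self, records: Iterable[dict]) -> Iterator[dict]:
        """Continuous mode: emit each result as soon as its input arrives."""
        for r in records:
            yield self.transform(r)


def add_amount_usd(record: dict) -> dict:
    # Hypothetical transformation: derive a dollar amount from cents.
    return {**record, "amount_usd": record["amount_cents"] / 100}


example_pipeline = Pipeline(
    name="example_pipeline",
    inputs=("raw_transactions",),
    outputs=("transaction_amounts",),
    transform=add_amount_usd,
)

rows = [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": 400}]

# The same interface serves both run modes; only the execution style differs.
batch_result = example_pipeline.run_batch(rows)
stream_result = list(example_pipeline.run_streaming(iter(rows)))
```

In this sketch, the declared `inputs` and `outputs` are what make the interface explicit: a reader can tell what the program consumes and produces without reading the transformation itself. In practice, the scheduled mode would be triggered by a scheduler and the continuous mode would read from an unbounded source; both are simulated here with an in-memory list.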