Preface
AI is a wide and deep field. If you’ve never trained a model, it can feel like you need a PhD just to begin. If you have trained a model, building a machine learning (ML) system can feel like you need to first become both a data engineer and a Kubernetes or cloud expert.
You may already have some experience in ML or AI. Maybe you trained a model on a static dataset. Or maybe you learned about large language models (LLMs) by crafting prompts that successfully accomplished a task. But to create real value from AI, you need to move from static datasets and static prompts to dynamic data and context engineering. When you train a model, you need a system that uses it to make many predictions, not just predictions on the static dataset you downloaded. When you AI-enable an application, you don’t have to hardwire the same responses for all users. You can personalize the AI by providing fresh and relevant context information at request time.
ML and AI systems create the most value when they work with dynamic data. Pipelines are key to this. You need pipelines to transform the dynamic data from your data sources into a format that can be used for anything from training your model, to making predictions, to providing context information for your LLM.
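As a first, minimal sketch of that idea (the function names, column names, and pandas-based implementation here are purely illustrative, not taken from any particular system), a feature pipeline and an inference pipeline might look like this:

```python
import pandas as pd

def feature_pipeline(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform raw, dynamic source data into model-ready features."""
    features = raw.copy()
    # Illustrative transformation: derive an hour-of-day feature from a timestamp
    features["hour_of_day"] = pd.to_datetime(features["event_time"]).dt.hour
    # Drop the raw column the model should not see
    return features.drop(columns=["event_time"])

def inference_pipeline(model, raw: pd.DataFrame) -> pd.Series:
    """Chain the feature pipeline with a trained model to make predictions."""
    return pd.Series(model.predict(feature_pipeline(raw)))
```

The point is not the specific transformations, but the shape: fresh data flows in, pipelines transform it, and a model consumes the result, whether for training, for predictions, or for LLM context.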
In this book, we will define ML systems as sequences of pipelines that progressively transform data from data sources until it is used as input to a model for training or inference (making predictions). Pipelines ...