Chapter 1. Building Machine Learning Systems
Imagine you have been tasked with producing a financial forecast for the upcoming financial year. You decide to use machine learning (ML), as there is a lot of available data, but, not unexpectedly, the data is spread across many places: spreadsheets and numerous tables in the data warehouse. You have been working for several years at the same organization, and this is not the first time you have been given this task. Every year to date, the final output of your work has been a PowerPoint presentation showing the financial projections. Each year, you trained a new model, your model made only one prediction, and you were finished with it. Each year, you started effectively from scratch. You had to find the data sources (again), re-request access to the data to create the features for your model, and then dig out the Jupyter notebook from last year and update it with new data and improvements to your model.
This year, however, you realize that it may be worth investing the time in building the scaffolding for this project so that you have less work to do next year. So instead of delivering a PowerPoint, you decide to build a dashboard. Instead of requesting one-off access to the data, you build feature pipelines that extract the historical data from its sources and compute the features (and labels) used in your model. You see that the feature pipelines can be used to do two things: compute both the historical ...