Chapter 10. Getting Ready for MLOps with Vertex AI

In Chapter 9, we developed a TensorFlow model in a Jupyter Notebook. From the notebook environment, we were able to train the model, deploy it to an endpoint, and get predictions from it. While that worked for us during development, it is not a scalable workflow.

Taking a TensorFlow model that you trained in your Jupyter Notebook and deploying the SavedModel to Vertex AI doesn’t scale to hundreds of models and large teams. Retraining is going to be difficult because the ops team will have to layer monitoring, scheduling, and all the other operational machinery on top of a workflow that is clunky and far from minimal.

In order for a machine learning model to be placed into production, it needs to meet the following requirements:

  • The model should be under version control. Source code control systems such as git work much better with text files (such as .py files) than with mixtures of text and binaries (which is what .ipynb files are).

  • The entire process—from dataset creation to training to deployment—has to be driven by code. This is so that it is easy to automatically retrigger a training run using GitHub Actions or GitLab Continuous Integration whenever new changed code is checked in.

  • The entire process should be invokable from a single entry point, so that the retraining can be triggered by noncode changes such as the arrival of new data in a Cloud Storage bucket.

  • It should be easy to monitor the performance of models and endpoints ...
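To make the third requirement concrete, here is a minimal sketch of what such a single entry point might look like. The step functions, argument names, and messages below are hypothetical placeholders, not the book’s actual pipeline code; in practice each step would call into the Vertex AI SDK.

```python
import argparse


def create_dataset(data_uri: str) -> str:
    # Hypothetical step: materialize training data from a Cloud Storage URI.
    return f"dataset built from {data_uri}"


def train_model(dataset: str) -> str:
    # Hypothetical step: launch a training run on the dataset.
    return f"model trained on {dataset}"


def deploy_model(model: str) -> str:
    # Hypothetical step: deploy the trained model to an endpoint.
    return f"endpoint serving {model}"


def run_pipeline(data_uri: str) -> str:
    """Single entry point: dataset creation -> training -> deployment."""
    dataset = create_dataset(data_uri)
    model = train_model(dataset)
    return deploy_model(model)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Retrain and redeploy the model")
    parser.add_argument("--data_uri", required=True,
                        help="Cloud Storage URI of the new training data")
    args = parser.parse_args()
    print(run_pipeline(args.data_uri))
```

Because everything hangs off one function, the same entry point can be invoked by a developer from the command line, by a CI job on code check-in, or by a Cloud Function reacting to new data landing in a bucket.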
