The Self-Service Data Roadmap

by Sandeep Uttamchandani
September 2020
Beginner to intermediate
284 pages
7h 40m
English
O'Reilly Media, Inc.

Chapter 13. Continuous Integration Service

So far, we have covered building the transformation logic that implements an insight and training ML models. ML model pipelines typically evolve continuously as source schemas, feature logic, dependent datasets, data processing configurations, model algorithms, model features, and configuration change. Teams of data users make these changes either to implement new product capabilities or to improve the accuracy of the models. The same is true of traditional software engineering, where code is updated constantly, with multiple changes made daily across teams. To prepare for deploying ML models in production, this chapter covers the details of continuous integration of ML pipelines, analogous to the practice in traditional software engineering.

Continuous integration of ML pipelines has several pain points. The first is holistically tracking ML pipeline experiments, each of which involves data, code, and configuration. These experiments can be thought of as feature branches, with the distinction that the vast majority of them will never be integrated into the trunk. They still need to be tracked, both to pick the optimal configuration and to support future debugging. Existing code-versioning tools like GitHub track only code changes. There is neither a standard place to store the results of training experiments nor an easy way to compare one experiment with another. Second, to verify the changes, the ML pipeline needs to be packaged for deployment in a test environment. ...
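
The first pain point, tracking data, code, and configuration together and comparing one experiment with another, can be illustrated with a minimal sketch. This is not the book's implementation or any particular tool's API: the names `Experiment`, `ExperimentLog`, and `fingerprint`, as well as the commit ID, configuration keys, and metric values, are all hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(payload: bytes) -> str:
    """Return a short content hash so a dataset version can be referenced by value."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class Experiment:
    """One pipeline run: the code, data, and configuration that produced its metrics."""
    name: str
    code_version: str   # e.g. a commit ID (hypothetical value below)
    data_hash: str      # fingerprint of the training data
    config: dict        # hyperparameters and processing settings
    metrics: dict       # results of the run


class ExperimentLog:
    """In-memory stand-in for a shared store of experiment records."""

    def __init__(self) -> None:
        self._runs: dict[str, Experiment] = {}

    def record(self, experiment: Experiment) -> None:
        self._runs[experiment.name] = experiment

    def diff(self, left: str, right: str) -> dict:
        """List what changed between two runs across code, data, and configuration."""
        a, b = self._runs[left], self._runs[right]
        changes = {}
        if a.code_version != b.code_version:
            changes["code_version"] = (a.code_version, b.code_version)
        if a.data_hash != b.data_hash:
            changes["data_hash"] = (a.data_hash, b.data_hash)
        for key in sorted(set(a.config) | set(b.config)):
            if a.config.get(key) != b.config.get(key):
                changes[f"config.{key}"] = (a.config.get(key), b.config.get(key))
        return changes

    def best(self, metric: str) -> Experiment:
        """Pick the run with the highest value for the given metric."""
        return max(self._runs.values(), key=lambda run: run.metrics[metric])


# Illustrative data only: two runs on the same code and data, differing in one setting.
train_v1 = b"user_id,clicks\n1,3\n2,5\n"
log = ExperimentLog()
log.record(Experiment("baseline", "a1b2c3d", fingerprint(train_v1),
                      {"learning_rate": 0.1, "max_depth": 4}, {"auc": 0.81}))
log.record(Experiment("deeper-trees", "a1b2c3d", fingerprint(train_v1),
                      {"learning_rate": 0.1, "max_depth": 8}, {"auc": 0.84}))

print(log.diff("baseline", "deeper-trees"))
print(log.best("auc").name)
```

Keeping the data fingerprint and configuration alongside the code version is what lets `diff` attribute a metric change to a specific cause, which a code-only history cannot do. A real setup would presumably persist these records in shared storage rather than in memory.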


ISBN: 9781492075240