Chapter 1. Delivering Business Value Through ML Governance

The last decade has brought a dramatic boom of machine learning (ML) in both academia and enterprise. Companies raced to build data science departments and bring the promises of artificial intelligence (AI) into their decision making and products.

However, ML remained (and for some, remains) fundamentally misunderstood. Not long after companies began their foray into the realm of ML, they began to experience significant roadblocks to driving value and delivering ML projects. In 2015, Google released the now-famous paper “Hidden Technical Debt in Machine Learning Systems.”1 The paper outlined the common challenges data science groups faced with technical debt, DevOps, and governance of their ML systems.

Organizations hired data scientists in spades and started to generate algorithms. However, there were no existing operational pipelines capable of delivering models to production. This created a bottleneck that began to compound under the growing weight of new algorithms with nowhere to go. AutoML and other ease-of-use frameworks have further commoditized ML to the point that companies can now train hundreds of algorithms with the click of a button. Without a scalable framework to deliver and support models in production, the exponential explosion of ML models creates more problems than it solves.

Companies were investing in ML, but a failure to consider the operational challenges of scaling significantly inhibited their ability to deliver. Algorithmia’s “2021 Enterprise Trends in Machine Learning” report found that the time required to deploy a model is increasing, with 64% of all organizations taking a month or longer. In fact, the 2020 Gartner AI in Organizations survey showed that only 53% of ML models successfully make it into production.

The reality is that the actual ML model makes up only a small portion of a project. Even if the model is trained in a day, months can be wasted in meetings and development across business units trying to create a secure and compliant process for ML models where none currently exists. Even when individual business units succeed in model development and deployment, operational concerns such as security and compliance are often an afterthought. The scattering of incomplete “shadow infrastructure” drains resources and poses potential risk to the organization. A complete machine learning lifecycle encompasses everything from the development of the model to infrastructure and operations (Figure 1-1).

Figure 1-1. The ML lifecycle

The open-source space is meeting the technical needs of the individual data scientist but is failing to deliver the complexity of integrations required to meet the needs of the business. As a result, the number of machine learning operations (MLOps) vendor solutions is exploding. MLOps is now recognized as a critical capability by organizations delivering ML, and organizations’ unique governance needs for ML are at the core of what drives the definition of that capability.

The key challenge with all “ops” is that the problem isn’t just about technology, but process. Individual ML projects might successfully create “AI,” but expend all their budget reinventing the wheel on ops and governance. No clear set of ML principles and best practices broadly applicable across business verticals yet exists. What is needed is a long-standing, comprehensive framework that enables organizations to effectively drive value with ML by successfully implementing ML governance.

In this report we will investigate the state of ML governance and, more importantly, define the framework for ML governance. The purpose of ML is to unlock the value of your data, and the purpose of MLOps is to unlock the value of your ML. The framework for ML governance is not just a comprehensive strategy to deliver ML; it is a comprehensive strategy to deliver business value with ML. We have finally reached the point of maturity in ML where the focus isn’t the academic or developer but the organization itself.

The Current State of ML Governance

MLOps is the discipline of ML model delivery. It is a set of tools and processes for delivering ML at scale that involves all technical and process-oriented portions of the ML lifecycle. Much like the software development analogue DevOps, MLOps seeks to improve the ML lifecycle by automating reproducible steps and reducing the time it takes to bring models into a production-ready state. It is not just a step in the process but the process itself. MLOps is present all the way from the high-level workflows of your organization, down to the low-level technical implementation of your pipeline from the end of development onward (Figure 1-2).

Beyond the process and technical components is the oversight of that process. ML governance is the management, control, and visibility of your MLOps. It is about both the functional and nonfunctional requirements of your end-to-end ML workflow. For example, MLOps is all about tangible features like continuous integration/continuous delivery (CI/CD), model pipelines, and tooling. ML governance, on the other hand, is about the “abilities”: visibility, explainability, auditability, and other more abstract but essential requirements of a successful end-to-end ML lifecycle. These capabilities work together to mitigate risk, reduce delivery time, and provide finer-grained controls for your ML lifecycle. MLOps without ML governance is like having a TV without a remote control. ML governance is present across the entirety of the ML lifecycle and is wider reaching than MLOps (Figure 1-3).

Figure 1-2. MLOps in the ML lifecycle
Figure 1-3. MLOps and ML governance in the ML lifecycle

For a lot of companies, the potential value generated by ML is no longer speculative. According to Algorithmia’s “2021 Enterprise Trends in Machine Learning” report, organizations are dramatically increasing their investments in ML. However, they’re also struggling to scale and extract value from those investments. Most notably, ML governance is a top issue, with 56% of organizations listing it as a concern.

Many organizations are also dealing with significant technical debt, and the lack of a governance strategy is only compounding these issues. With 67% of organizations required to abide by multiple regulations (ISO, HIPAA, PCI, GDPR, etc.), lapses in governance could potentially incur massive fines or even damage a company’s brand.

In order to drive effective machine learning in your organization, it’s imperative to have an effective governance strategy. It’s becoming increasingly clear that the quality of the models a company produces is not a direct indicator of how successful their ML ventures will be. The companies that struggle to meet their ML governance needs are the ones that will be left behind.

Many companies are still struggling to dig themselves out of the antipatterns described in the “Hidden Technical Debt in Machine Learning Systems” paper. Data scientists do not exist in a vacuum but rather interact with many operational and regulated systems. For companies bound to strict internal or external regulation, governance is likely the single greatest factor that will halt their ability to deliver in a timely manner.

And delivery isn’t the only concern. Since machine learning models consume and generate data, they pose a massive risk. Large data privacy fines like the $5 billion penalty imposed on Facebook in 2019 could soon have similar implications for the machine learning models that touch that data.2 While larger organizations likely already have regulations and strategies in place, the difficulty of applying these solutions to new technology is dramatically impacting time to production.

Why Organizations Aren’t Seeing Value from ML

The reality is that a number of organizations don’t see value from their ML investments, and there are two likely causes:

  1. They don’t have a well-defined ML use case or data.

  2. They lack the MLOps, governance, or infrastructure to move models into a state that drives value within a reasonable timeline.

There is a stark difference between building a “successful” ML model and actually using it to drive value. For instance, consider an Etsy article that describes how to choose a metric to evaluate a model. Data scientists often focus on the offline metric, which tells us how well the model performs on past data. The online metric is often more tangible. Something like “revenue generated” is a more effective way to measure the success of an algorithm than a purely academic metric like “model accuracy.”3
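The distinction can be made concrete with a toy sketch. The function names and numbers below are purely illustrative, and a real online metric would be measured from live traffic rather than computed like this:

```python
def offline_accuracy(predictions, labels):
    """Offline metric: fraction of correct predictions on historical data."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def online_revenue(predictions, labels, revenue_per_sale):
    """Online-style metric: revenue generated when the model's positive
    predictions are acted on and the customer actually converts."""
    return sum(r for p, y, r in zip(predictions, labels, revenue_per_sale)
               if p == 1 and y == 1)

preds   = [1, 0, 1, 1]   # model says: act on customers 1, 3, and 4
actuals = [1, 0, 0, 1]   # customers 1 and 4 actually convert
revenue = [120.0, 80.0, 60.0, 40.0]

acc = offline_accuracy(preds, actuals)         # 3 of 4 predictions correct
rev = online_revenue(preds, actuals, revenue)  # 120.0 + 40.0 in realized revenue
```

Two models can have identical offline accuracy yet drive very different revenue, which is why the online metric is the one the business ultimately cares about.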

The problem is that it can be really tough to get that measurement. After you’ve trained and deployed a model, you’re nowhere near done. You need to monitor and maintain it to ensure continued benefit. Is the model generating valid predictions? Is it rational, or is it growing stale and introducing potential risk? And most importantly, is it actually moving the bottom line for your organization?

Good MLOps alleviates all of these potential burdens by providing visibility over your systems and a repeatable path by which you can improve and iterate upon your models. If value-generating ML is the destination, then MLOps is the highway. Machine learning is a continuous process. If it takes six months to deploy a model, then how long does it take to update it? By having MLOps, you provide your organization with a high-speed methodology of not only deploying new models but also updating and monitoring old ones.

Governance and security should be an inherent and essential part of this process. More than once I’ve seen an ML project get delayed by months because it failed to consider governance and security at inception. Machine learning may be new, but it’s not immune to the same governance standards for software and data within your organization. Even if the technical implementation of your production ML models is near perfect, lack of proper governance can stop a project in its tracks—while lack of adequate security can pose a massive risk.

While the basic operational principles are the same as those of software, ML is completely new in the enterprise and presents a novel set of challenges for the governance of software, data, and the ML model itself. An even bigger problem is that most companies only recently started thinking about this, and the companies that haven’t are poised to be left behind as carefully designed model pipelines and strategies open the door for the first real explosion of ML-driven value in the enterprise.

What’s Needed to Derive Value from ML

Production software and data workflows have well-defined standards and requirements for everything from API definitions to source-code management (SCM). ML in production needs to meet the same standards, but there is yet to exist a one-to-one mapping of these workflows to the ML workflow. In order to deliver value with ML, these same standards need to be met and applied within the context of the ML lifecycle.

Machine learning is a novel technology, but it still lives within the context of both software and data. ML needs an analogue to the well-defined software engineering frameworks that let engineering teams deliver value instead of endlessly spinning their wheels, and value happens at the business level. Most companies know (to a degree) how to ship software. Now we need to adapt that pattern to the machine learning lifecycle and the next generation of software.

MLOps and governance go beyond the ML model itself to the ecosystem of machine learning technology in modern business. Delivering value with ML is about looking at the true, expanded lifecycle of algorithms in the enterprise. Successful machine learning is often dependent on a successful implementation of this lifecycle.

Like software, the ML lifecycle can be abstracted into high-level stages.4 We can consider them development, delivery, and operations. Each of these stages involves different stakeholders and critical components. However, all of these stages are subject to MLOps and ML governance (Figure 1-4).

Figure 1-4. The delivery and operation stages of the ML lifecycle are more complex than development


Development

This stage involves a few different parties—usually the business subject matter experts (SMEs), data scientists, machine learning engineers (MLEs), and various other roles. It is highly experimental and iterative. At this point in the process, data scientists require significant flexibility to run the number of experiments required to generate a successful model. Functionally, this is very similar to software development but with more focus on data and artifacts than code.


Delivery

At this point, a “successful” model has been trained and must be deployed to production. This means very different things for different ML teams. The model might be deployed for real-time inference or batch inference, built into an application, or even used in an executive report. This stage often involves an interface between data science, ML engineering, and infrastructure/IT teams.

Delivering a model should be a repeatable, secure, and automated process. In order to enable your models to generate value, you need to enable your ML teams to effectively deliver. The “bridge” between experimental data science teams and software/infrastructure teams is fundamental in creating reproducible delivery and allows everyone to focus on what they do best. This stage is characterized by technical components such as model versioning, API development, and source code management.


Operations

A common misconception is that scale = production, but this is not the case. Simply deploying a model on a large-scale framework like Kubernetes doesn’t mean that it’s in a production state. “Production” requires a reproducible pipeline, checks and balances, security, tests, and other common DevOps concepts. Operations outline the baseline characteristics of a true production system.
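One way to operationalize the idea that “deployed” does not mean “production” is a gate that refuses promotion until every required control is in place. The control names below are illustrative, not a standard list; a real organization would define its own:

```python
# Illustrative controls a model must satisfy before promotion to production.
REQUIRED_CONTROLS = {
    "reproducible_pipeline",
    "tests_passing",
    "security_review",
    "monitoring_configured",
}

def production_ready(controls_in_place):
    """Return (ready, missing): ready only if no required control is absent."""
    missing = REQUIRED_CONTROLS - set(controls_in_place)
    return len(missing) == 0, sorted(missing)

# A model deployed at scale but lacking a security review is still blocked.
ready, missing = production_ready({"reproducible_pipeline", "tests_passing"})
```

In practice such a gate would run inside the delivery pipeline itself, so that no model reaches production without every box checked.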

Operations are an infrastructure-level concern. At this point, we are integrating with multiple systems, dependent on monitoring/alerting, and there is a customer-facing component to the ML application. Moving into production also requires you to meet the security and compliance requirements of your organization and meet other business needs beyond the pure functionality of your application.

At this stage, you need to address business concerns like financial visibility (chargebacks, showbacks), regulatory concerns, user authentication, role-based access control (RBAC), and a litany of other ML-agnostic features.
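Of these concerns, role-based access control is the simplest to sketch. The roles and permission strings below are hypothetical; a real system would integrate with the organization’s identity provider rather than hardcode a table:

```python
# Hypothetical mapping of roles to permitted actions on ML assets
ROLE_PERMISSIONS = {
    "data_scientist": {"model:train", "model:read"},
    "ml_engineer":    {"model:read", "model:deploy"},
    "auditor":        {"model:read", "audit:read"},
}

def is_allowed(role, action):
    """Deny by default: an action is permitted only if explicitly granted."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

The deny-by-default design matters here: an unrecognized role gets no access at all, which is the safer failure mode for regulated data.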

Not only do these components need to be built, but they also need to be supported and maintained over time. To have successful operations, you need to have repeatable and sustainable processes that meet the internal and external requirements of your organization.

ML Governance and the ML Lifecycle

MLOps and ML governance are not “stages” in the ML lifecycle themselves but span its entirety, enabling businesses to derive value from ML. Components of a true production system such as monitoring, observability, and security wouldn’t be possible without ML governance, and its presence at every stage in the lifecycle enables teams to rapidly iterate effectively and drive value with ML.

The specific implementation of ML governance may vary based on what stage in the process it’s being applied, but MLOps and ML governance are the glue that holds the otherwise disparate lifecycle components together, resulting in a much smoother and quicker iterative process.

In order to deliver value with ML, you need an ML lifecycle. In order to have an ML lifecycle, you need to implement effective MLOps and ML governance.

A Consistent Framework for ML Governance

It’s been established that companies are struggling to set up effective governance for their models, which is inhibiting their ability to deliver ML. It’s also clear that ML governance is similar to existing governance for both data and software infrastructure. If that’s the case, what makes ML governance especially challenging? The reasons are multifaceted:

ML is new to business leaders.

Corporate executives only recently came to understand ML and started to see it not as something entirely new but as a specific implementation of technology that already existed.

Data science is an academic discipline.

Data science evolved from inherently academic practices. Data scientists were expected to enact ops and governance without the skills or knowledge to do so.

Governance is ad hoc.

Poor governance solutions often arise from unprepared data scientists. Many teams have developed antipatterns and accumulated technical debt that is difficult to dig out of.

ML differs from software.

ML, for all its similarities, does differ from software in some key areas. It is experimental, with its own set of requirements, and often difficult to explain—making direct governance analogues difficult to apply.

There is no set of universal best practices for ML governance.

Due to these differences, there is no generally accepted best practice for ML governance. Most organizations are left to figure it out.

Any combination of these factors can easily stop an organization in its tracks, leaving data scientists isolated from the engineering and business-level support they need to implement a robust solution to the problem. But to implement ML governance, you first need to really understand what it is and build a culture around it.

ML governance is present across each stage of the lifecycle: development, delivery, and operations. Each is critical to successfully driving value with ML.

The Need for Governance in Development

Development is the experimental cycle in which data scientists compare different models, features, and parameters in an effort to find the most performant algorithm. In this stage of the lifecycle, ML governance is primarily about managing experiments, infrastructure, and the transition from experimentation to production. Implementing operational foundations in the development stage enables data scientists to easily evaluate experiments while providing a lightweight structure to their work. This gives data scientists the freedom they need to experiment without too many guardrails, while at the same time smoothing the transition from development to production during the delivery stage.

From a governance perspective, development is often fairly straightforward. Since nothing is actually in production or serving real users, there are significantly fewer procedural hurdles. If there are difficulties, they lie in the experimental nature of data science. It is the job of a data scientist to optimize machine learning models, not to keep their code in a production-ready state.

That being said, adopting MLOps early on has the benefit of yielding solid governance. Tracking, versioning, and logging experiments increases visibility and auditability early on in the process. Even if the models themselves aren’t in production, the ML development phase still uses potentially sensitive and regulated data. This means even ML development isn’t immune from possible audits or the risk of a compliance breach.
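Even a minimal, hand-rolled experiment log captures the governance benefit: every run is recorded with its parameters, metrics, and data version, so results can be audited and reproduced later. The `log_experiment` helper below is a sketch, not part of any particular tool:

```python
import hashlib
import json
import time

def log_experiment(params, metrics, data_version, registry):
    """Append one training run as a versioned, auditable record."""
    record = {
        # Deterministic ID derived from the run's inputs, so identical
        # configurations map to the same run_id
        "run_id": hashlib.sha1(
            json.dumps([params, data_version], sort_keys=True).encode()
        ).hexdigest()[:12],
        "timestamp": time.time(),
        "params": params,              # hyperparameters for this run
        "metrics": metrics,            # offline evaluation results
        "data_version": data_version,  # snapshot of the data that was used
    }
    registry.append(record)
    return record["run_id"]

runs = []
run_id = log_experiment({"max_depth": 5}, {"auc": 0.91}, "sales-2021-03", runs)
```

Dedicated experiment-tracking tools add storage, UIs, and lineage on top, but the core record is the same: what ran, on which data, with what result.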

ML development also has a huge effect on the pipeline to production. The handoff between data scientists and IT professionals is one of the more difficult parts of the ML lifecycle. MLOps in this part of the process will have significant implications on how effective that handoff is, and how quickly the model can make it to production. In addition to being the first step of the ML lifecycle, development is also the first step of an ML project. This means it has larger implications involving stakeholders and other engineering professionals required to make it successful.

Again, the governance at this experimental stage is inherently less intensive than that of the operations stage. Still, effective governance unlocks value from data scientists’ workflows and feeds into a sustainable governance strategy. Implementing just a baseline of the components above forms the bedrock for the more intensive governance concerns that arise as the model moves closer to a production state.

The Need for Governance in Delivery and Operations

The delivery and operations stages of the ML lifecycle rely heavily on effective MLOps—the set of best practices for the technical and process-oriented portions of an ML pipeline delivering ML to production. MLOps is the hard part of delivering ML to production, involving all the complexities of any production software system. Once a data scientist has identified an effective model, that model needs a robust, automated pipeline that can consistently deliver it to a scalable production state. In addition, MLOps needs to provide the automation to repeat this process as models in production get stale and new data and models become available.

ML governance is the management, control, and visibility of your MLOps. Governance works to democratize ML in your organization, driving control and visibility beyond the developer to the level of the businessperson. Without proper visibility, it’s difficult for anyone other than the data scientists themselves to understand the inner workings of their models. This has the dual detriment of making it difficult to evaluate both model success and potential risk.

“Model risk” often arises as a result of bias in the model. In the financial sector, models with poor performance have a greater risk of losing the company money.5 Models making decisions on biased data can adversely affect both the individuals who rely on those models and the public face of the company itself. When Apple’s algorithms gave people different credit limits based on gender, the company not only received significant bad press but also came under regulatory investigation.6

This risk is also prevalent in model performance. Models that undergo “drift” lose predictive power over time, and identifying this deterioration is one of the primary use cases for effective governance. Consider a credit-scoring model that helps a bank decide whether or not to approve a loan for an individual. Loss of predictive power in the model could result in the model denying viable applicants or granting loans to applicants without the ability to pay them back. Among other concerns, this represents significant potential risk for both parties. Through enhanced visibility, ML governance keeps tabs on models to identify and mitigate these problems before they have adverse effects.7
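One common way to quantify this kind of drift is the population stability index (PSI), which compares the score distribution the model was trained on with what it sees in production. The sketch below assumes both distributions have already been binned into matching proportions:

```python
import math

def population_stability_index(expected, actual):
    """PSI between two binned distributions (each summing to 1).
    A common rule of thumb: < 0.1 stable, 0.1-0.25 drifting,
    > 0.25 significant drift."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)  # guard empty bins against log(0)
        a = max(a, 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # score distribution at training time
current  = [0.10, 0.20, 0.30, 0.40]  # score distribution in production

drift = population_stability_index(baseline, current)
```

A monitoring job would run a check like this on a schedule and alert when the index crosses the organization’s chosen threshold, prompting retraining or review before the stale model causes harm.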

A lack of ML governance will not block you from deploying a single model at the delivery stage, but it will completely block you from quickly and effectively deploying many models into a true production system. The greater the number of models you have, the greater the need for fine-grained controls to manage them. Your algorithms should expose transparent metrics so that decision makers can use them to inform strategy. Monitoring a model for statistical, financial, and computational performance may sound like three very different tasks, but all of them are central to a robust ML governance strategy.

Prioritizing ML visibility enables the business to make better-informed, lower-risk decisions with its models. If you don’t have a model catalog, your developers or users will have a difficult time discovering and leveraging your models in the first place. Even if all of the technical implementations of your deployed model are sound, it still needs to be visible and discoverable within your organization to provide value.
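A model catalog can start very simply. The `ModelCatalog` class below is a toy in-memory sketch of the idea, not a real registry product:

```python
class ModelCatalog:
    """Toy in-memory catalog that makes deployed models discoverable."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, owner, endpoint):
        """Record who owns a model version and where it is served."""
        self._models[(name, version)] = {"owner": owner, "endpoint": endpoint}

    def versions(self, name):
        """All registered versions of a model, oldest first."""
        return sorted(v for (n, v) in self._models if n == name)

    def lookup(self, name, version):
        return self._models[(name, version)]

catalog = ModelCatalog()
catalog.register("churn", "1.0.0", "ds-team", "https://ml.internal/churn/1")
catalog.register("churn", "1.1.0", "ds-team", "https://ml.internal/churn/2")
```

Even this small amount of structure answers the discoverability questions governance cares about: which models exist, who owns them, and where they run.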

1 D. Sculley et al., “Hidden Technical Debt in Machine Learning Systems” (Google, 2015).

2 See “FTC Imposes $5 Billion Penalty and Sweeping New Privacy Restrictions on Facebook” (FTC, 2019).

3 See “How to Pick a Metric as the North Star for Algorithms to Optimize Business KPI: A Causal Inference Approach” (Etsy, 2020).

4 Based on Algorithmia’s MLOps management guide.

5 See the Algorithmia blog post, “What You Need to Know About Model Risk Management”.

6 See “Apple’s ‘Sexist’ Credit Card Investigated by US Regulator” (BBC News, 2019).

7 See the Algorithmia blog post, “What Is Model Governance?”.
