AI Product Management After Deployment

The AI product manager’s job isn’t over when the product is released. PMs need to remain engaged after deployment.

By Justin Norman and Mike Loukides
October 13, 2020
RoboCup 2016, Leipzig (source: ubahnverleih on Wikimedia Commons)

The field of AI product management continues to gain momentum. As the AI product management role matures, more and more information and advice have become available. Our previous articles in this series introduce our own take on AI product management, discuss the skills that AI product managers need, and detail how to bring an AI product to market.

One area that has received less attention is the role of an AI product manager after the product is deployed. In traditional software engineering, precedent has been established for the transition of responsibility from development teams to maintenance, user operations, and site reliability teams. New features in an existing product often follow a similar progression. For traditional software, the domain knowledge and skills required to develop new features differ from those necessary to ensure that the product works as intended. Because product development and product operations are distinct, it’s logical for different teams and processes to be responsible for them.


In contrast, many production AI systems rely on feedback loops that require the same technical skills used during initial development. As Emmanuel Ameisen states in “Building Machine Learning Powered Applications: Going from Idea to Product”: “Indeed, exposing a model to users in production comes with a set of challenges that mirrors the ones that come with debugging a model.”

As a result, at the stage when product managers for other types of products might shift to developing new features (or to other projects altogether), an AI product manager and the rest of the original development team should remain heavily involved. One reason for this is to tackle the (likely) lengthy backlog of ML/AI model improvements that will be discovered after the product engages with the real world. Another, of course, is to ensure that the product functions as expected and desired over time. We describe the final responsibility of the AI PM as coordinating with the engineering, infrastructure, and site reliability teams to ensure all shipped features can be supported at scale.

This article offers our perspective on the practical details of the AI PM’s responsibilities in the latter parts of the AI product cycle, along with some insight into best practices for carrying out those responsibilities.

Debugging AI Products

In Bringing an AI Product to Market, we distinguished the debugging phase of product development from pre-deployment evaluation and testing. This distinction assumes a slightly different definition of debugging than is often used in software development. We define debugging as the process of using logging and monitoring tools to detect and resolve the inevitable problems that show up in a production environment.

Emmanuel Ameisen again offers a useful framework for defining errors in AI/ML applications: “…three areas in particular are most important to verify: inputs to a pipeline, the confidence of a model and the outputs it produces.” To support verification in these areas, a product manager must first ensure that the AI system can report back to the product team about its performance and usefulness over time. This reporting can take several forms, including the collection of explicit user feedback or comments through channels outside the product team, and mechanisms that let users dispute the AI system’s output where applicable. Proper AI product monitoring is essential to this outcome.
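What might such a reporting channel look like in practice? One possible shape, sketched here under the assumption of a Python service, is a per-prediction record that ties a model output to the user’s reaction; every field name below is illustrative, not part of any specific product.

```python
# A minimal sketch of a per-prediction feedback record; all field names
# are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PredictionFeedback:
    request_id: str
    model_version: str
    prediction: float
    confidence: float
    user_disputed: bool = False   # set when the user disagrees with the output
    user_comment: str = ""        # optional free-text feedback
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Usage: log one record per served prediction, then aggregate disputes by
# model_version to surface problem releases to the product team.
fb = PredictionFeedback("req-123", "v2.1", prediction=0.91, confidence=0.62,
                        user_disputed=True, user_comment="This label is wrong")
print(fb.model_version, fb.user_disputed)
```

Aggregating these records by model version gives the product team an early signal when a release starts drawing disputes.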

I/O validation

From a technical perspective, it is entirely possible for ML systems to function on wildly different data. For example, you can ask an ML model to make an inference on data taken from a distribution very different from what it was trained on—but that, of course, results in unpredictable and often undesired performance. Therefore, deployed AI products should include validation steps to ensure that model inputs and outputs are within generally expected limits, before a model training or inference task is accepted as successful.

Ideally, AI PMs would steer development teams to incorporate I/O validation into the initial build of the production system, along with the instrumentation needed to monitor model accuracy and other technical performance metrics. In practice, however, model I/O validation steps are often added later, when scaling an AI product. The PM should therefore identify the team that will reconvene whenever it becomes necessary to build out or modify product features that (see the sketch after this list):

  • ensure that inputs are present and complete,
  • establish that inputs are from a realistic (expected) distribution of the data,
  • and trigger alarms, model retraining, or shutdowns (when necessary).
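As one illustration of what such validation might look like, here is a minimal Python sketch. The feature bounds, training statistics, and the four-standard-deviation cutoff are all assumptions for the sake of the example, not prescriptions.

```python
# A minimal sketch of model I/O validation; the feature bounds, training
# statistics, and the 4-sigma cutoff below are illustrative assumptions.
import math

EXPECTED_RANGES = {"age": (0, 120), "income": (0.0, 1e7)}               # assumed valid bounds
TRAINING_STATS = {"age": (38.0, 12.0), "income": (52_000.0, 18_000.0)}  # assumed (mean, std)

def validate_input(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input passed."""
    problems = []
    for feature, (lo, hi) in EXPECTED_RANGES.items():
        value = record.get(feature)
        if value is None:
            problems.append(f"missing feature: {feature}")
            continue
        if not lo <= value <= hi:
            problems.append(f"{feature}={value} outside [{lo}, {hi}]")
        mean, std = TRAINING_STATS[feature]
        if std > 0 and abs(value - mean) / std > 4:   # crude "expected distribution" check
            problems.append(f"{feature}={value} far from the training distribution")
    return problems

def validate_output(score: float) -> list[str]:
    """Reject model outputs that are NaN or outside the valid score range."""
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        return [f"score={score} outside the expected [0, 1] range"]
    return []

# Usage: gate each training or inference task on validation, and alert
# (or shut down) rather than fail silently.
issues = validate_input({"age": 214, "income": 48_000.0})
if issues:
    print("ALERT:", "; ".join(issues))   # in production: page on-call, perhaps halt serving
```

In a real product, the alert branch would feed the alerting and remediation framework discussed below rather than printing to the console.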

The composition of these teams will vary between companies and products, but a typical cross-functional team would likely include representatives from Data Science (for product-level experimentation and inference task validation), Applied Science (for model performance and evaluation), ML Engineering (for data and feature engineering, as well as model pipeline support), and Software/Feature Engineering (for integration with the full stack of the AI product, such as UI/UX, cloud services, and DevOps tools). Working together, this post-production development team should embrace continuous delivery principles and prioritize integrating any instrumentation that was not already implemented during model development.

Finally, the AI PM must work with production engineering teams to design and implement the alerting and remediation framework. Considerations include where to set thresholds for each persona, how frequently to alert, and how much remediation to automate (both what’s possible and what’s desired).
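One lightweight way to encode those considerations is a per-persona policy table. The sketch below is hypothetical; every metric name, threshold, and cooldown in it is invented for illustration.

```python
# Hypothetical per-persona alerting policy; every metric name, threshold,
# and cooldown below is an illustrative assumption.
ALERT_POLICY = {
    "data_scientist": {"metric": "feature_drift_psi",   "threshold": 0.2,  "cooldown_min": 60, "auto_remediate": False},
    "ml_engineer":    {"metric": "pipeline_error_rate", "threshold": 0.01, "cooldown_min": 15, "auto_remediate": True},
    "sre":            {"metric": "p99_latency_ms",      "threshold": 250,  "cooldown_min": 5,  "auto_remediate": True},
}

def should_alert(persona: str, observed: float, minutes_since_last_alert: float) -> bool:
    """Alert only when the threshold is crossed and the cooldown has elapsed."""
    policy = ALERT_POLICY[persona]
    return observed > policy["threshold"] and minutes_since_last_alert >= policy["cooldown_min"]

print(should_alert("sre", observed=310, minutes_since_last_alert=12))  # True
```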

Inference Task Speed and SLOs

During testing and evaluation, application performance is important, but not critical to success. In the production environment, when the outputs of an ML model are often a central (yet hidden) component of a greater application, speed and reliability are critically important. It is entirely possible for an AI product’s output to be absolutely correct from the perspective of accuracy and data quality, but too slow to be even remotely useful. Consider the case of autonomous vehicles: if the outputs from even one of the many critical ML models that comprise the vehicle’s AI-powered “vision” are delivered after a crash, who cares if they were correct?

In engineering for production, AI PMs must account for the speed at which information from ML/AI models needs to be delivered (to validation tasks, to other systems in the product, and to users). Technologies and techniques such as caching and engineering specifically for GPU/TPU performance are important tools in the deployment process, but they are also additional components that can fail, and thus cause the failure of an AI product’s core functionality. An AI PM’s responsibility is to ensure that the development team implements proper checks prior to release and, in the case of failure, to support the incident response teams until they are proficient in resolving issues independently.
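To make the idea of a pre-release speed check concrete, here is a rough sketch of a latency budget guard wrapped around an inference call; the 50 ms budget and the predict() stub are assumptions.

```python
# A rough sketch of a latency budget check around an inference call; the
# 50 ms budget and the predict() stub are assumptions for illustration.
import time

LATENCY_BUDGET_S = 0.050   # assumed per-request latency budget

def predict(features: dict) -> float:
    """Stand-in for a real model call."""
    time.sleep(0.01)
    return 0.73

def timed_predict(features: dict) -> float:
    start = time.perf_counter()
    score = predict(features)
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_BUDGET_S:
        # In production this would feed the alerting framework, not stdout.
        print(f"WARN: inference took {elapsed * 1000:.1f} ms; budget is {LATENCY_BUDGET_S * 1000:.0f} ms")
    return score

print(timed_predict({"age": 42}))
```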

AI product managers must also consider availability: the degree to which the service that an AI product provides is available to other systems and users. Service Level Objectives (SLOs) provide a useful framework for encapsulating this kind of decision. In an incident management blog post, Atlassian defines SLOs as: “the individual promises you’re making to that customer… SLOs are what set customer expectations and tell IT and DevOps teams what goals they need to hit and measure themselves against. SLOs can be useful for both paid and unpaid accounts, as well as internal and external customers.”

Service Level Indicators, Objectives, and Agreements (SLIs, SLOs, and SLAs) are well-known, frequently used, and well-documented tools for defining the availability of digital services. For cloud infrastructure, some of the most common SLO types concern availability, reliability, and scalability. For AI products, these same concepts must be expanded to cover not just infrastructure but also data and the system’s overall performance at a given task. While useful, these constructs are not beyond criticism. Chief among the challenges are choosing the right metrics in the first place, measuring and reporting on them once selected, and the lack of incentive for a service provider to update the service’s capabilities (which leads to outdated expectations). Despite these concerns, service level frameworks can be quite useful, and they belong in the AI PM’s toolkit when designing the kind of experience an AI product should provide.
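As a concrete example, an availability SLI can be computed over a window of requests and compared against the SLO target. The sketch below assumes a 99.5% objective and a synthetic request log; both are illustrative.

```python
# A minimal sketch of checking an availability SLI against an SLO target;
# the 99.5% objective and the synthetic request log are assumptions.
def availability_sli(outcomes: list[bool]) -> float:
    """Fraction of requests that succeeded within their latency budget."""
    return sum(outcomes) / len(outcomes) if outcomes else 1.0

SLO_TARGET = 0.995   # assumed objective: 99.5% of requests served successfully

window = [True] * 990 + [False] * 10    # e.g., 990 good and 10 bad requests
sli = availability_sli(window)
print(f"SLI = {sli:.4f}; SLO {'met' if sli >= SLO_TARGET else 'violated'}")
```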

Durability

You must also take durability into account when building a post-production product plan. Even when multi-layer fault detection and model retraining systems are carefully designed and implemented, every AI-powered system must be robust to the ever-changing and naturally stochastic environment that we humans live in. Product managers should assume that any probabilistic component of an AI product will break at some point. A good AI product will self-detect such a failure and alert experts; a great AI product will detect the most common problems and adjust itself automatically, without significant interruption of service for users or high-touch intervention by human experts.

There are many ways to improve AI product durability, including the two retraining strategies below (see the sketch that follows the list):

  • Time-based model retraining: retraining all core models periodically, regardless of performance.
  • Continuous retraining: a data-driven approach that employs constant monitoring of the model’s key performance indicators and data quality thresholds.
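The sketch below contrasts the two policies; the seven-day cadence and the AUC floor are illustrative assumptions, not recommended values.

```python
# A sketch contrasting the two retraining policies above; the seven-day
# cadence and the AUC floor are illustrative assumptions.
from datetime import datetime, timedelta

RETRAIN_EVERY = timedelta(days=7)   # time-based policy
MIN_AUC = 0.80                      # continuous-monitoring policy

def needs_retraining(last_trained: datetime, current_auc: float, now: datetime) -> bool:
    if now - last_trained >= RETRAIN_EVERY:   # periodic, regardless of performance
        return True
    if current_auc < MIN_AUC:                 # data-driven, triggered by KPI decay
        return True
    return False

# AUC has decayed below the floor, so this returns True even though the
# weekly cadence has not yet elapsed.
print(needs_retraining(datetime(2020, 10, 1), current_auc=0.78, now=datetime(2020, 10, 5)))
```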

It’s worth noting that model durability and retraining can raise legal and policy issues. For example, in many regulated industries, changing any core functionality of an AI system’s decision-making capability (e.g., objective functions or major changes to hyperparameters) requires not only disclosure but also monitored testing. As such, an AI product manager’s responsibility here extends to releasing not only a usable product, but one that can be ethically and legally consumed. It’s also important to remember that, no matter the approach to developing and maintaining a highly durable AI system, the product team must have access to high-quality, relevant metrics on both model performance and functionality.

Monitoring

Proper monitoring (and the software instrumentation necessary to perform it) is essential to the success of an AI product. However, monitoring is a loaded term. The reasons for monitoring AI systems are often conflated, as are the different types of monitoring and alerting provided by off-the-shelf tools. Emmanuel Ameisen once again provides a useful and concise definition of model monitoring as a way to “track the health of a system. For models, this means monitoring their performance and the equity of their predictions.”

The simplest case of model monitoring is to compute key performance metrics (related to both model fit and inference accuracy) regularly. These metrics can be combined with human-determined thresholds and automated alerting systems to inform when a model has “drifted” beyond normal operating parameters. While ML monitoring is a relatively new product area, standalone commercial products (including Fiddler and superwise.ai) are available, and monitoring tools are incorporated into all the major machine learning platforms.
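One common drift measure that such monitoring can compute is the Population Stability Index (PSI). The sketch below implements it from scratch; the ten bins and the roughly 0.2 “investigate” threshold are industry rules of thumb rather than anything specific to the products mentioned above.

```python
# A compact sketch of the Population Stability Index (PSI), a common drift
# metric; the ten bins and the ~0.2 alert threshold are rules of thumb.
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Compare two score distributions; a higher PSI means more drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual)) + 1e-9   # nudge so max values land in the last bin
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def frac(xs, a, b):
        count = sum(1 for x in xs if a <= x < b)
        return max(count, 1) / len(xs)            # floor at one count to avoid log(0)

    return sum(
        (frac(actual, a, b) - frac(expected, a, b)) * math.log(frac(actual, a, b) / frac(expected, a, b))
        for a, b in zip(edges, edges[1:])
    )

# Usage: a PSI above roughly 0.2 is often treated as a signal to investigate
# or retrain; below 0.1 is usually considered stable.
train_scores = [i / 100 for i in range(100)]
live_scores = [min(1.0, i / 100 + 0.3) for i in range(100)]
print(f"PSI = {psi(train_scores, live_scores):.3f}")
```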

Separate from monitoring for model freshness, Ameisen also mentions the need to apply technical domain experience when designing monitoring systems that detect fraud, abuse, and attacks from external actors. AI PMs should consult with Trust & Safety and Security teams to combine the best principles and technical solutions with existing AI product functionality. In some domains, such as financial services or medicine, no easy technical solutions exist; in those cases, it is the responsibility of the AI product team to build tools to detect and mitigate fraud and abuse in the system.

As we’ve mentioned previously, it’s not enough to simply monitor an AI system’s performance characteristics. It is even more important to consistently ensure that the AI product’s user-facing and business purposes are being fulfilled. This responsibility is shared by the development team with Design, UX Research, SRE, Legal, PR, and Customer Support teams. The AI PM’s responsibility is again to orchestrate reasonable and easily repeatable mitigations to any problems. It is crucial to design and implement specific alerting capabilities for these functions and teams. If you simply wait for complaints, they will arise far too late in the cycle for your team to react properly.

No matter how well you research, design, and test an AI system, once it is released, people are going to complain about it. Some of those complaints will likely have merit, and responsible stewardship of AI products requires that users are given the ability to disagree with the system’s outputs and escalate issues to the product team.

It is also entirely possible for this feedback to show you that the system is underserving a particular segment of the population, and that you may need a portfolio of models to serve more of the user base. As an AI PM, you have the responsibility to build a safe product for everyone in the population who might use it. This includes consideration of the complexities that come into play with intersectionality. For example, an AI product might produce great outcomes for wealthy, American, cisgender, heterosexual, White women—and although it might be tempting to assume those outcomes would apply to all women, such an assumption would be incorrect. Returning to previous anti-bias and AI transparency tools such as Model Cards for Model Reporting (Timnit Gebru, et al.) is a great option at this point. It is important not to pass this development task off to researchers or engineers alone; it is an integral part of the AI product cycle.

If done right, users will never be aware of all the product monitoring and alerting that is in place, but don’t let that fool you: it’s essential to success.

Post-Deployment Frameworks

One question that an AI PM might ask when pondering these post-production requirements is: “This seems hard; can’t I just buy these capabilities from someone else?” This is a fair question, but—as with all things related to machine learning and artificial intelligence—the answer is far from a binary yes or no.

There are many tools available to help with this process, from traditional vendors and bleeding-edge startups alike. Deciding what investment to make in MLOps tooling is an inherently complex task, but careful consideration and proactive action often lead to defensible competitive advantages over time. Uber (the developer of Michelangelo), Airbnb (the developer of Zipline), and Google have all used superior tooling and operations skills to build market-leading AI products.

Nearly every ML/AI framework touts full end-to-end capabilities, from enterprise-ready stacks (such as H2O.ai, MLflow, and Kubeflow) to the highly specialized and engineer-friendly (such as Seldon.io) and everything in between (like Dask). Enterprise-level frameworks often provide deep, well-supported integration with many common production systems; smaller companies might find this integration unnecessary or overly cumbersome. Regardless, it’s a safe bet that getting these off-the-shelf tools to work with your AI product in exactly the ways you need will be costly, if not financially then in time and human labor. That said, from a scale, security, and feature perspective, such capabilities may be required in many mature AI product environments.

On the other hand, building and scaling a software tool stack from scratch requires a significant, sustained investment in both developer time and technology. Facebook, Uber, Airbnb, Google, Netflix, and other behemoths have all spent millions of dollars to build their ML development platforms, and they employ dozens to hundreds of engineers tasked with building and scaling those internal capabilities. The upside is that such end-to-end development-to-deployment frameworks and tools eventually become a competitive advantage in and of themselves. In such environments, however, a single AI PM is not enough; you need a cadre of PMs, each focused on a different component of the AI product value chain.

Where do we go from here?

Building great AI products is a significant, cross-disciplinary, and time-consuming undertaking, even for the most mature and well-resourced companies. However, what ML and AI can accomplish at scale can be well worth the investment. Although a return on investment is never guaranteed, our goal is to provide AI PMs with the tools and techniques needed to build highly engaging and impactful AI products in a wide variety of contexts.

In this article, we focused on the importance of collaboration between product and engineering teams, to ensure that your product not only functions as intended, but is also robust to both the degradation of its effectiveness and the uncertainties of its operating environment. In the world of machine learning and artificial intelligence, a product release is just the beginning. Product managers have a unique place in the development ecosystem of ML/AI products, because they cannot simply guide the product to release and then turn it over to IT, SRE, or other post-production teams. AI product managers have a responsibility to oversee not only the design and build of the system’s capabilities, but also to coordinate the team during incidents, until the development team has completed enough knowledge transfer for independent post-production operation.

The evolution of AI-enabled product experiences is accelerating at breakneck speed. In parallel, the emerging role of AI product management continues to evolve at a similar pace, to ensure that the tools and products delivered to the market provide true utility and value to both customers and businesses. Our goal in this four-part series on AI product management is to increase community awareness and empower individuals and teams to improve their skill sets in order to effectively steer AI product development toward successful outcomes. The best ML/AI products that exist today were brought to market by teams of PhD ML/AI scientists and developers who worked in tandem with resourceful and skilled product teams. All were essential to their success.

As the field of AI continues to mature, so will the exciting field of AI product management. We can’t wait to see what you build!

Thanks

We would like to thank the many people who have contributed their expertise to the early drafts of the articles in this series, including: Emmanuel Ameisen, Chris Albon, Chris Butler, Ashton Chevalier, Hilary Mason, Monica Rogati, Danielle Thorp, and Matthew Wise.
