What you need to know about product management for AI

A product manager for AI does everything a traditional PM does, and much more.

By Peter Skomoroch and Mike Loukides
March 31, 2020

If you’re already a software product manager (PM), you have a head start on becoming a PM for artificial intelligence (AI) or machine learning (ML). You already know the game and how it is played: you’re the coordinator who ties everything together, from the developers and designers to the executives. You’re responsible for the design, the product-market fit, and ultimately for getting the product out the door. But there’s a host of new challenges when it comes to managing AI projects: more unknowns, non-deterministic outcomes, new infrastructures, new processes, and new tools. There is a lot to learn, but it is worth the effort to unlock the unique value AI can create in the product space.

Whether you manage customer-facing AI products, or internal AI tools, you will need to ensure your projects are in sync with your business. This means that the AI products you build align with your existing business plans and strategies (or that your products are driving change in those plans and strategies), that they are delivering value to the business, and that they are delivered on time. A PM for AI needs to do everything a traditional PM does, but they also need an operational understanding of machine learning software development along with a realistic view of its capabilities and limitations.

Why AI software development is different

AI products are automated systems that collect and learn from data to make user-facing decisions. Pragmatically, machine learning is the part of AI that “works”: algorithms and techniques that you can implement now in real products. We won’t go into the mathematics or engineering of modern machine learning here. All you need to know for now is that machine learning uses statistical techniques to give computer systems the ability to “learn” by being trained on existing data. After training, the system can make predictions (or deliver other results) based on data it hasn’t seen before.

AI systems differ from traditional software in many ways, but the biggest difference is that machine learning shifts engineering from a deterministic process to a probabilistic one. Instead of writing code with hard-coded algorithms and rules that always behave in a predictable manner, ML engineers collect a large number of examples of input and output pairs and use them as training data for their models.

For example, if engineers are training a neural network, then this data teaches the network to approximate a function that behaves similarly to the pairs they pass through it. In the best case scenario, the trained neural network accurately represents the underlying phenomenon of interest and produces the correct output even when presented with new input data the model didn’t see during training. For machine learning systems used in consumer internet companies, models are often continuously retrained many times a day using billions of entirely new input-output pairs.

Machine learning adds uncertainty

With machine learning, we often get a system that is statistically more accurate than simpler techniques, but with the tradeoff that some small percentage of model predictions will always be incorrect, sometimes in ways that are hard to understand.

This shift requires a fundamental change in your software engineering practice. The same neural network code trained with seemingly similar datasets of input and output pairs can give entirely different results. The model outputs produced by the same code will vary with changes to things like the size of the training data (number of labeled examples), network training parameters, and training run time. This has serious implications for software testing, versioning, deployment, and other core development processes.

For any given input, the same program won’t necessarily produce the same output; the output depends entirely on how the model was trained. Make changes to the training data, repeat the training process with the same code, and you’ll get different output predictions from your model. Maybe the differences will be subtle, maybe they’ll be substantial, but they’ll be different.
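To make this concrete, here is a minimal sketch in Python. It is not a real ML pipeline: `fit_line`, `predict`, and both training sets are made up for illustration, and a least-squares line stands in for a model. The same training code runs on two datasets that differ by a single example and gives two different predictions for the same new input.

```python
# Hypothetical illustration: identical training code, slightly different data.

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Two made-up training sets that differ only in the last example.
training_a = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]
training_b = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 9.0)]

model_a = fit_line(training_a)
model_b = fit_line(training_b)

new_input = 5  # an input neither model saw during training
print("model A predicts", predict(model_a, new_input))
print("model B predicts", predict(model_b, new_input))
```

The code never changed between the two runs; only the data did. That is why the trained model, not just the source code, has to be versioned and tested.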

The model is produced by code, but it isn’t code; it’s an artifact of the code and the training data. That data is never as stable as we’d like to think. As your user base grows, the demographics and behavior of the user population in production shift away from your initial training data, which was based on early adopters. Models also become stale and outdated over time. To make things even more challenging, the real world adapts to your model’s predictions and decisions. A model for detecting fraud will make some kinds of fraud harder to commit–and bad actors will react by inventing new kinds of fraud, invalidating the original model. Models within AI products change the same world they try to predict.

Underneath this uncertainty lies further uncertainty in the development process itself. It’s hard to predict how long an AI project will take. Predicting development time is hard enough for traditional software, but at least we can make some general guesses based on past experience. We know what “progress” means. With AI, you often don’t know what’s going to happen until you try it. It isn’t uncommon to spend weeks or even months before you find something that works and improves model accuracy from 70% to 74%. It’s hard to tell whether the biggest model improvement will come from better neural network design, input features, or training data. You often can’t tell a manager that the model will be finished next week or next month; your next try may be the one that works, or you may be frustrated for weeks. You frequently don’t know whether something is feasible until you do the experiment.

AI product estimation strategies

Planning and estimation are difficult for AI products because it is rare to find two real-world systems where the training data and algorithms applied are the same.

Imagine you are a data scientist at Disney. Your division is starting a new video streaming service and you’re tasked with building a system to recommend movies. You might establish a baseline by replicating collaborative filtering models published by teams that built recommenders for MovieLens, Netflix, and Amazon. There may even be someone on your team who built a personalized video recommender before and can help scope and estimate the project requirements using that past experience as a point of reference.

In this scenario, your Disney team appears to be solving a problem similar to the early Netflix Prize recommendation problem. You have a highly curated catalog with a small number of professionally produced movies and TV series, and need to recommend those items to users based on their interests and viewing habits. Your team also needs to solve a cold start problem so you can recommend movies before the system begins collecting user feedback data (typically solved by using contextual topic-based or popularity-based recommendations), but once you gather explicit user ratings and video viewing data, you should be able to build a reasonable system. It may even be faster to launch this new recommender system, because the Disney data team has access to published research describing what worked for other teams.

But this is a best-case scenario, and it’s not typical. What if instead of a narrow, curated video catalog, you were building a recommender system for a consumer video app, where anyone could create and upload user-generated content (UGC)? You might have millions of short videos, with user ratings and limited metadata about the creators or content. Social and trending signals in this network will be important, and controlling spam and abuse will be a challenge. It may even be necessary to do image or video analysis to make content-based recommendations, detect fraud, or reject content that violates your rules (for example, live shooter videos). You could still begin by shipping a simple cold-start recommender system, but it will take you much longer to build and iterate on your model to achieve the level of accuracy the business expects. You will likely encounter many challenges training your recommender with large amounts of constantly changing UGC and conflicting objectives.

These issues may be unexpected for teams that aren’t familiar with developing machine learning systems trained on user-generated content. If you ignore these complications during planning and assume your system will behave similarly to the original recommenders at Netflix, the project will end up significantly behind schedule, and may have serious abuse problems that Netflix didn’t face. In each of these examples, the machine learning problem faced by the business was similar (recommend movies to users), but the required approach ended up being very different based on subtle differences in the data and product design.

Predicting development time becomes even more difficult when you apply an algorithm successfully used in one domain to a different problem. Consider using the Netflix collaborative filtering algorithm to recommend jobs to job seekers. On the surface, these problems seem similar: we have a dataset of items (jobs) and users (job seekers), so, in theory, we could use a job seeker’s history of saved jobs or job applications to recommend similar new jobs. Complications arise when you consider the nuances of recruiting data and job applications. Features like geography and job seniority are critical to getting a good match. Job postings have a much shorter relevant lifetime than movies, so content-based features and metadata about the company, skills, and education requirements will be more important in this case. Job recommendations also include additional algorithmic and regulatory challenges related to diversity, bias, and fairness that are not encountered in movie recommendations.

The point isn’t that estimating AI projects is intractably hard; it’s that you aren’t likely to succeed if you expect an AI project to behave like a traditional software project. There are strategies for dealing with all of this uncertainty–starting with the proverb from the early days of Agile: “do the simplest thing that could possibly work.” You don’t always need to start with a complex neural network; a simple regression (or even simpler, an average) might be enough to get your project off the ground. In some cases, that simple model may be all you ever need. The biggest problems arise from taking shortcuts and assuming that a machine learning model that works for one application will perform well in a different context without looking at the underlying data.
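As a rough sketch of what "the simplest thing" can look like, the Python below uses an average as the baseline model and measures its error on held-out data. The ratings, the `beats_baseline` helper, and the `min_improvement` threshold are all assumptions made for illustration; a real team would choose its own metric and threshold.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def beats_baseline(candidate_mae, baseline_mae, min_improvement=0.05):
    """True if a candidate model improves on the baseline by a useful margin."""
    return (baseline_mae - candidate_mae) >= min_improvement


# Made-up ratings: the baseline predicts the training average for everything.
train_ratings = [4.0, 3.5, 5.0, 4.5, 3.0]
holdout_ratings = [4.0, 4.5, 3.5]

baseline_prediction = mean(train_ratings)
baseline_mae = mean_absolute_error(
    holdout_ratings, [baseline_prediction] * len(holdout_ratings)
)

print("baseline prediction:", baseline_prediction)
print("baseline error:", baseline_mae)
```

Any more complex model then has a number to beat. If it cannot clear the baseline by a margin the business would notice, the simple model may be all you need.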

Organizational prerequisites for AI at scale

Particularly at a company that’s new to AI, part of an AI product manager’s job is helping the organization build the culture it needs to succeed with AI. Because it’s so different from traditional software development, where the risks are more or less well-known and predictable, AI rewards people and companies that are willing to take intelligent risks, and that have (or can develop) an experimental culture. As Jeff Bezos has said, “If you only do things where you know the answer in advance, your company goes away.”

No company wants to dry up and go away; and at least if you follow the media buzz, machine learning gives companies real competitive advantages in prediction, planning, sales, and almost every aspect of their business. If machine learning is so amazing, why hasn’t every company applied it and reinvented itself?

Even simple machine learning projects can be difficult, and managing these projects in a real business is much harder than most people realize; that’s why VentureBeat claims 87% of machine learning products never make it into production, and Harvard Business Review says that “The first wave of corporate AI is bound to fail.” Machine learning is not fairy dust you can sprinkle on your existing product. You can’t just plug in off-the-shelf cloud APIs that will magically make your product intelligent. Machine learning requires a complete rethinking; your products and your workflows are likely to change in fundamental ways. Product managers for AI need to lead that rethinking.

VentureBeat discusses two reasons for failure: management that believes you can solve problems by throwing money at them (whether that means hiring more, or better, developers), and data that is locked away into silos, where the people building your ML applications can’t get it. These are fundamentally cultural problems. You need to understand that many solutions can’t be bought (yet), that AI products require collaboration between teams, that data silos stand in the way of success, and that the best remedy for failure is picking yourself up and trying again. (To be clear, we are not saying that data can or should be used indiscriminately, without concern for legal compliance, customer privacy, bias, and other ethical issues.)

The need for an experimental culture implies that machine learning is currently better suited to the consumer space than it is to enterprise companies. For enterprise products, requirements often come from a small number of vocal customers with large accounts. It’s difficult to be experimental when your business is built on long-term relationships with customers who often dictate what they want. Measurement, tracking, and logging are less of a priority in enterprise software. An enterprise company like Oracle has a lot of customers, but Oracle’s customer base is dwarfed by Amazon’s or Walmart’s. Consumer product management is typically more bottom-up, driven by large volumes of user feedback and usage tracking data. Many consumer internet companies invest heavily in analytics infrastructure, instrumenting their online product experience to measure and improve user retention. It turns out that type of data infrastructure is also the foundation needed for building AI products.

The ability to make decisions based on data analytics is a prerequisite for an “experimental culture.” This was the path taken by companies like Google, Facebook, and LinkedIn, which were driven by analytics from the beginning. At measurement-obsessed companies, every part of their product experience is quantified and adjusted to optimize user experience.

These companies eventually moved beyond using data to inform product design decisions. They have deployed machine learning at scale to recommend movies and friends, personalize ads, and deliver search results. Their user agreements allow them to use data to improve their products. They’ve built the infrastructure needed to collect, manage, and analyze their data, and deploy AI products that can automatically make user-facing decisions in real time. By putting these pieces together, these companies created an environment where machine learning discoveries and innovation in AI are an integral property of their culture.

You are unlikely to succeed at AI if you haven’t laid a proper foundation for it. That foundation means that you have already shifted the culture and data infrastructure of your company. In “The AI Hierarchy of Needs,” Monica Rogati argues that you can build an AI capability only after you’ve built a solid data infrastructure, including data collection, data storage, data pipelines, data preparation, and traditional analytics. If you can’t walk, you’re unlikely to run. Just as AI product managers need to help build a culture in which they can succeed, they need to help define and build the infrastructure that will allow an organization to walk, and then to run.

If you’re just learning to walk, there are ways to speed up your progress. Although machine learning projects differ in subtle ways from traditional projects, they tend to require similar infrastructure, similar data collection processes, and similar developer habits. A relatively narrow project, like an intelligent search interface for your product, will require you to develop a lot of the basics, starting with the ability to acquire, clean, store, and analyze data. You’ll become familiar with the problems that real-world data presents. You’ll have to build the infrastructure that data projects require. Most important, you’ll start building relationships with other teams–and those relationships will become crucial when you tackle bigger projects.

The prospect of taking on a costly data infrastructure project is daunting. If your company is starting out on this path, it’s important to recognize that there are now widely available open source tools and commercial platforms that can power this foundation for you. According to Lukas Biewald, founder of Figure Eight and Weights & Biases: “Big companies should avoid building their own machine learning infrastructure. Almost every tech company I talk to is building their own custom machine learning stack and has a team that’s way too excited about doing this.”

If you are still figuring out your analytics strategy, you are fighting the last war. That doesn’t mean you shouldn’t be thinking about AI, but it’s a goal, not the next step. Start with a simple project, build your infrastructure, learn how to use your data effectively, build relationships within the organization, then make the leap.

Identifying “viable” machine learning problems

Any product manager is part of the team that determines what product to build. If you are just starting out with AI, that decision is especially important–and difficult. The stakes are high–and you can be pardoned if you’re uncomfortable with ideas that are expensive and have an uncertain probability of success. Product managers are more comfortable with roadmaps that can get to market value in the next 12 months, and costs that can be kept to a minimum. AI doesn’t fit that model. An AI pilot project, even one that sounds simple, probably won’t be something you can demo quickly. You will struggle to make the case to invest in research upfront.

Therefore, you need to pay particular attention to defining a “minimum viable product” (MVP). How do you find an MVP, with the stress on both “minimum” and “viable”? What features should be deferred to later versions, and what belongs in the initial release? A demo, or even a first release, can be based on heuristics or simple models (linear regression, or even averages). Having something you can demo takes some of the pressure off your machine learning team. But you still need to answer the question: how do you tell the difference between technology you can productize now, and that which will be viable in an uncertain time frame? Most interesting things in AI are on the cutting edge of what we can do in engineering, and that makes them unpredictable: you don’t know when the engineering team will have the insight needed to make the product work. Those cutting-edge ideas are also attractive, both to managers who don’t understand the risks and to developers who want to try something that’s really challenging. And you, as the product manager, are caught between them.

Effective product managers for AI know the difference between easy, hard, and impossible problems. A good example of a problem that has been hard or impossible until recently is generative text summarization. It seems like it should be within reach of our current machine learning algorithms, but in practice, accurately summarizing arbitrary text is still beyond the state of the art. You can generate text that, at first glance, appears to be written by a human, but upon closer inspection, you will often find it filled with factual and grammatical errors unacceptable in most business applications. This is the “art of the possible,” an intuition for what is and isn’t feasible. It’s an intuition that you can learn through experience–and it’s why understanding your failures is at least as important as understanding your successes.

For AI products, one important part of being “feasible” is being precisely defined. As Jeremy Jordan says, “A problem well-defined is half solved.” It’s easy to look at the many successes of AI over the past few years and think that there’s some magic, but there really isn’t. If you can state what you want to accomplish very precisely, and break that down into even simpler problems, you’re off to a good start. Jordan has some good advice: start by solving the problem yourself, by hand. If you want to help customers organize pictures on their phones, spend some time on your phone, organizing pictures. Interview actual customers to see what they want. Build a prototype they can try with real data. Above all, don’t think that “we want to help customers organize pictures” is a sufficient problem statement. It isn’t; you’ve got to go into much more detail about who your customers are, how they want to organize their pictures, what kinds of pictures they’re likely to have, how they want to search, and more.

Another good proxy for identifying “viable” machine learning problems is to see how quickly you can construct a labeled benchmark dataset along with clear, narrowly defined accuracy goals for your ML algorithm. Data labeling ease is a good proxy for whether machine learning is cost effective. If you can build data labeling into normal user activities within your product (for example, flagging spam emails), then you have a shot at gathering enough input-output pairs to train your model. Otherwise, you will burn money paying external services for labeled data, and that up-front cost–before you can do your first demo–can easily be the most expensive part of the project. Without large amounts of good raw and labeled training data, solving most AI problems is not possible.

Even with good training data and a clear objective metric, it can be difficult to reach accuracy levels sufficient to satisfy end users or upper management. When you’re planning a product, it’s important to have a gut feel for what error rates are achievable and what aren’t, and what error rates are acceptable for your application. Product recommendations are easy; nobody is injured if you recommend products that your customers don’t want, though you won’t see much ROI. Fraud detection is riskier; you’re working with real money, and errors show up in your bottom line. Autonomous vehicles are a different matter; if you’re building an autonomous vehicle, you need AI that is close to perfect. (And perfect will never be achievable.) That kind of difference has a tremendous effect on how you structure the development process.

Work on things that matter to your business

The most important advice we can give is to make sure you work on AI products that matter to the business. It’s entirely too easy to define a problem, spend three to six months solving it, and then find out the solution works, but nobody cares; it doesn’t make a difference to the business. One of a product manager’s most important jobs is ensuring that the team is solving a problem that’s worth solving.

If you have a good data team and an intuitive understanding of your company’s data, there should be no shortage of ideas around how to improve your product. You will probably have more ideas than you can possibly use–so how do you prioritize the list of machine learning projects? How do you select what to work on? What delivers the greatest ROI? Shipping any machine learning system requires a huge mountain of organizational and data engineering effort, so the ultimate payoff needs to match that investment.

The buzz around AI has encouraged many people to think that AI can suddenly double or triple your profitability. That’s unlikely to be true–but what is likely? A product manager needs to be realistic about expectations. You shouldn’t over-promise, and you shouldn’t under-deliver. But neither should you under-promise: while simple products might help you to get started, you want to show upper management you can move the needle significantly. If the needle doesn’t move, you will undermine your team. If a product is feasible, if it’s something customers want, if you can get realistic error rates, and if you understand the development flows, you still have to ask whether it’s the best investment of time and resources. Is there another product that will generate a greater return more quickly?

To make these judgements, an AI product manager needs to understand the company’s data inside and out. That includes the ability to do your own analysis, to run SQL queries, to develop metrics, and to build dashboards. If you don’t understand your data intimately, you will have trouble knowing what’s feasible and what isn’t. You will have trouble understanding problems with data quality–you should know in your bones why 80% of a data scientist’s time is spent cleaning data. Without this data familiarity, you will have trouble spotting ethical problems that arise from biased or insufficient data. If you can’t define the right metrics to monitor, you won’t know whether or not your product is successful, nor will you know when your model performance has degraded (as it almost inevitably will).
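A monitoring metric of this kind can be very small. The Python sketch below compares accuracy on a recent window of labeled examples against accuracy at launch and flags a drop. The labels, the predictions, and the `tolerance` of 0.05 are made up for illustration; which metric and threshold matter depends on the product.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def has_degraded(launch_accuracy, recent_accuracy, tolerance=0.05):
    """True when accuracy has dropped by more than the tolerated amount."""
    return (launch_accuracy - recent_accuracy) > tolerance


# Made-up spam labels (1 = spam, 0 = not spam) for two time windows.
launch_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
launch_predictions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # 9 of 10 correct
recent_labels = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
recent_predictions = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1]  # 7 of 10 correct

launch_acc = accuracy(launch_predictions, launch_labels)
recent_acc = accuracy(recent_predictions, recent_labels)

if has_degraded(launch_acc, recent_acc):
    print("Model accuracy has dropped; investigate or retrain.")
```

The check itself is trivial. The hard part, and the PM's job, is deciding which metric reflects product success and making sure fresh labeled data keeps arriving to compute it.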

Even if a product is feasible, that’s not the same as product-market fit. Is the product something that customers need? Will it help a small segment of customers or will it increase the most important metric for the majority of your users? Too many companies focus on building something cool without thinking about whether anyone really cares. Customers want you to solve their problems; they don’t care what kind of neural network you’re using. You may discover that you don’t need AI at all, and that’s just fine.

Prioritizing with the business in mind

There are a number of different ways to prioritize features into a product roadmap, and it’s likely your product organization already has its own preferred methodology for this. That said, there are many new machine learning teams working on a large number of projects without a clear prioritization or roadmap. Many companies invest a lot in hiring data scientists and building ML platforms, but then they focus them on solving the wrong problems.

One successful approach to this issue is to organize ML product feature ideas by theme and concentrate on a few high ROI projects. To prioritize, start with your company’s mission and near-term strategic objectives. What is the business trying to achieve? Pair a machine learning application directly to one of those objectives, so that when you improve the accuracy metric for your model it directly impacts metrics the business cares about. Build a direct connection between your machine learning application and something the company values.

For example, at LinkedIn (where co-author Pete Skomoroch previously worked) the mission was to connect the world’s professionals to make them more productive and successful. A strategic objective for the company was to become the professional profile of record and have complete and up-to-date resume data in the LinkedIn profiles for all professionals. A project idea under this objective was to create a machine learning model to recommend skills a member should add to their profile. A team came up with an impact estimate for the product feature by estimating the expected increase in conversion rate when users were shown ML recommendations.
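An impact estimate of this kind is simple arithmetic. The Python sketch below shows one way such an estimate could be computed. The function name and every number are hypothetical placeholders, not LinkedIn figures.

```python
def estimated_additional_conversions(members_shown, baseline_rate, expected_rate):
    """Extra conversions expected if the conversion rate rises as predicted."""
    return members_shown * (expected_rate - baseline_rate)


# Hypothetical inputs: 1,000,000 members see the recommendation, and the
# conversion rate is expected to rise from 2% to 3%.
extra = estimated_additional_conversions(
    members_shown=1_000_000,
    baseline_rate=0.02,
    expected_rate=0.03,
)
print(f"Estimated additional conversions: {extra:,.0f}")
```

The value of writing the estimate down is that the assumptions (audience size, baseline rate, expected lift) become explicit and can be checked against real data after launch.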

People You May Know (PYMK) was a successful example of this type of strategic alignment from LinkedIn’s data team. The PYMK recommendation system was trained on data including existing LinkedIn connections, profile similarity, and contacts imported from email to suggest other members a user should connect with. PYMK directly paired what the company wanted to do (drive connections) with a machine learning solution. With a small number of engineers, the data team built a production machine learning model that directly improved the most important metric for the company. Within months it also drove new user growth for the site and created a flywheel of user growth that was critical as LinkedIn became a public company.

Once you prune down the set of ideas to ones that align with strategic objectives, there are a number of ways to prioritize them. One effective approach is to get everyone in a room who will be building the system, and have the group form consensus estimates of difficulty, headcount, and impact for each project. Then you can create a chart of impact and ease, rank each project by return on investment and prioritize accordingly. In reality, prioritization is a messy and fluid process, as projects often have dependencies and face staffing limitations or conflicts with other stakeholder deadlines. Scope often needs to be reduced or quality sacrificed to align with other teams or priorities.
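The impact-and-ease ranking can be sketched in a few lines of Python. The project names, the 1-to-5 scales, and the `roi_score` formula (impact divided by difficulty times headcount) are assumptions chosen for illustration; a team would substitute its own consensus estimates and scoring rule.

```python
from dataclasses import dataclass


@dataclass
class ProjectEstimate:
    name: str
    impact: int      # consensus estimate, 1 (low) to 5 (high)
    difficulty: int  # consensus estimate, 1 (easy) to 5 (hard)
    headcount: int   # people needed to build it

    @property
    def roi_score(self):
        """Hypothetical score: impact per unit of effort."""
        return self.impact / (self.difficulty * self.headcount)


# Made-up estimates from a planning session.
projects = [
    ProjectEstimate("skill recommendations", impact=4, difficulty=2, headcount=2),
    ProjectEstimate("search ranking", impact=5, difficulty=4, headcount=5),
    ProjectEstimate("spam flagging", impact=3, difficulty=1, headcount=2),
]

ranked = sorted(projects, key=lambda p: p.roi_score, reverse=True)
for project in ranked:
    print(f"{project.name}: {project.roi_score:.2f}")
```

A score like this is only a starting point for discussion. Dependencies, staffing limits, and deadlines from other teams will still reshuffle the list.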

Working on something that matters to the business is not the only important criterion to consider, since without access to data, your ML system will be useless. In larger companies, it’s best to start by focusing on business units that are eager to work with you and where your help is needed. When you begin development of your first ML product, try to work with teams that already have training data available and help them drive their most important metric. Ideally, that also aligns with the larger set of company priorities.


Where do you go from here as a product manager new to the world of AI? This role is still being defined, but there are already many useful resources out there for you. Here are some great places to start:

AI has tremendous potential for those who are willing to learn and to think differently. We hear a lot about AI and corporate transformation; but what we need to make this transformation are people who are willing to lead the changes in corporate culture, help build the data infrastructure, and explore problems that will deliver a measurable return with reasonable investment.

This article is part of a series. The next article can be found here.
