Chapter 4. Stages of the AI Life Cycle

Businesses need a systematic approach to operationalize AI, one that takes into consideration the end-to-end data science and AI life cycle. The AI life cycle consists of a sequence of stages that brings together the personas we discussed earlier into a collaborative process. These stages are generally defined as follows:

Scope

How can business stakeholders and data science teams scope the deluge of use case requests and decide on priority? What role do key performance indicators (KPIs) play in defining successful business outcomes? As we’ll discuss in the next chapter, design thinking is a best practice that should take place during the Scope stage of the AI life cycle.

Understand

How do the data provider, data governance, and data science teams work together on the datasets needed to implement a given use case?

Build

What tools and techniques are used by the data science team for data preparation, feature engineering, model training, and building?

Deploy

How can the data science and operations teams work together to deploy and track versions of models and make them available for scoring, whether online or batch?

Manage and Trust

How can the operations team ensure high performance via monitoring and retraining? How can data science, model validation/MRM, operations, and business teams work together to ensure trust and transparency by attending to bias detection, fairness, drift, explainability, and so on?

The AI life cycle stages in Figure 4-1 show where the different personas have key roles.

Figure 4-1. Stages in the AI life cycle

Let’s take a closer look at each of the five stages, as they are essential to operationalizing AI.

Scope

Data science teams are often swamped with use case requests from across the company. How can requests be scoped and prioritized to ensure that the data science team is engaged in an optimal way? More importantly, how can we ensure that the use cases selected actually produce outcomes that are meaningful to the business?

To answer these questions, and to begin the process of operationalizing AI, business stakeholders and technical stakeholders need to explore business ideas in detail, looking for clarity around the business KPIs. Without that clarity, a team is far too likely to judge the success of an AI solution by the performance of the model rather than by its impact on the business. Consider an example: a company's HR department wants to use AI to predict which managers will be high performers. The business KPI that matters here, say the employee attrition rate, sits outside the feature set and the model output; it is neither a model input nor a model output. Thought needs to be given to how to correlate model performance metrics like precision and recall with this external business KPI.
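As a minimal sketch of what that correlation might look like in practice, the team could log model metrics alongside the business KPI over time and examine how they move together. The file and column names below are illustrative only:

import pandas as pd

# Hypothetical log pairing weekly model metrics with the external business KPI.
history = pd.read_csv("weekly_metrics_and_kpis.csv")  # columns: week, precision, recall, attrition_rate

# A first look at whether model performance and the business KPI move together.
print(history[["precision", "recall", "attrition_rate"]].corr(method="spearman"))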

Too often, a business captures its KPIs in emails, slides, or meeting notes where they’re impossible to track, especially as they change. A collaborative effort between the various personas involved at this stage, such as the product owner, data scientist, and data owner, will allow for capturing and understanding business KPIs up front (even if they change later). Doing so allows the data science team to prioritize and create swim lanes for each use case it undertakes. Later in the life cycle, these KPIs will need to be evaluated and correlated to model performance.

The Scope stage guides the prioritization of options and the gradual development and refinement of a specific plan. Usually, this process has three elements. It first focuses on the business use case, including defining the value and specifying KPIs. Then, it addresses the technical task, translating the business goal into a specific AI task to solve and characterizing the environment around that task. Finally, it develops a structured action plan for a solution to the identified technical task in support of the business goal. Adapting and applying design thinking principles to these elements is a best practice for this stage.

Understand

We all know the adages “There is no data science without data” and “Garbage in, garbage out.” It’s clear that good data is the heart of any successful AI project. But it’s still worth asking whether the data science team fully understands the datasets it’s dealing with. Do the team members understand metadata and how it maps to a business glossary? Do they know the lineage of the data, who owns the data, who has touched and transformed the data, and what data they’re actually allowed to work with?

As an example, consider a scenario where the data science team is working on a fraud detection use case. The models perform well on hold-out data but poorly in production. It is natural to assume that the models need retraining to catch new fraud patterns, but in this case that may not solve the issue. It may turn out that the team had been working with data generated using rules rather than ground truth, despite assurances from the data provider. That is a failure to understand and insist on data lineage, and it reinforces how important it is to work with a data steward.

Empower the members of your data science team to “shop for data” in a central catalog using either metadata or business terms to search, as if they were shopping for items online. Once they get a set of results, give them the ability to explore and understand the data, including its owner, lineage, relationship to other datasets, and so on. Based on that exploration, they can request a data feed. Once approved (if approval is needed), the datasets can be made available to the data science team via a remote data connection in your data science development environment. As much as possible, respect data gravity and avoid moving data. If the team needs to work with personally identifiable information (PII) or protected health information (PHI), the data steward or a data provider should establish the appropriate rules and policies to govern access. For example, you can anonymize or tokenize sensitive data. This stage is really about understanding the data you need for your AI initiative within a context of data governance.
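As one illustration of such a policy, a data steward might tokenize sensitive columns with a keyed hash before the data ever reaches the data science environment. This is a sketch only; the file name, column names, and secret handling are assumptions:

import hashlib
import hmac

import pandas as pd

SECRET_KEY = b"replace-with-a-managed-secret"  # in practice, fetch this from a secrets manager

def tokenize(value) -> str:
    """Deterministically replace a sensitive value with a keyed hash token."""
    return hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()[:16]

df = pd.read_csv("claims_extract.csv")        # hypothetical dataset containing PII
for pii_column in ["ssn", "email", "phone"]:  # illustrative column names
    df[pii_column] = df[pii_column].map(tokenize)

df.to_csv("claims_extract_tokenized.csv", index=False)

Because the token is deterministic, the same individual maps to the same token across datasets, so joins still work while the raw identifier stays out of the data science environment.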

Build

Most data scientists relish the build phase of the AI life cycle where they can explore the data to understand patterns, select and engineer features, and build and train their models. This is where a myriad of tools and frameworks come together:

Open languages

Python is the most popular, with R and Scala also in the mix

Open frameworks

Scikit-learn, XGBoost, TensorFlow, PyTorch, etc.

Approaches and techniques

Classic ML techniques from regression all the way to state-of-the-art deep learning techniques like Transformers

Productivity-enhancing capabilities

Visual modeling, AutoML, AutoAI, etc. to help with feature engineering, algorithm selection, and hyperparameter optimization

Development tools

DataRobot, H2O, Watson Studio, Azure Machine Learning Studio, Amazon SageMaker, Anaconda, etc.

During this stage, the data science team selects the options that can support a smooth pipeline for data preparation, feature engineering, and model training.
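To make this concrete, the following sketch chains data preparation, feature engineering, and model training into a single pipeline using scikit-learn, one of the open frameworks listed above. The feature names and algorithm choice are assumptions for illustration:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative feature lists; the real ones come out of the Understand stage.
numeric_features = ["tenure_months", "avg_monthly_spend"]
categorical_features = ["region", "segment"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# One object carries data preparation, feature engineering, and model training together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", GradientBoostingClassifier()),
])
# model.fit(X_train, y_train)  # X_train and y_train come from the governed datasets

Keeping these steps in one pipeline means the same transformations are applied at training time and at scoring time, which simplifies the Deploy stage later.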

The activities during the Build stage are best undertaken as a set of Agile sprints. Each sprint aims to produce a set of well-defined outcomes, and the results of each sprint should be reviewed during the sprint readout. That readout acts as a decision point: move on to the next sprint or take corrective action. The number and duration of sprints are highly dependent on the use case. The following is a breakdown of what the sprints could look like in a three-sprint project.

Build Sprint 1

This sprint usually focuses on exploratory data analysis. The data science team starts by connecting to the relevant datasets identified in the previous stage. Team members may go to a central catalog to search for datasets, or they may use the data connections created by the data provider team. Once the team has access to the right datasets, it explores the data in a variety of ways, starting with the distribution of various data elements, value ranges, cardinality, and quality. The team looks for correlations between different data elements. In supervised ML scenarios, the team then moves on to selecting the relevant features that show predictive power. It may do feature engineering, constructing new features by combining or transforming existing ones. By the end of Build Sprint 1, the team will have gained a solid understanding of the patterns that exist in the data.
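A minimal sketch of this kind of exploration, assuming a hypothetical fraud dataset with a binary is_fraud label, might look like the following in pandas:

import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical dataset surfaced through the central catalog

# Distributions, value ranges, and quality of each column.
print(df.describe(include="all"))
print(df.isna().mean().sort_values(ascending=False))  # share of missing values per column

# Cardinality of the categorical columns.
print(df.select_dtypes(include="object").nunique())

# Correlation of numeric data elements with the (assumed) label.
print(df.corr(numeric_only=True)["is_fraud"].sort_values(ascending=False))

# A simple engineered feature: combine existing columns into a new signal.
df["amount_vs_customer_avg"] = df["amount"] / df.groupby("customer_id")["amount"].transform("mean")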

Build Sprint 2

This sprint usually gets into building the initial versions of the AI/ML models. The data science team experiments with several different algorithms and techniques, aiming to create models with the best possible performance. Techniques like deep learning may benefit from specialized environments such as GPUs to accelerate model training. The team tries different combinations of features, algorithms, and hyperparameters with the goal of improving model performance. The metrics used to gauge performance are highly dependent on the use case; for example, metrics like AUC, precision, recall, and F1 score are relevant for classification use cases. By the end of Build Sprint 2, the team will have built a baseline set of models with acceptable performance metrics.
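A hedged sketch of this experimentation, assuming the feature matrix X and label y prepared in Build Sprint 1, might compare a couple of candidate algorithms and report the classification metrics mentioned above:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# X and y are assumed to be the prepared features and label from the previous sprint.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    scores = estimator.predict_proba(X_test)[:, 1]
    print(name,
          "AUC:", round(roc_auc_score(y_test, scores), 3),
          "precision:", round(precision_score(y_test, predictions), 3),
          "recall:", round(recall_score(y_test, predictions), 3),
          "F1:", round(f1_score(y_test, predictions), 3))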

Build Sprint 3

This sprint usually focuses on improving the performance of the initial model versions or taking corrective actions based on feedback from the previous sprint review. This may involve refining the features, further tuning the hyperparameters, creating ensemble models, and so on. It may also involve a separate activity around checking for bias and fairness in model behavior, which can be an important requirement for use cases that deal with sensitive data. As we discussed earlier in the report, this activity aims to ensure that a privileged group is not at a systematic advantage in outcomes compared to an unprivileged group. In addition to these steps, the team also prepares the AI/ML assets (models, notebooks, scripts, etc.) for handoff to the test and validation teams.
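As a rough sketch of these activities (not a substitute for dedicated validation tooling), hyperparameter tuning plus a simple fairness check might look like the following. It assumes the train and test splits from Build Sprint 2 and a hypothetical NumPy array of group labels held outside the feature set:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Tune the Sprint 2 baseline; the grid values are illustrative.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [200, 400], "max_depth": [6, 10, None]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# A simple disparate impact check on a protected attribute kept outside the feature set.
# "group" is a hypothetical array of group labels aligned with the rows of X_test.
group = np.asarray(group)
predictions = best_model.predict(X_test)
rate_privileged = predictions[group == "privileged"].mean()
rate_unprivileged = predictions[group == "unprivileged"].mean()
print("Disparate impact ratio:", rate_unprivileged / rate_privileged)  # values near 1.0 suggest similar outcome rates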

Deploy

Let’s say the team has built a model. In fact, suppose it has built multiple versions of the model. How does it actually deploy them? After all, there is no point in the models staying inside the data scientist’s workbench.

The data science and operations teams now need to work together closely. The combined team should be able to publish metadata about model versions into a central catalog where application developers can “shop for models,” much as the data science team shopped for data. How the models are actually invoked depends on the use case. Developers might call a model over a REST API from a web or mobile application or process. They might use a model to score a million records in batch mode at the start of the day for a wealth management adviser. They might perform inline scoring in near real time (less than a few milliseconds) on a backend mainframe system to authorize a credit card transaction. Or they might execute a series of classifiers in real time to detect intent (and changes in intent) as a call center agent interacts with a customer.
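As a sketch of the first pattern, an online scoring call over REST might look like the following. The endpoint URL, payload shape, and token handling are assumptions; the details vary by model runtime:

import requests

SCORING_URL = "https://models.example.com/deployments/churn-v3/predictions"  # hypothetical endpoint

payload = {"input_data": [{"fields": ["tenure_months", "avg_monthly_spend", "region"],
                           "values": [[26, 112.40, "EMEA"]]}]}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": "Bearer <access-token>"},  # placeholder credential
    timeout=5,
)
response.raise_for_status()
print(response.json())  # typically a predicted class and probability, depending on the runtime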

Regardless of the mode of invocation and latency requirements, we see a common paradigm: deploying model versions into an appropriate runtime. Most popular development environments have a runtime component that facilitates the Deploy stage, albeit with differing levels of sophistication. The data science team needs to work with business and IT stakeholders to understand which runtime option makes sense.

We need a structured approach to move from Build to Deploy; we call this MLOps. It brings ML and operations together, setting up a pipeline to move assets from development to preproduction and production environments. If this were just an experimental science project, it might be fine to simply hit the “Deploy” button. Not so in an enterprise setting. The data science team needs to apply the same rigor as application development when promoting assets through each of these stages. To do so, team members need to think about how assets move from development to QA/staging to production. In many companies, moving a set of assets out of development requires a sequence of steps: code review, third-party oversight, running a series of unit tests (often with different datasets from those the developer used), approval, and so on. Thinking of data science development in similar terms brings us into a continuous integration/continuous deployment (CI/CD) paradigm, supported by a CI/CD pipeline.

Most enterprises already have CI/CD mechanisms in place: GitHub Enterprise or Bitbucket as the source repository, Jenkins or Travis as the pipeline, Nexus or JFrog Artifactory for binaries, and so on. The data science pipeline needs to find ways to fit into these existing enterprise mechanisms. Ideally, when the data science team finishes the Build phase, it can tag, commit, and push all assets (Jupyter Notebooks, pipelines for data prep, the actual models, evaluation scripts, etc.) into the appropriate repositories. In particular, the commit and push step can initiate a CI/CD pipeline that follows a sequence of steps (test, review, approve, etc.) and then creates a set of deployments in the Deploy environment. We can think of this happening between development, QA/staging, and production, with slightly different steps for each transition.
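What the pipeline runs will differ by organization, but as a hedged illustration, the stages it executes after a push might boil down to a script like the following, with the commands, paths, and environment names all assumed for the sketch:

import subprocess
import sys

# Illustrative promotion steps a CI/CD pipeline might run after the data science team
# tags and pushes its assets; every command and path here is an assumption.
STEPS = [
    ["pytest", "tests/"],                                          # unit tests, ideally on different datasets
    ["python", "scripts/evaluate_model.py", "--min-auc", "0.75"],  # quality gate on model metrics
    ["python", "scripts/package_model.py", "--out", "dist/model.tar.gz"],
    ["python", "scripts/deploy.py", "--env", "staging"],           # promote to QA/staging, then production
]

for step in STEPS:
    result = subprocess.run(step)
    if result.returncode != 0:
        sys.exit(f"Pipeline stopped at step: {' '.join(step)}")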

Manage and Trust

Regardless of how well a model performs when the team first deploys it, the model will almost certainly degrade over time. New fraud patterns, new customer intents, and other changes in the environment that weren’t present in the training data mean that the data science team needs ways to monitor the model metrics and evaluate them against thresholds on a schedule. If the metrics breach a threshold, the system can alert the team or even initiate an automated process for retraining the model.
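A minimal sketch of such a scheduled check, with an illustrative threshold and hypothetical notification and retraining hooks, might look like this:

from sklearn.metrics import roc_auc_score

AUC_THRESHOLD = 0.75  # illustrative threshold agreed with the business during the Scope stage

def check_model_health(y_true, y_scores, notify, trigger_retraining):
    """Evaluate recent production scores against their eventual labels on a schedule."""
    auc = roc_auc_score(y_true, y_scores)
    if auc < AUC_THRESHOLD:
        notify(f"Model AUC dropped to {auc:.3f}, below the agreed threshold of {AUC_THRESHOLD}")
        trigger_retraining()  # e.g., kick off the Build pipeline with fresh data
    return auc

The notify and trigger_retraining hooks stand in for whatever alerting and pipeline tooling the operations team already uses.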

As an organization continues to operationalize AI, stakeholders are increasingly concerned about trust and transparency. We need guardrails across various stages of the life cycle, not just after the Deploy stage.

We need to give thought to various challenges in trusting AI, as discussed earlier. Bias and fairness monitoring and bias mitigation are crucial. The business wants to know that the model is behaving with fairness within a range specified by company policy. Can it prove that the model is being fair within a given acceptable range of fairness? Can it prove that the model is not incorrectly or inappropriately biased? The business also wants to know whether model behavior can be explained. It wants to know which change—by what margin, to what features—would have changed the model outcome. If the model recommends rejecting a loan, can the data science team play that specific transaction back and explain to a regulator why it was rejected?
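One naive way to probe the question of which change would have altered the outcome for a single numeric feature is to perturb it and look for the smallest tested change that flips the prediction. The sketch below is an illustration only, not a replacement for dedicated explainability tooling, and the model, record, and feature positions are assumptions:

import numpy as np

def smallest_flip(model, record, feature_index, deltas):
    """Return the smallest tested change to one feature that flips the model's decision.

    model:         a fitted classifier with a predict() method
    record:        the feature values for a single transaction
    feature_index: position of the numeric feature to perturb
    deltas:        candidate changes to try, ordered from smallest to largest magnitude
    """
    record = np.asarray(record, dtype=float)
    original = model.predict(record.reshape(1, -1))[0]
    for delta in deltas:
        candidate = record.copy()
        candidate[feature_index] += delta
        if model.predict(candidate.reshape(1, -1))[0] != original:
            return delta
    return None  # no tested change flipped the outcome

# Example: how much would the applicant's income (a hypothetical feature) need to change
# to alter a loan decision?
# smallest_flip(loan_model, applicant_features, feature_index=2,
#               deltas=[1000, -1000, 5000, -5000, 10000, -10000])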

Note that these requirements become especially challenging when the model doesn’t actually use the protected features. Consider the earlier HR application example. If the business decides not to use race, gender, and age as features in the model but still wants to understand how those attributes play a role, it can take the output of the model and compare it against those attributes outside the model. But without those features in the model itself, it is harder to evaluate model explainability, fairness, indirect bias, and so on.

Model performance metrics and trust and transparency elements should be correlated to the business KPIs that were captured in the Scope stage. This view is essential for the business to understand whether an AI project is succeeding.

Next, we’ll review how to leverage these stages in the AI life cycle, along with the people, process, and platforms involved, to establish an AI Center of Excellence.
