Chapter 1. Deploy
The deploy stage (see Figure 1-1) represents taking code that has been developed and packaged and shipping it to a destination for user consumption. It’s the result of local development becoming shippable code. Development teams value code that’s been shipped to the dev environment—but ultimately the organizational value is still low. As code gets closer and closer to production environments—where end users can consume and interact with it—the value (and ultimately the risk) increases.
One of the most outdated aspects of the DevOps infinity loop is that release precedes deploy. This sequencing can be traced back to the idea that once a build of software code was “development complete” (i.e., all development on specific changes was done), the code would be released into the build processes. Not only does this definition no longer hold true, but it is also too restrictive for the speed and scale at which teams need to build and ship their software. It’s also worth noting that in this context, we’re using release to refer to the activity of releasing software and not to the actual version of a software release.
In modern software development, the deployment process covers the workflow of moving compiled code onto a destination infrastructure or software platform. While this is roughly reflected in the infinity loop chart, the methods of facilitating that deployment process have matured and changed.
This chapter will sharpen the deployment story and provide more clarity on the ways in which teams are now deploying software. This includes:
- What deployment previously meant and what it means now
- Techniques and approaches to improve deployment velocity
- The roles involved in deploying code in multiple scenarios
Redefining Release and Deploy
In organizations that are embracing the practice of continuous operations, the release of software functionality to end users takes place after the software deployment has occurred—at the point when software teams want to enable feature or functionality consumption by those groups. In this model, the actual software code must be deployed onto the destination environment before that feature or functionality is available for release. Deployment, whether to production or to any other environment, no longer requires or even implies a release. Deployed code is dropped into an environment, where it is evaluated and prepared for having its features released for consumption by end users.
These feature changes can be released to small groups of employees, to specific users as part of a beta program, or to all users in a general availability (GA) release. The responsibility for deciding which feature (or code) gets released to a configured group, and when, can be shared among people with different job titles across the software delivery team. We will expand on this group of decision makers in later chapters, but ultimately this becomes the foundation of continuous release—the idea that teams can constantly be releasing new features to users or groups, independent of or unconstrained by the existing deployment process.
What Deploy and Release Used to Mean
It’s worth noting that software releases used to be giant packages of many features (small and large), small bug fixes, and everything in between. The dev half of the DevOps infinity loop showed how an organization created new software and, when the software was deemed complete, it was packaged together as a release. These collections of changes were commonly created at regularly set intervals, such as annually or quarterly. For code to be accepted into a seasonal or annual release, it had to be tested in conjunction with both the existing software and the new code in that particular release. If code was not complete or didn’t pass the integration tests, it didn’t make it into that release and had to wait for the next release cycle.
Once the bundle of changes that made it into a new release was packaged up, it would then be deployed or distributed out to the necessary endpoints. Deploy in this case meant “put in the field.” This is why the deploy stage comes after the release stage in the traditional DevOps infinity loop chart—because you couldn’t deploy software that hadn’t been released yet.
Deployment and release have been historically tied together, mostly because of infrastructure considerations. Code was included in a software release, which was distributed and fully tested in lower environments such as testing or staging, which typically leveraged dedicated infrastructure. Once success was validated in these lower environments, the software release was ready to be deployed to broader user groups by moving it onto the production environment/infrastructure. At this point, the existing, live application would have to be placed into some form of maintenance mode (taken offline), typically during off-hours, to push the new release code out. Once everything was validated as working in production, the app was made available again, and the release was completed.
What Deploy and Release Mean Now
As organizations have shifted how they build and ship software, the meaning of release has changed. Continuous integration (CI) and continuous delivery (CD) give organizations the ability to continuously deploy code changes across multiple environments, without dedicated release managers and build teams handcrafting releases. Nightly builds are commonplace, with many organizations even moving past “nightly” into continuous patterns in which changes are built and deployed at the time of commit, especially in SaaS environments. With new software builds compiled and deployed at this rate of change, the idea of deploy no longer needs to be tied directly to the traditional release concept. With software now being continuously integrated (built/tested/packaged) and continuously deployed (shipped to one or many destination infrastructures or platforms) in a much-simplified process, the implied meaning of what a release is also changes.
Introducing the practice and tooling of feature management gives teams fine-grained control over releasing code (features) that has already been deployed into an environment. Leveraging feature flags allows teams to gate their newly developed code and control whether that change is available to end users. The old definition of a software release shifts to become more aligned with the term software build, and release shifts to the actions software teams take to make functionality available to end users and systems (i.e., the feature flag/toggle experience). A release is no longer a software artifact or a set of binary files shipped to a destination; instead, it can be controlled through targeting rules, as well as through the practice of canary deployments (also sometimes called canary rollouts or canary releases), allowing releases to be made available progressively, on a sliding scale. The end result is that teams have far greater control over feature releases and their overall impact—an outcome that truly aligns with the overall focus of DevOps principles.
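To make the targeting-rule and progressive-rollout ideas concrete, here is a minimal sketch of how a flag with an allowlist, an attribute rule, and a percentage rollout might be evaluated. The flag structure, attribute names, and hashing scheme are illustrative assumptions, not any particular vendor’s API.

```python
import hashlib


def percentage_bucket(flag_key: str, user_key: str) -> float:
    """Deterministically map a user to a 0-100 bucket for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest[:8], 16) % 10000 / 100.0  # 0.00-99.99


def is_enabled(flag: dict, user: dict) -> bool:
    """Evaluate targeting rules first, then fall through to a percentage rollout."""
    if not flag["enabled"]:
        return False                                  # flag is off for everyone
    if user["key"] in flag.get("allow_users", []):
        return True                                   # explicit beta allowlist
    for rule in flag.get("rules", []):
        if user.get(rule["attribute"]) in rule["values"]:
            return True                               # attribute-based targeting
    return percentage_bucket(flag["key"], user["key"]) < flag.get("rollout_pct", 0)


# Example: a new checkout flow, on for employees/beta users and 10% of everyone else.
checkout_flag = {
    "key": "new-checkout",
    "enabled": True,
    "rules": [{"attribute": "group", "values": ["employee", "beta"]}],
    "rollout_pct": 10,
}
print(is_enabled(checkout_flag, {"key": "user-42", "group": "customer"}))
```

Because the bucketing is deterministic, a given user stays in the same cohort as the rollout percentage increases, which is what makes a sliding-scale release predictable.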
Now that we’ve outlined the new deploy and release framework, let’s explore the evolved approach to the deployment strategies themselves, such as how organizations are realizing the benefits of releasing small batches frequently.
Small and Frequent > Large and Infrequent
As teams look to transform how software is shipped, the optimal deployment pace within an environment focuses on a high frequency of small software deployments (software deployments shipped to the infrastructure the software runs on). Shipping at high velocity has numerous benefits:
- Small batch sizes
- Since each deployment is small, the time the deployment itself takes can be kept to a minimum.
- Easy problem diagnosis
- If something goes wrong, looking at the recent deployments and their associated code changes is a simpler task than trying to decompose larger builds with many code changes. Since deployments are small, deciphering what each deployment does and how it might have caused the problem is straightforward (especially when tied to feature flag usage, which will be covered more in later chapters).
- Speed to market
- Deploying frequently means you are in a constant state of forward motion with your product. You’re making frequent incremental improvements or corrections, resulting in your customers receiving a better product.
- Merge conflict minimization
- Large batches make merge conflicts inevitable and hard to untangle. Small and frequent code commits minimize merge conflicts since there is less time for code to become stale.
- High team morale
- In organizations that have invested culturally in the “small and frequent” approach, the deployment process is often optimized, allowing developers to do what they love: building and shipping code. Infrequent deployments cause developers to spend more time dealing with merge conflicts, giving them less time to work on the next deployment.
Infrequent deployments tend to be larger and more complex, so when an issue arises, figuring out which change(s) caused a problem can be challenging, leading to a more time-intensive incident process (often in the middle of an active incident). The longer an incident drags on, the more likely that tempers flare, tertiary stakeholders start demanding information, and the impact grows throughout other connected systems and even the organization as a whole.
Culturally, an infrequent deployment posture frustrates developers, reducing developer satisfaction and in many cases even causing employee attrition. Developers want to see progress in their work and not be blocked by endless code reviews on large code changes, merge conflict resolution, and challenging deployment processes. Organizations that deploy quickly will attract and retain engineering talent. It’s a virtuous cycle. Considering the high cost of recruiting and paying developers, anything that hinders their output should be avoided.
Frequent deployments don’t just achieve success in an isolated section of the DevOps infinity loop. They directly influence every subsequent section of the DevOps cycle. If deployments to production are frequent, small, and reversible (via feature flags), the rest of your software development process will be set up for success. The opposite is true as well: if the deployment process is slow and batch sizes are large, releases will be infrequent, and incidents will be more severe.
Deploy Can Be Decoupled from Release
One of the fundamental ways to speed up the deployment process is to separate it from the release process. In the old model, deploy and release were glued together. Whatever was in “the winter release” was accessible to all users at the same time.
If deploying code to production means that all users immediately see the new functionality, the stakes for every deployment activity are extremely high. These higher stakes in many cases translate to more testing, code reviews, and process burden, which will inevitably lead to a slower deployment frequency.
To achieve the benefits of continuous delivery, release has to be separated from deploy. If code can be deployed to production, and the visibility of that feature is controlled, then the stakes are lower and the requisite testing procedures can be less intensive, especially when combined with release targeting. Release targeting allows teams to explicitly target a portion of their code at a specific grouping of individuals or systems. The product team can run beta programs and canary rollouts during the release process, gradually releasing features to select groups. A streaming service can target specific device types with a new set of code.
From a development methodology perspective, feature flags are one of the best avenues to decouple the activity of deployment from release. A change can be deployed to production but have its feature flag set to “off” for some or all users, rendering the change inactive. In this example, code has been deployed to the production environment, and the releasing of that feature or capability to the user community is an independent process occurring after the deployment.
When the product team finds issues within the deployment, it also has the ability to immediately (and independently) disable the problem code by way of a kill switch—a rollback strategy that is available when you’re developing with feature flags. If deploy and release are tied together, a rollback of problematic code risks giving users whiplash when new features suddenly disappear across the entire user community. Such a poor user experience can cause disruption and frustration and ultimately lead to customer churn.
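As a hedged illustration of that kill switch, here is a minimal sketch of application code that is deployed but gated. The flag store, flag name, and functions are hypothetical stand-ins rather than a specific feature management SDK.

```python
class FlagStore:
    """Stand-in for a flag service or config store consulted at request time."""

    def __init__(self, flags: dict):
        self.flags = flags

    def is_enabled(self, key: str, user_id: str) -> bool:
        # A real flag store would also apply per-user targeting here.
        return self.flags.get(key, False)


def new_recommendation_engine(user_id: str) -> list:
    return ["new-item-1", "new-item-2"]      # newly deployed code path (stubbed)


def legacy_recommendations(user_id: str) -> list:
    return ["legacy-item-1"]                 # stable, known-good code path


def get_recommendations(user_id: str, flags: FlagStore) -> list:
    # Deployed but gated: flipping "new-recs-engine" off acts as a kill switch,
    # reverting behavior instantly without a rollback deployment.
    if flags.is_enabled("new-recs-engine", user_id):
        return new_recommendation_engine(user_id)
    return legacy_recommendations(user_id)


flags = FlagStore({"new-recs-engine": True})
print(get_recommendations("user-42", flags))
flags.flags["new-recs-engine"] = False       # the kill switch: no redeploy needed
print(get_recommendations("user-42", flags))
```

Flipping the flag changes behavior on the next evaluation, with no rollback deployment; only the cohort that had the new path enabled sees any change.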
Merge Conflicts and Their Impact on Deployments
In the practice of software development, specifically when you are working in source control, code branches are frequently created for developers to build new software capabilities. The longer these branches exist apart from the main branch, the higher the risk that they become so diverged that conflicts in the codebase emerge as changes are committed and merged into the main trunk from other branches. These merge conflicts are costly to resolve from both a time and a developer productivity perspective. This time to resolve inhibits the ability of teams to ship newly built code into their environments. Developers are forced to pause development while they resolve the conflicts, distracting them from the core goal of shipping code.
At a high enough frequency, time and productivity loss can compound, causing far-reaching impacts on job satisfaction, developer retention, and operational efficiency. Conversely, an organization that is committing code often, with frequent code merges into the main trunk, experiences a greatly reduced risk of these merge conflicts due to the short-lived nature of the branches. With fewer merge conflicts to resolve, another barrier is removed from the opportunity to embrace a more continuous model, especially within the deploy step.
Decoupling deployment from release is a powerful way to improve your software delivery process. Not only does it enable you to control the impact of feature release through capabilities such as release targeting and canary releases, but it also reduces risk from failed deployments. Combining the practice of feature management with frequent code merges and deployments reduces the risk of costly merge conflicts and ultimately gets capabilities in front of users faster. These modern deployment practices are necessary to foster high-performing engineering teams and organizations.
Furthermore, they lay the groundwork for optimizing the later stages of the DevOps lifecycle: release, operations, and measurement.
Characteristics of a High-Performance Deployment System
Decoupling the concepts of deployment and release gives us an opportunity to dive into the question, What are the characteristics of a high-performance deployment system? Having deployment systems and processes in place that can sustain and thrive in a continuous model is critical to achieving a state of continuous operations. High-performance deployment systems approach the problem of deployment from a scale perspective, layering in concepts like visibility and orchestration as paths to achieving better scale and deployment velocity.
Visibility lets broader teams see, manage, and interact with system or platform changes. Deployment orchestration moves infrastructure and code changes in concert with each other, often managing adjacent infrastructure or platform dependencies, as well as the ability to deploy previous versions of the workload and upgrade to new versions.
Visibility
An ideal deployment system provides a platform through which you can see what’s going out to your workload fleet, manage change, and roll changes back or forward. Visibility provides a wide view into workload deployments, allowing fast response to performance degradation, system errors, or other problems within the environment. Deployment automation often focuses on the raw execution of a task and the reduction of human error; however, such tools frequently fall short of giving you true visibility into the platform. As teams work through operationalizing these functions, they should ensure that their tooling provides visibility into the state of their workloads. This isn’t a reference to observability tooling specifically, though such tooling can be part of the picture. Visibility, in the way we discuss it here, refers to a view of the deployment system as a whole.
Visibility becomes increasingly important in environments in which workloads are being delivered automatically (whether by CI/CD or through other types of deployment automation). Quickly understanding when deployments have failed, which workloads are impacted, the rollback status, and the overall impact is critical in reducing not only negative perceptions within end-user communities but also business impacts such as lost revenue and organizational toil.
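As one small, hedged example of this kind of visibility for Kubernetes-based workloads, the sketch below summarizes whether each Deployment’s rollout has converged. It assumes kubectl is installed and pointed at the right cluster; a real deployment platform would layer history, failure detection, and rollback status on top of a view like this.

```python
import json
import subprocess


def deployment_status(namespace: str = "default") -> list:
    """Summarize rollout convergence for every Deployment in a namespace."""
    raw = subprocess.run(
        ["kubectl", "get", "deployments", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for item in json.loads(raw)["items"]:
        spec, status = item["spec"], item["status"]
        desired = spec.get("replicas", 0)
        updated = status.get("updatedReplicas", 0)
        available = status.get("availableReplicas", 0)
        rows.append({
            "name": item["metadata"]["name"],
            "desired": desired,
            "updated": updated,
            "available": available,
            "healthy": desired == updated == available,
        })
    return rows


if __name__ == "__main__":
    for row in deployment_status():
        print(row)
```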
Deployment Orchestration
Deployment orchestration saves time and reduces errors by removing human interaction not only from the process of deploying software but from adjacent dependencies as well. There are often dependencies within your team processes, other application tooling, or even infrastructure and configurations that need to be included as part of your deployment. Sometimes you need to make changes to the configurations of a database, upgrade server capacity, or phase out other infrastructure components entirely. You may want to include specific technical business logic, such as the inclusion of replica deployments in the strategy or high-availability storage configurations. Orchestration tools build on the concept of automation by coordinating concepts such as these across multiple applications or platforms to help achieve a desired state.
Orchestration helps in two main areas:
- Time to deploy to production
- Coordinating infrastructure and code changes in tandem can be complicated and fraught with risk. Orchestration allows these processes to move in concert with each other, thereby reducing order of operations issues and accelerating speed to production. A good way to think about this is to focus less on the activity of deployment and more on the time it takes for code to become usable in a production deployment.
- Repeatability
- The other goal of orchestration is repeatability. You should be able to deploy what was live yesterday or last week. Sometimes deployments perform poorly, and you need to roll back to a previously known stable version. Modern tools allow teams to apply new configurations that will quickly redeploy or roll back an existing deployment and revert to a version that they know worked, as sketched after this list.
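Here is a minimal, hedged sketch of that repeatability idea for a Kubernetes workload: roll a Deployment to a pinned image version, wait for it to converge, and roll back automatically if it doesn’t. The deployment name, container name, and registry path are illustrative; a real orchestration tool would also coordinate the adjacent dependencies described above.

```python
import subprocess


def run(*cmd: str) -> None:
    """Echo and run a command, raising on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def deploy_version(version: str, deployment: str = "web", container: str = "web") -> None:
    image = f"registry.example.com/{container}:{version}"
    try:
        # Adjacent dependencies (schema migrations, config changes, capacity
        # bumps) would be coordinated here, before the workload itself moves.
        run("kubectl", "set", "image", f"deployment/{deployment}", f"{container}={image}")
        run("kubectl", "rollout", "status", f"deployment/{deployment}", "--timeout=120s")
    except subprocess.CalledProcessError:
        # Repeatability: revert to the previous known-good revision.
        run("kubectl", "rollout", "undo", f"deployment/{deployment}")
        raise


if __name__ == "__main__":
    deploy_version("1.4.2")
```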
Deployment Strategies and Shipping Code
Traditionally, site reliability engineers (SREs) or system admins deployed infrastructural changes and engineers deployed compiled code. To the outside observer, these roles looked very similar: individuals with the word engineer in their title (usually) who moved software into production. But the practice of deploying infrastructure evolved separately from the practice of deploying code.
Note
We’re using developer and engineer interchangeably. We understand that some companies view engineer as a higher-responsibility role than developer, and sometimes engineer is what appears in a job title while developer is used more colloquially. Developers face the same challenges as engineers, so to avoid pulling a Ballmer and repeating developer over and over again, we use the two terms synonymously. It’s worth noting that the roles may carry different legal or licensing requirements in specific countries, so be aware of local conventions.
Working in small batch sizes, collaborating with people in tangential roles, and testing outcomes have revolutionized software development. Trunk-based development and the adoption of Git have codified processes around these ideas.
Application teams have been deploying code using A/B tests and canary deployments in various ways for years. The benefits have been highly visible to others in technology organizations, and so the ideas began to spread beyond software developers.
In the software realm, these ideas aren’t new or revolutionary. What is new is the concept of leveraging these same strategies to drive similar iterations on the underlying infrastructure and supporting platforms. This sharing of ideas, philosophies, and language has enabled both developer and operational teams to work together more cohesively and move faster—and it ultimately forms the foundation of the idea of DevOps: bringing “developer” and “operations” concepts closer to each other.
Let’s dive into two infrastructural deployment strategies that have their genesis in software development practices.
Blue/Green Deployments and Canaries
Canary deployments and blue/green deployments are both adaptations of software development practices into an operational use case. They allow teams to deploy new application versions safely. Both allow automatic, low-risk deployment to production with the option to roll back easily if necessary. These two deployment strategies are often conflated, leading to the exclusive use of one or the other. But why rely on one? This section covers their use cases and differences so you can use the appropriate one in a given situation. We also cover the supporting constructs of environment copies and targeting.
Understanding blue/green deployments
Blue/green deployments (see Figure 1-2) require an exact copy of your entire stack to create two identical versions: a current or “blue” one, and a new or “green” one. Once the new feature has been sufficiently tested with a small subset of the total users on the green version, you route progressively larger subsets of users to it until all users are on the green version. If anything goes wrong, the blue version is still running, and traffic can easily be switched back to the primary running location.
Since you are running an entire copy of your stack, you also need to ensure any necessary data (database content and supporting schemas, for example) or platforms (Kubernetes environments and so on) exist within both environments to ensure an effective blue/green test. For this effort, you get a strategy that provides an easy fallback to your blue environment if green becomes nonfunctional.
Blue/green deployments necessitate a lot of background work to spin up identical copies of an application software stack. This is still true even in a containerized world, albeit with a very different set of operational burdens. Do you spin up the identical copies in the same Kubernetes cluster? You also have to tag the workloads differently for the load balancer. In addition, you might have to work with network and DNS teams to coordinate flipping load balancer targets and shifting traffic weights.
When you are ready to make the switch, simply point the router/load balancer/DNS at the green (new) version of the application. If the deployment “event” goes well, the blue version can be spun down, or it can be kept available so that the router can send traffic back to it at any point; in the meantime, traffic flows to your new version.
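For teams fronting the two stacks with an AWS Application Load Balancer, the “switch” can be a single listener update. This is a hedged sketch under that assumption; the ARNs are placeholders, and your traffic layer (DNS weighting, a service mesh, an ingress controller) may look different.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARNs; substitute your own listener and target groups.
LISTENER_ARN = "arn:aws:elasticloadbalancing:region:acct:listener/app/example/..."
BLUE_TG_ARN = "arn:aws:elasticloadbalancing:region:acct:targetgroup/app-blue/..."
GREEN_TG_ARN = "arn:aws:elasticloadbalancing:region:acct:targetgroup/app-green/..."


def switch_to(target_group_arn: str) -> None:
    """Atomically forward all listener traffic to the given target group."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
    )


switch_to(GREEN_TG_ARN)    # cut over to the green stack
# switch_to(BLUE_TG_ARN)   # roll back while the blue stack is still running
```

Pointing the listener back at the blue target group remains the rollback path for as long as the blue stack stays running.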
Keeping the blue version online gives you more time to see how the green version performs and whether it’s achieving the goals you put forward (user adoption, performance changes, etc.), but that ultimately costs money. Maintaining identical environments and databases can be quite expensive. That cost comes in the form of increased cloud provider bills—and that’s without considering the salary and opportunity costs of engineers setting up supporting infrastructure or duplicate database schemas/configurations.
Beyond the infrastructure cost of this approach, teams also run up against the operational complexity “cost” that arises from managing multiple independent deployments of an application—especially as teams expand workloads into nontraditional platforms such as Kubernetes, serverless, or other managed offerings. Beyond the running infrastructure, there are complexities that arise with adjacent technologies and the teams that manage them. For example, load balancing and DNS technologies are often managed by different teams than the ones who manage traditional infrastructure components; thus, careful coordination between these teams is needed to effectively execute a blue/green deployment when there isn’t sufficient automation in place to orchestrate these changes.
Blue/green deployments also suffer complexity in the realm of targeting rules. As mentioned already, the switch between blue and green is typically controlled via an atomic switch, and while there are options to control which version individual users receive, those are typically complex and often rely on dedicated infrastructure configurations to implement successfully.
Understanding canary deployments
The term canary deployment comes from the practice of coal miners bringing a caged canary deep into mines. Canaries have tiny, sensitive lungs, so if the canary dropped dead, it was an early warning to the miners that the air was unsafe.
Just as actual canaries helped coal miners test air quality, canary deployments (see Figure 1-3) allow teams to test new infrastructure and reduce the impact scope. Using a traffic ingress layer (such as an NGINX ingress controller or an AWS load balancer), a canary deployment points a percentage of traffic at the new version of a service while the remainder continues to hit the old version. This enables teams to see how the new version is functioning relative to the old version and allows for a gradual rollout of the new version as confidence increases.
Canary deployments expose a subset of traffic to a new version of a service or application at the load balancer layer. Instead of copying your entire stack, you can copy only the specific components you want to run the canary against and test changes quickly and safely. Canaries are an integral part of a modern deployment cycle, but they aren’t always necessary. Weigh whether a change should be canaried against its potential user or system impact. While not all changes need to be run through a canary process, there are many risk reduction benefits to running canaries, especially in higher-impact scenarios.
Leveraging a canary release strategy is often easier because there isn’t a need to duplicate wider sets of your application topology. Since you are working only with a specific component, sending traffic to a new version of that component is typically much more approachable.
From a targeting perspective, complexities exist within canary deployments as well. One challenge is the concept of stickiness. Canaries rely on bucketing users to show them one or the other version of the service or application. What happens if a user in the version 1 bucket refreshes the page and then gets assigned version 2? You want to have users “stick” to one version or the other to ensure consistency of experience. At the application level, this situation can be handled easily with a feature flag (see Chapter 2) because you know much more about the user context, such as payment history, identity, employer, geography, and so on.
Canaries are frequently executed at the infrastructure level, however, so the point of ingress (load balancer, ingress controller, or some other system allowing inbound connectivity) often doesn’t know anything about the user identity. Thus, targeting is much more rudimentary. There are several potential ways to target, such as through geolocation data, IP ranges, and other physical characteristics, but many of these require technical work to orchestrate and offer less granularity than feature flags.
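Stickiness usually comes down to deterministic bucketing: hash whatever identifier the routing layer can see (a user key at the application layer, or something cruder like a client IP or cookie at the ingress layer) so that the same identifier always lands on the same version for a given rollout percentage. A minimal sketch, with illustrative names:

```python
import hashlib


def canary_bucket(identifier: str, salt: str = "checkout-canary") -> float:
    """Deterministically map an identifier to a 0-100 bucket."""
    digest = hashlib.sha256(f"{salt}:{identifier}".encode()).hexdigest()
    return int(digest[:8], 16) % 10000 / 100.0


def route(identifier: str, canary_pct: float) -> str:
    # Same identifier -> same bucket -> same version, even across page refreshes.
    return "v2-canary" if canary_bucket(identifier) < canary_pct else "v1-stable"


for client_ip in ["203.0.113.7", "198.51.100.23", "203.0.113.7"]:
    print(client_ip, "->", route(client_ip, canary_pct=10.0))
```

The same hashing approach underlies the percentage rollout sketch earlier in this chapter; the difference is how much the routing layer knows about the identifier it is hashing.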
Which should be used?
Blue/green and canary deployments have a lot in common. Each approach can automatically route traffic to new versions of applications, and each provides the ability to roll back quickly and easily to the old default version should issues arise.
The differences lie in what needs to be copied and how much control you have in release targeting. Blue/greens are heavier and blunter, as they require an exact copy of your entire stack and do not allow percentage rollouts. Canaries do not require wider infrastructure or database duplication and do allow granular targeting controls.
Both strategies promote safer deployments but have varying operational or financial implications.
Use a blue/green deployment if:
- The duplication of infrastructure is minimal.
- The operational requirement of the duplication is manageable.
- You are comfortable with moving significant parts of your user community to the new version at the same time.
Use a canary deployment if:
- Duplicating your entire stack (and its supporting data) is impractical or too expensive.
- You want granular targeting controls or percentage-based rollouts.
- You want to limit the impact scope of a change by exposing it to a small slice of traffic first.
Deploying Code
As with the CI process of writing, building, and testing code, much can be said about code deployment. Books have been written about it, companies have been created, frameworks that address it have become commonplace, and open source ideas have seen wide adoption.
In this section, we offer a high-level overview of a modern code deployment pattern that represents a thoughtful, efficient approach.
After you have committed your code in a Git repository and all code checks have run (through the CI process), your build system detects the change and runs a build. This compiles the code, packages it into some form of runnable artifact, and places it in an artifact repository. This artifact is then used in the deployment, in which it is shipped into the organization’s application environment (the CD portion of the process) and can be released by others closer to the user.
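As a hedged, stripped-down illustration of that flow for a container-based service, the sketch below builds an image from the current commit, pushes it to a registry, and ships it to the environment. The registry path and deployment names are illustrative; a real CI/CD system would run equivalent steps automatically on every merge.

```python
import subprocess


def run(*cmd: str) -> str:
    """Run a command, fail loudly on error, and return its trimmed output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()


commit = run("git", "rev-parse", "--short", "HEAD")                # the change being shipped
image = f"registry.example.com/web:{commit}"                       # illustrative registry path

run("docker", "build", "-t", image, ".")                           # compile/package the artifact
run("docker", "push", image)                                       # store it in the artifact repository
run("kubectl", "set", "image", "deployment/web", f"web={image}")   # deploy it (not yet released)
```

Pinning the image tag to the commit hash keeps the artifact traceable back to the exact change that produced it.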
In the legacy model, the end stage of this process would have constituted a release, but with our separation of deployment and release, this is no longer the case. While the new code is deployed out to the infrastructure, feature flags allow code to be selectively targeted at specific user cohorts. Code can be in the production environment while remaining inaccessible to any and all users. In this case, code that has been deployed (placed in an environment) has not necessarily been released—made visible—to users.
One way to think about code deployment is to imagine a clothing boutique that has ordered some dresses via FedEx. Once the dressmaker has made the dresses, it packages them in a box and tells FedEx the shipment is ready. A delivery person comes to the company, picks up the package, and then delivers it to the boutique. If the boutique opens the package and sees that the dresses are torn or are the wrong color, it will take issue with the dressmaker, not FedEx.
Similarly, code deployment takes place after the code has been compiled into a build but before it’s been released to the user. And while a giant box with lots of dresses, shoes, purses, and hats is more likely to have some items missing or damaged, a small deployment with only one dress reduces the chance for errors and increases the ability to diagnose the issue.
DORA Metrics (Minus MTTR)
The DevOps Research and Assessment (DORA) metrics are the best way to measure operational performance. Dr. Nicole Forsgren, Jez Humble, and Gene Kim founded DORA, which Google acquired in 2018. In their groundbreaking 2018 book Accelerate (IT Revolution Press), Forsgren, Humble, and Kim outlined four metrics that have become the gold standard for evaluating software delivery performance:
- Change lead time
- The time between when code is committed and when it is pushed to production
- Deployment frequency
- How often changes are pushed to production
- Change failure rate
- The percentage of changes that fail or cause incidents
- Mean time to recovery (MTTR)
- The time between incident declaration and system recovery or incident resolution
These metrics provide ways to quantify, measure, and improve upon facets of building and shipping software. The DORA team isn’t the first to define standards with the hope of universal adoption. Other metrics such as lines of code or velocity of stories (small pieces of work in an Agile framework) have been floated as ways to measure software delivery.
Lines of code sounded great from the perspective of nontechnologists, but adding lines of code doesn’t necessarily equate to better code. Stripe famously revolutionized the payments industry when it offered developers a way to integrate its payment processing platform by adding only seven lines of code. Quantity of code isn’t quality of code.
Story completion velocity was another metric that gained traction. The problem was the lack of standardization of stories. One team could inflate its metrics by breaking down stories into smaller and smaller chunks. Think of this as adding items like “brush my teeth” or “check the mail” to your to-do list so you can give yourself credit for accomplishing a lot that day.
The DORA metrics don’t focus on code lines written or stories completed. They measure the health of the software delivery process overall, from when changes get made to when they fail in production. Sometimes new standards or metrics fail to replace previous ones and only add to the proliferation of standards; the DORA metrics, by contrast, are becoming widely adopted, and technology practitioners owe the DORA team a debt of gratitude for standardizing the field.
Three of the four DORA metrics are relevant to deployment: change failure rate, deployment frequency, and change lead time. We’ll cover those here, and we’ll discuss the remaining metric (MTTR) in Chapter 3, “Operate”, where it is more relevant.
Note
One could argue that the measurement of any type of metric fits into Chapter 4, “Measure and Experiment”. Metrics are a thing you measure, right? As we discuss the transformation of deploy, it makes more sense to include the specific measurement benchmarks that we intend for teams to follow within the current chapter. In Chapter 4 we will dive deeper into the ways teams measure success as they move into a more continuous operating model.
Change lead time
Change lead time measures the time between when code is committed and when it is running in production. If lead times are long, you should examine development or deployment pipelines to identify and remove the bottlenecks.
In a perfect world, the imaginary “change lead time” stopwatch would start at the point that a team selects a feature to begin developing from something like a kanban board, where development tasks are collected, or from the initial user request for a new feature or bug fix. Unfortunately, starting the clock so far back introduces a lot of variability that makes the metric less useful. What if a feature is selected just before a major holiday, or before a high-severity incident? Time can tick by (in software development as in life!) and inflate the metric without it meaningfully measuring the work. Starting the measurement when code is committed simplifies the metric and makes it more of an apples-to-apples comparison.
Every year the DORA team surveys the software development industry and publishes its seminal State of DevOps report. In the 2022 report (the most recent one at the time of writing), the DORA team says that low-performing teams have change lead times between one and six months, medium-performing teams are between one week and one month, and high-performing teams are between one day and one week.
Testing often takes up a large chunk of the change lead time. If you want to improve your results on this metric, automate testing practices as much as possible to lower the time necessary for changes. Having engineers run automated tests when they are writing code, or instituting automatic testing procedures as part of the code check-in process, can lower this metric tremendously.
Deployment frequency
Deployment frequency refers to the volume of deployments to production over a set period of time. This is often the easiest of the DORA metrics to measure, and the one most likely to already be measured in some form. The goal of any company’s technology team is to deliver value to its customers as often as possible, with as little disruption as possible. Deployment frequency intuitively seems to measure that—more deployments equals more responsiveness. But this metric actually measures something else.
As touched on earlier in this chapter, deployment frequency has a direct influence on the volume of code changes within a commit (the size of the change set) and on the frequency of code merges. Deployment frequency will inherently be low if teams have large batch sizes. Large batch sizes with infrequent merges run a higher risk of creating merge conflicts. If there are many changes in a batch, they all must work together, and sorting out merge conflicts takes time, reducing the frequency at which deployments can happen. Conversely, if teams have small batch sizes and changes are deployed individually, there is less potential for merge conflicts, removing one of the most common barriers to shipping newly developed code continuously.
The 2022 State of DevOps report defined low performers as those who deployed between once a month and once every six months, medium performers as deploying between once a week and once a month, and high performers as deploying on demand multiple times per day.
To increase deployment frequency, keep branch time short, reduce batch sizes, and merge code into main more often. The more frequently code is brought into your deployment trunk, the more frequently it can be compiled and shipped out to your user communities for consumption.
Change failure rate
Change failure rate is the percentage of changes that result in incidents or failures. The previous two metrics measure the tempo and quantity of changes pushed to production, while this one measures the quality of changes. The result will be poor if quantity and speed are high but the quality is low.
This metric defines quality as not causing an incident. Measuring change quality directly can be squishy. Do you look at utilization? Product teams awaiting a big announcement can hold some changes back from release. Users will engage with some changes (“new sign-in page”) more than others (“Danish language support now available”). Since evaluating the quality of changes is so fraught, a more standard bar is necessary. Not causing an incident is fair and easy to measure, so it’s a useful stand-in for “change quality.”
The 2022 report indicated that low performers experienced a failure rate of 46%–60%, medium performers were between 16% and 30%, and high performers had a failure rate between 0% and 15%.
An important distinction to make is between change failure rate and the number of change failures. In real-world scenarios, change failures are going to happen, even with the best planning. It’s much more realistic to manage the rate at which these change failures occur to understand trending, and to work to manage severity and recoverability. Organizations that have embraced the “smaller change/deploy more often” operating posture are likely to have incidents that are easier to debug and quicker to recover from due to more manageable change sets.
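Pulling the three deployment-related metrics together, here is a hedged sketch of how they might be computed from a simple record of deployments. The data model is illustrative; in practice these timestamps would come from the version control system, the CI/CD pipeline, and the incident tracker.

```python
from datetime import datetime, timedelta
from statistics import median

# (commit time, deploy time, caused an incident?)
deployments = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 15, 0), False),
    (datetime(2024, 3, 2, 10, 0), datetime(2024, 3, 2, 11, 30), True),
    (datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 2, 14, 45), False),
]

# Change lead time: commit -> running in production.
change_lead_time = median(deployed - committed for committed, deployed, _ in deployments)

# Deployment frequency: deploys per day over the observed window.
deploy_times = [deployed for _, deployed, _ in deployments]
window_days = max((max(deploy_times) - min(deploy_times)) / timedelta(days=1), 1)
deployment_frequency = len(deployments) / window_days

# Change failure rate: share of deployments that caused an incident.
change_failure_rate = sum(failed for *_, failed in deployments) / len(deployments)

print(f"median change lead time: {change_lead_time}")
print(f"deployment frequency:    {deployment_frequency:.1f} per day")
print(f"change failure rate:     {change_failure_rate:.0%}")
```

Using the median for lead time keeps a single outlier, such as a change that sat untouched over a holiday, from skewing the picture.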
Summary
Deploying code transforms letters, numbers, and symbols into business value. Establishing stronger, high-performing deployment systems and practices accelerates the pace of achieving that business value. These systems reduce the chance of error by providing better visibility and orchestration across your deployment environments.
Deploy used to mean something akin to “make available.” This made sense in an age when software came on CDs in boxes. With the ubiquity of the cloud, a faster, ever-present delivery model is available. The physical mechanics of pressing CDs no longer constrain deployments.
Deployment often flies under the radar unless it breaks down. Deployment efficiency (the speed of the actual deployment, for example) isn’t a user-facing concept. Users experience the downstream impacts of a deployment, such as outages or new features becoming available. Adopting a delivery posture of smaller code changes, frequent merges, and more frequent deployments helps create a user experience that is continuously moving forward.
Deployment failures will happen regardless. Leveraging strategies such as blue/green deployments and canaries gives teams a greater ability to mitigate the risk of these failures impacting actual users. Moving a step further, using feature flags to break apart the deploy step from the release step gives even greater control over risk and user experience. All these concepts together enable teams to deploy software more quickly to end users.