Continuous Deployment Is for Mission-Critical Applications

Having evangelized the concept of continuous deployment for the past few years, I've come into contact with almost every conceivable question, objection, or concern that people have about it. The most common reaction I get is something like "That sounds great—for your business—but that could never work for my application." Or, phrased more hopefully, "I see how you can use continuous deployment to run an online consumer service, but how can it be used for B2B software?" Or variations thereof.

I understand why people would think that a consumer Internet service such as IMVU isn't really mission critical. I would posit that those same people have never been on the receiving end of a phone call from a 16-year-old girl complaining that a new release ruined her birthday party. That's where I learned a whole new appreciation for the idea that mission critical is in the eye of the beholder. But even so, there are key concerns that lead people to conclude that continuous deployment can't be used in mission-critical situations.

Implicit in these concerns are two beliefs:

  • Mission-critical customers won't accept new releases on a continuous basis.

  • Continuous deployment leads to lower-quality software than software built in large batches.

These beliefs are rooted in fears that make sense. But as is often the case, the right thing to do is to address the underlying cause of the fear (http://www.startuplessonslearned.com/2009/05/fear-is-mind-killer.html) rather than avoid the process improvement altogether. Let's take each in turn.

Another Release? Do I Have To?

Most customers of most products hate new releases. That's a perfectly reasonable reaction, given that most releases of most products are bad news. It's likely that the new release will contain new bugs. Even worse, the sad state of product development generally means the new "features" are as likely to be ones that make the product worse, not better. So, asking customers if they'd like to receive new releases more often usually leads to a consistent answer: "No, thank you." On the other hand, you'll get a very different reaction if you say to customers, "The next time you report an urgent bug, would you prefer to have it fixed immediately or wait for a future arbitrary release milestone?"

Most enterprise customers of mission-critical software mitigate these problems by insisting on releases on a regular, slow schedule. This gives them plenty of time to do stress testing, training, and their own internal deployment. Smaller customers and regular consumers rely on their vendors to do this for them and are otherwise at their mercy. Switching these customers directly to continuous deployment sounds harder than it really is. That's because of the anatomy of a release. A typical "new feature" release is, in my experience, about 80% changes to underlying APIs or architecture. That is, the vast majority of the release is not actually visible to the end user. Most of these changes are supposed to be "side-effect free," although few traditional development teams actually achieve that level of quality. So, the first shift in mindset required for continuous deployment is this: if a change is supposedly "side-effect free," release it immediately. Don't wait to bundle it up with a bunch of other related changes. If you do that, it will be much harder to figure out which change caused the unexpected side effects.

The second shift in mindset required is to separate the concept of a marketing release from the concept of an engineering release. Just because a feature is built, tested, integrated, and deployed doesn't mean any customers should necessarily see it. When deploying end-user-visible changes, most continuous deployment teams keep them hidden behind "flags" that allow for a gradual rollout of the feature when it's ready. (See the Flickr blog post at http://code.flickr.com/blog/2009/12/02/flipping-out/ for how that company does this.) This allows the concept of "ready" to be much more all-encompassing than the traditional "developers threw it over the wall to QA, and QA approved of it." You might have the interaction designer who designed it take a look to see if it really conforms to his design. You might have the marketing folks who are going to promote it double-check that it does what they expect. You can train your operations or customer service staff on how it works—all live in the production environment. Although this sounds similar to a staging server, it's actually much more powerful. Because the feature is live in the real production environment, all kinds of integration risks are mitigated. For example, many features have decent performance themselves but interact badly when sharing resources with other features. Those kinds of features can be immediately detected and reverted by continuous deployment. Most importantly, the feature will look, feel, and behave exactly like it does in production. Bugs that are found in production are real, not staging artifacts.
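In code, the flag pattern described above might look like the following minimal sketch. The flag store, flag names, and rollout scheme are all assumptions for illustration, not any particular team's implementation:

```python
import hashlib

# Illustrative in-memory flag store; a production system would read
# flags from a config service or database so they can change at runtime.
FLAGS = {
    # Deployed to production, but visible only to staff so far.
    "new_checkout": {"staff_only": True, "rollout_percent": 0},
}

def bucket_for(user_id):
    # Deterministic 0-99 bucket per user (Python's built-in hash()
    # is salted per process, so use a stable hash instead).
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(flag_name, user):
    """Decide whether this user sees the feature behind flag_name."""
    flag = FLAGS.get(flag_name)
    if flag is None:
        return False              # unknown flags default to off
    if user.get("is_staff"):
        return True               # staff always see deployed features
    if flag["staff_only"]:
        return False
    return bucket_for(user["id"]) < flag["rollout_percent"]
```

The feature ships dark with `rollout_percent` at zero, the percentage rises as confidence grows, and rolling back is a flag change rather than a new deploy.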

Plus, you want to get good at selectively hiding features from customers. That skill set is essential for gradual rollouts and, most importantly, A/B split-testing (http://www.startuplessonslearned.com/2008/12/getting-started-with-split-testing.html). In traditional large batch deployment systems, split-testing a new feature seems like considerably more work than just throwing it over the wall. Continuous deployment changes that calculus, making split-tests nearly free. As a result, the amount of validated learning (http://www.startuplessonslearned.com/2009/04/validated-learning-about-customers.html) a continuous deployment team achieves per unit time is much higher.
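The same bucketing machinery makes split-test assignment nearly free. A sketch, with the experiment name and arm labels chosen purely for illustration:

```python
import hashlib
from collections import Counter

def variant(experiment, user_id, variants=("control", "treatment")):
    """Deterministically assign a user to one arm of an experiment,
    so the same user always sees the same version."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Roughly half of all users land in each arm:
counts = Counter(variant("signup-copy", uid) for uid in range(10_000))
```

Because assignment is a pure function of the experiment name and user ID, no per-user state needs to be stored to keep the experience consistent across visits.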

The QA Dilemma

A traditional QA process works through a checklist of key features, making sure each feature works as specified before allowing the release to go forward. This makes sense, especially given how many bugs in software involve "action at a distance" or unexpected side effects. Thus, even if a release is focused on changing Feature X, there's every reason to be concerned that it will accidentally break Feature Y. Over time, the overhead of this approach to QA becomes very expensive. As the product grows, the checklist has to grow proportionally. Thus, to get the same level of coverage for each release, the QA team has to grow (or, equivalently, the amount of time the product spends in QA has to grow). Unfortunately, it gets worse. In a successful start-up, the development team is also growing. That means more changes are being implemented per unit time as well, which means either the number of releases per unit time is growing or, more likely, the number of changes in each release is growing. So, for a growing team working on a growing product, the QA overhead is increasing polynomially, even if the team is expanding only linearly.
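The arithmetic behind that polynomial growth can be sketched with a toy model. Every number here is made up for illustration; the point is only the shape of the curve:

```python
def qa_hours_per_quarter(quarter, new_items_per_quarter=20,
                         base_releases=2, hours_per_item=0.5):
    """Toy model of full-coverage QA cost as product and team both grow."""
    checklist = new_items_per_quarter * quarter   # checklist grows linearly
    releases = base_releases + quarter            # growing team ships more
    # Full coverage re-runs the whole checklist for every release, so
    # total effort grows as the product of two linear terms: quadratic.
    return checklist * releases * hours_per_item

for q in (1, 4, 8):
    print(q, qa_hours_per_quarter(q))   # 30.0, 240.0, 800.0
```

Between quarters 1 and 8 the product and team each grow by a constant factor per quarter, but the QA bill grows more than 25-fold.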

For organizations that have the highest quality standards, and the budget to do it, full coverage can work. In fact, that's what happens for organizations such as the U.S. Army, which has to do a massive amount of integration testing of products built by its vendors. Having those products fail in the field would be unacceptable. To achieve full coverage, the Army has a process for certifying these products. The whole process takes a massive amount of manpower and requires a cycle time that would be lethal for most start-ups (the major certifications take approximately two years). And even the Army recognizes that improving this cycle time would have major benefits.

Very few start-ups can afford this overhead, and so they simply accept a reduction in coverage instead. That solves the problem in the short term, but not in the long term—because the extra bugs that get through the QA process wind up slowing the team down over time, imposing extra "firefighting" overhead, too.

I want to directly challenge the belief that continuous deployment leads to lower-quality software. I just don't believe it. Continuous deployment offers significant advantages over large batch development systems. Some of these benefits are shared by Agile systems that practice continuous integration but still release in large batches; others are unique to continuous deployment.

Faster (and better) feedback

Engineers working in a continuous deployment environment are much more likely to get individually tailored feedback about their work. When they introduce a bug, performance problem, or scalability bottleneck, they are likely to know about it immediately. They'll be much less likely to hide behind the work of others, as happens with large batch releases—when a release has a bug it tends to be attributed to the major contributor to that release, even if that association is untrue.

More automation

Continuous deployment requires living the mantra: "Have every problem only once." This requires a commitment to realistic prevention and learning from past mistakes. That necessarily means an awful lot of automation. That's good for QA and for engineers. QA's job gets a lot more interesting when we use machines for what machines are good for: routine repetitive detailed work, such as finding bug regressions.
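"Have every problem only once" usually takes a concrete form: each bug fix ships with an automated regression test. A hypothetical example, with a made-up helper function standing in for real application code:

```python
import unittest

def slugify(title):
    # Hypothetical helper that once crashed on empty input; the guard
    # below is the fix for a bug found in production exactly once.
    if not title:
        return ""
    return title.strip().lower().replace(" ", "-")

class SlugifyRegressionTests(unittest.TestCase):
    # Each fixed bug leaves a test behind, so a machine, not a manual
    # QA checklist, guards against that regression forever after.
    def test_empty_title_does_not_crash(self):
        self.assertEqual(slugify(None), "")
        self.assertEqual(slugify(""), "")

    def test_basic_slug(self):
        self.assertEqual(slugify("Hello World"), "hello-world")
```

The test suite grows automatically with the bug history, which is exactly the routine, repetitive, detailed work machines are good for.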

Monitoring of real-world metrics

To make continuous deployment work, teams have to get good at automated monitoring and reacting to business and customer-centric metrics, not just technical metrics. That's a simple consequence of the automation principle I just mentioned. Huge classes of bugs "work as designed" but cause catastrophic changes in customer behavior. My favorite: changing the checkout button in an e-commerce flow to appear white on a white background. No automated test is going to catch that, but it still will drive revenue to zero. That class of bug will burn continuous deployment teams only once.
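A business-metric monitor for exactly that class of bug might look like the sketch below. The metric, threshold, and numbers are illustrative assumptions, not recommendations:

```python
def conversion_alert(visits, checkouts, baseline_rate, tolerance=0.5):
    """Return an alert message if the checkout rate falls below a
    fraction (tolerance) of its historical baseline, else None."""
    if visits == 0:
        return "no traffic observed: possible outage"
    rate = checkouts / visits
    if rate < baseline_rate * tolerance:
        return f"checkout rate {rate:.4f} is below alert threshold"
    return None

# The white-on-white button "works as designed," so every functional
# test passes -- but the metric monitor fires within one sample window:
print(conversion_alert(visits=2000, checkouts=0, baseline_rate=0.05))
```

Wired into the deployment pipeline, an alert like this can trigger an automatic revert of the offending change rather than just a page to a human.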

Better handling of intermittent bugs

Most QA teams are organized around finding reproduction paths for bugs that affect customers. This made sense in an era when successful products tended to be used by a small number of customers. These days, even niche products—or even big enterprise products—tend to have a lot of man-hours logged by end users. And that, in turn, means that rare bugs are actually quite exasperating. For example, consider a bug that happens only one time in a million uses. Traditional QA teams are never going to find a reproduction path for that bug. It will never show up in the lab. But for a product with millions of customers, it's happening (and it's being reported to customer service) multiple times a day! Continuous deployment teams are much better able to find and fix these bugs.
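The scale argument is simple arithmetic. The usage numbers below are invented for illustration:

```python
def expected_daily_failures(uses_per_day, failures_per_million_uses=1):
    """How often a 'one in a million' bug bites at production scale."""
    return uses_per_day * failures_per_million_uses / 1_000_000

# A bug the lab will never reproduce still reaches customer service
# several times a day once usage is in the millions:
print(expected_daily_failures(5_000_000))  # 5.0
```

At five failures a day, production logs and customer service reports become a far richer source of reproduction data than any lab.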

Smaller batches

Continuous deployment tends to drive the batch size of work down to an optimal level, whereas traditional deployment systems tend to drive it up. For more details on this phenomenon, see "Work in Small Batches" (http://www.startuplessonslearned.com/2009/02/work-in-small-batches.html) and the section on the "batch size death spiral" in "The Principles of Product Development Flow" (http://www.startuplessonslearned.com/2009/07/principles-of-product-development-flow.html).
