Environment parity for rapidly deployed cloud-native apps

How to instill confidence that your app will work everywhere it’s meant to by reducing gaps in time, people, and resources.

By Kevin Hoffman

August 31, 2016

Ventilator (source: Pixabay)

In Beyond the Twelve-Factor App, I present a new set of guidelines that builds on Heroku’s original 12 factors and reflects today’s best practices for building cloud-native applications. I have changed the order of some to indicate a deliberate sense of priority and added factors such as telemetry, security, and the concept of “API first” that should be considerations for any application that will be running in the cloud. These new 15-factor guidelines are:

One codebase, one application
API first
Dependency management
Design, build, release, and run
Configuration, credentials, and code
Logs
Disposability
Backing services
Environment parity
Administrative processes
Port binding
Stateless processes
Concurrency
Telemetry
Authentication and authorization

The tenth of the original 12 factors, dev/prod parity, instructed us to keep all of our environments as similar as possible.

While some organizations have evolved further, many of us have likely worked in an environment like this: the shared development environment has a different scaling and reliability profile from QA, which in turn differs from production. The database drivers used in dev and QA are different from those used in production. Security rules, firewalls, and other environmental configuration settings also differ. Some people can deploy to some environments but not others. And worst of all, people fear deployment, and they have little to no confidence that a product that works in one environment will work in another.

When discussing the design, build, release, run cycle, I brought up the notion that the “It works on my machine” scenario is a cloud-native anti-pattern. The same is true for other phrases we’ve all heard right before losing hours or days to firefighting and troubleshooting: “It works in QA” and “It works in prod.”

The purpose of applying rigor and discipline to environment parity is to give your team and your entire organization the confidence that the application will work everywhere.

While the opportunities for creating a gap between environments are nearly infinite, the most common culprits are usually:

Time
People
Resources

Time

In many organizations, it can take weeks or months from the time a developer checks in code until the time it reaches production. In organizations like this, you often hear phrases like “the third-quarter release” or “the December 20xx release.” Phrases like that are a warning sign to anyone paying attention.

When such a time gap occurs, people often forget what changes went into a release (even if there are adequate release notes), and more importantly, the developers forget what the code looked like.

With a modern approach, organizations should strive to shrink the time gap between check-in and production from weeks or months to minutes or hours. A proper CD pipeline should run automated tests in successive environments and end with the change being pushed to production automatically. With the cloud supporting zero-downtime deployment, this pattern can become the norm.

This idea often scares people, but once developers get into the habit of knowing their code will be in production the same day as a check-in, discipline and code quality often skyrocket.
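The promotion flow described in this section can be sketched roughly as follows. This is a minimal illustration, not a real CD tool: the stage names and the `promote`, `deploy`, and `run_tests` names are all hypothetical, and the deploy and test steps are passed in as plain functions so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical stage names, ordered from least to most production-like.
STAGES = ("dev", "qa", "staging", "production")


@dataclass
class PipelineResult:
    commit: str
    deployed_to: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None  # stage whose tests failed, if any


def promote(
    commit: str,
    deploy: Callable[[str, str], None],
    run_tests: Callable[[str, str], bool],
    stages: Sequence[str] = STAGES,
) -> PipelineResult:
    """Deploy one commit to each stage in order, stopping at the first test failure."""
    result = PipelineResult(commit)
    for env in stages:
        deploy(commit, env)  # same commit, same deploy step, every stage
        result.deployed_to.append(env)
        if not run_tests(commit, env):
            result.failed_at = env  # the change never reaches later stages
            break
    return result


if __name__ == "__main__":
    # Stand-ins for real deploy and test steps (assumptions for illustration).
    def fake_deploy(commit: str, env: str) -> None:
        print(f"deploying {commit} to {env}")

    def fake_tests(commit: str, env: str) -> bool:
        return True

    print(promote("abc123", fake_deploy, fake_tests))
```

The point of the sketch is that no human decides where the commit goes next: passing tests in one environment is the only thing that moves it to the following one.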

People

Historically, the types of people deploying applications have been directly related to the size of the company: in smaller companies, developers are usually involved in everything from coding through deployment, whereas in larger organizations there are more handoffs and more people and teams involved.

The original 12 factors indicate that the developers and deployers should be the same people, and this makes a lot of sense if your target is a black-box public cloud like Heroku; but this practice falls down when your target is a private cloud within a large enterprise.

Further, I contend that humans should never be deploying applications at all, at least not to any environment other than their own workstations or labs. In the presence of a proper build pipeline, an application will be deployed to all applicable environments automatically, and it can be deployed manually to other environments subject to the security restrictions within the CI tool and the target cloud instance.

In fact, even if you are targeting a public cloud provider, it is still possible to use cloud-hosted CD tools like CloudBees or Wercker to automate your testing and deployments.

While there are always exceptions, I contend that if you cannot deploy with a single press of a button, or automatically in response to certain events, then you’re doing it wrong.

Resources

When we’re sitting at our desks and we need to get something up and running quickly, we all make compromises. The nature of these compromises can leave us with a little bit of technical debt, or it can set us up for catastrophic failure.

One such compromise is often in the way we use and provision backing services. Our application might need a database, and we know that in production we’ll be hooking it up to an Oracle or a Postgres server, but it’s too much of a pain to set that up to be available locally for development, so we’ll compromise and use an in-memory database that is kind of like the target database.

Every time we make one of these compromises, we increase the gap between our development and production environments; and the wider that gap is, the less predictability we have about the way our application works. As predictability goes down, so does reliability; and if reliability goes down, we lose the ability to have a continuous flow from code check-in to production deployment. It adds a sense of brittleness to everything we do; and the worst part is, we usually don’t know the consequences of increasing the dev/prod gap until it’s too late.

These days, developers have a nearly infinite set of tools at their disposal, and there are few good excuses left for not using the same resource types across environments. Developers can be granted their own instances of databases (this is especially easy if the database is itself a brokered service on a PaaS instance), or if that’s not an option, container tools like Docker can help make “prod-like” environments more accessible to developer workstations.

As you evaluate every step in your development life cycle while building cloud-native applications, flag and question every decision that widens the functional gap between your deployment environments, and resist the urge to work around a problem by letting your environments differ, even if the difference seems insignificant at the time.
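One way to flag such gaps is to compare the kinds of backing services each environment is bound to. The sketch below is only an illustration under stated assumptions: the environment names, the service URLs, and the `service_kind` and `parity_gaps` helpers are all hypothetical, and the URL scheme is used as a rough stand-in for the resource type.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical backing-service bindings per environment (illustration only).
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {
        "database": "sqlite:///:memory:",  # the in-memory compromise
        "cache": "redis://localhost:6379/0",
    },
    "qa": {
        "database": "postgresql://qa-db.internal:5432/app",
        "cache": "redis://qa-cache.internal:6379/0",
    },
    "production": {
        "database": "postgresql://prod-db.internal:5432/app",
        "cache": "redis://prod-cache.internal:6379/0",
    },
}

Gap = Tuple[str, str, Optional[str], Optional[str]]


def service_kind(url: str) -> str:
    """Use the URL scheme as a rough proxy for the resource type."""
    return urlparse(url).scheme


def parity_gaps(
    environments: Dict[str, Dict[str, str]], reference: str = "production"
) -> List[Gap]:
    """List (env, service, actual_kind, expected_kind) wherever an environment
    binds a different kind of service than the reference environment."""
    ref = environments[reference]
    gaps: List[Gap] = []
    for env, services in environments.items():
        if env == reference:
            continue
        for name in sorted(set(ref) | set(services)):
            expected = service_kind(ref[name]) if name in ref else None
            actual = service_kind(services[name]) if name in services else None
            if actual != expected:
                gaps.append((env, name, actual, expected))
    return gaps


if __name__ == "__main__":
    for env, name, actual, expected in parity_gaps(ENVIRONMENTS):
        print(f"{env}: {name} is {actual}, but production uses {expected}")
```

A check like this only catches differences in resource type; version, driver, and configuration differences would need their own comparisons.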

Every commit is a candidate for deployment

Every commit is a candidate for deployment. When building applications in a cloud-first way, every time you commit a change, that change should end up in production after some short period of time: basically the amount of time it takes to run all tests, vet that change against all integration suites, and deploy to pre-production environments.

If your development, testing, and production environments differ, even in ways you might think don’t matter, then you lose the ability to accurately predict how your code change is going to behave in production. This confidence in the code heading to production is essential for the kind of continuous delivery and rapid deployment that allows applications and their development teams to thrive in the cloud.
