An introduction to immutable infrastructure
Why you should stop managing infrastructure and start really programming it.
Immutable infrastructure (II) provides stability, efficiency, and fidelity to your applications through automation and the use of successful patterns from programming. No rigorous or standardized definition of immutable infrastructure exists yet, but the basic idea is that you create and operate your infrastructure using the programming concept of immutability: once you instantiate something, you never change it. Instead, you replace it with another instance to make changes or ensure proper behavior.
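The core idea, never changing an instance and replacing it instead, can be shown in miniature with Python's frozen dataclasses. This is only an analogy. It uses a hypothetical `ServerSpec` record with made-up image IDs. Real infrastructure replacement happens through a cloud provider's API, not through in-process objects.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical description of a server; every field is fixed at creation."""
    image_id: str
    instance_type: str
    app_version: str


v1 = ServerSpec(image_id="img-001", instance_type="small", app_version="1.0.0")

# Mutating in place is rejected: frozen dataclasses raise on attribute assignment.
try:
    v1.app_version = "1.0.1"
except FrozenInstanceError:
    print("in-place change rejected")

# To make a change, build a new instance from a new image; the old one is discarded.
v2 = replace(v1, image_id="img-002", app_version="1.0.1")
print(v1.app_version, v2.app_version)
```

The original `v1` is untouched. Any change produces a distinct `v2`, which mirrors launching a new server from a new image rather than patching the old one.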
Chad Fowler coined the term “immutable infrastructure” in a 2013 blog post, “Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components,” but others have spoken about similar ideas. Martin Fowler described phoenix servers in 2012. Greg Orzell, James Carr, Kief Morris, and Ben Butler-Cole, to name a few, have contributed significant thought and work as well.
II requires full automation of your runtime environment. This is only possible in compute environments that expose an API over all aspects of configuration and monitoring. Therefore, II can be fully realized only in true cloud environments. It is possible to realize some benefits of II with partial implementations, but the full benefits of efficiency and resiliency come only with a thorough implementation.
Give up on artisanal infrastructure
Historically, we’ve thought of machine uptime and maintenance as desirable because we associate the health of the overall service or application with them. In the data center, hardware is expensive and we need to carefully craft and maintain each individual server to preserve our investments over time. In the cloud, this is an anachronistic perspective and one we should give up on in order to create more resilient, simpler, and ultimately more secure services and applications. Werner Vogels, CTO of Amazon and an early leading thinker on cloud systems, captures this sentiment by imploring us to stop hugging servers (they don’t hug us back).
There are several reasons why artisanally maintained infrastructure, composed of traditional, long-lived (and therefore mutable) components, is insufficient for operating modern, distributed services in the cloud.
- Increasing operational complexity. The rise of distributed service architectures and the use of dynamic scaling result in vastly more components to keep track of. Using mutable maintenance methods to apply updates or patch configurations across fleets of hundreds or thousands of compute instances is difficult, error-prone, and a time sink.
- Slower deployments, more failures. When infrastructure is composed of snowflake components resulting from mutable maintenance methods (whether via scripts or configuration management tools), much more can go wrong. Once you deviate from a straight-from-source-control process, you can no longer know the state of your infrastructure accurately. Fidelity is lost as infrastructure behaves in unpredictable ways, and time is wasted chasing down configuration drift and debugging the runtime.
- Identifying errors and threats in order to mitigate harm. Long-lived, mutable systems rely on identifying errors or threats to prevent damage. We now know that this is a Sisyphean undertaking, as the near-daily announcements of high-profile and damaging enterprise exploits attest. And those are only the ones reported. With II and automated regeneration of compute resources, many errors and threats are mitigated whether they are detected or not.
- Fire drills. Artisanal infrastructure allows us to take shortcuts on automation that come back to bite us in unexpected ways, such as when a cloud provider reboots underlying instances to perform its own updates or patches. If we build and maintain our infrastructure manually, and don’t routinely exercise II automation, these events become fire drills.
Immutable infrastructure provides hope
II has much in common with how nature maintains advanced biological systems, like you and me. The primary mechanism of fidelity in humans is the constant destruction and replacement of subcomponents. It underlies the immune system, which destroys cells to maintain health. It underlies the growth system, which allows different subsystems to mature over time through destruction and replacement. The individual human being maintains a sense of self and intention, while the underlying components are constantly replaced. Systems managed using II patterns are no different.
The benefits of immutable infrastructure are manifold if you apply it appropriately to your application and have fully automated deployment and recovery methods for your infrastructure.
- Simplifying operations. With fully automated deployment methods, you can replace old components with new versions to ensure your systems are never far in time from their initial “known-good” state. Maintaining a fleet of instances becomes much simpler with II, since there’s no need to track the changes that accumulate under mutable maintenance methods.
- Continuous deployments, fewer failures. With II, you know what’s running and how it behaves, so deploying updates can become routine and continuous, with fewer failures occurring in production. All change is tracked by your source control and Continuous Integration/Continuous Deployment processes.
- Fewer errors and threats. Services are built atop a complex stack of hardware and software, and things do go wrong over time. By automating replacement instead of maintaining instances, we are, in effect, regenerating instances regularly and more often. This reduces configuration drift, the vulnerability surface, and the effort needed to meet Service Level Agreements. Many of the maintenance fire drills in mutable systems are taken care of naturally.
- Cloud reboot? No problem! With II you know what you have running, and with fully automated recovery methods for your services in place, cloud reboots of your underlying instances should be handled gracefully and with minimal, if any, application downtime.
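The replacement-over-maintenance pattern behind these benefits can be sketched as a rolling replacement. Each step launches a fresh instance from a known-good image, checks its health, and only then terminates an old instance. This is a minimal sketch under stated assumptions. `FakeCloud`, `Instance`, `healthy`, and `rolling_replace` are hypothetical stand-ins for a real provider API and health probe, and the image IDs are invented.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Instance:
    """Hypothetical compute instance; never modified after launch."""
    instance_id: str
    image_id: str


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's launch/terminate API."""
    running: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, image_id: str) -> Instance:
        inst = Instance(f"i-{next(self._ids):03d}", image_id)
        self.running[inst.instance_id] = inst
        return inst

    def terminate(self, instance_id: str) -> None:
        del self.running[instance_id]


def healthy(inst: Instance) -> bool:
    # Placeholder health check (assumption); a real one would probe the service.
    return True


def rolling_replace(cloud: FakeCloud, image_id: str) -> None:
    """Replace every running instance with a fresh one from image_id, one at a time."""
    for old in list(cloud.running.values()):  # snapshot before the fleet changes
        new = cloud.launch(image_id)
        if not healthy(new):
            # Abort this step: discard the new instance and keep the old one serving.
            cloud.terminate(new.instance_id)
            raise RuntimeError(f"{new.instance_id} failed its health check")
        cloud.terminate(old.instance_id)


# Usage: a fleet of three is rolled onto a new image without patching anything in place.
cloud = FakeCloud()
for _ in range(3):
    cloud.launch("img-2024-01")
rolling_replace(cloud, "img-2024-02")
print(sorted(cloud.running))  # ['i-004', 'i-005', 'i-006']
```

Calling `rolling_replace` again with the *same* image is the "regenerate regularly" idea. Every instance is swapped for a fresh copy of the known-good state, so drift never has time to accumulate.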
We have to work very hard to maintain things. When those things were physical boxes in a rack, this was necessary work because we configured hardware manually. But with logically isolated compute instances that can be instantiated with an API call in an effectively infinite cloud, “maintaining boxes” is an intellectual ball and chain. It ties us to caring about and working on the wrong things. Giving up on them lets you focus on what matters to the success of your application, rather than being constantly weighed down by high maintenance costs and the difficulty of adopting new patterns.