“When deploying and administering large infrastructures, it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures.”
--Steve Traugott and Joel Huddleston, TerraLuna LLC
“In today’s computer industry, we still typically install and maintain computers the way the automotive industry built cars in the early 1900s. An individual craftsman manually manipulates a machine into being, and manually maintains it afterwards. The automotive industry discovered first mass production, then mass customisation using standard tooling. The systems administration industry has a long way to go, but is getting there.”
--Steve Traugott and Joel Huddleston, TerraLuna LLC
These two statements came from the prophetic infrastructures.org website at the very start of the last decade. Nearly ten years later, a whole world of exciting developments has taken place, sparking a revolution and giving birth to a radical new approach to designing, building, and maintaining the underlying IT systems that make web operations possible. At the heart of that revolution is a mentality and a tool set that treats Infrastructure as Code.
This book is written from the standpoint that this approach to the designing, building, and running of Internet infrastructures is fundamentally correct. Consequently, we’ll spend a little time exploring its origin, rationale, and principles before outlining the risks of the approach—risks which this book sets out to mitigate.
Infrastructure as Code is an interesting phenomenon, particularly for anyone wanting to understand the evolution of ideas. It emerged over the last four or five years in response to the juxtaposition of two pieces of disruptive technology [1]: utility computing and second-generation web frameworks.
The ready availability of effectively infinite compute power at the touch of a button, combined with the emergence of a new generation of hugely productive web frameworks, brought into existence a new world of scaling problems that had previously been witnessed only by large systems integrators. The key year was 2006, which saw the launch of Amazon Web Services’ Elastic Compute Cloud (EC2), a few months after the release of version 1.0 of Ruby on Rails the previous Christmas. This convergence meant that anyone with an idea for a dynamic website—an idea that delivered functionality or simply amusement to a rapidly growing Internet community—could go from a scribble on the back of a beer mat to a household name in weeks.
Suddenly very small, developer-led companies found themselves facing issues that had previously been tackled almost exclusively by large organizations with huge budgets, big teams, enterprise-class configuration management tools, and lots of time. The people responsible for these websites that had grown huge almost overnight now had to answer questions such as how to scale read- or write-heavy databases, how to add identical machines to a given layer in the architecture, and how to monitor and back up critical systems. Radically small teams needed to be able to manage infrastructures at scale and to compete in the same space as big enterprises, but with none of the big enterprise systems.
It was out of this environment that a new breed of configuration management tools emerged.[2] Given the significance of 2006 in terms of the disruptive technologies we describe, it’s no coincidence that in early 2006 Luke Kanies published an article on “Next-Generation Configuration Management”[3] in ;login: (the USENIX magazine), describing his Ruby-based system management tool, Puppet. Puppet provided a high-level DSL with primitive programmability, but the development of Chef (a tool influenced by Puppet and released in January 2009) brought the power of a third-generation (3GL) programming language to system administration. Such tools equipped tiny teams and individual developers with the kind of automation and control that until then had been available only to the big players. Furthermore, being built on open source tools and released early to developer communities, these tools rapidly evolved according to demand and arguably soon became even more powerful than their commercial counterparts.
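As a rough illustration of what that 3GL power looks like in practice, consider the sketch below. It is not taken from the original text; the package names and attributes are hypothetical, but the syntax is ordinary Chef recipe DSL, in which plain Ruby constructs such as arrays, loops, and conditionals sit directly alongside the declarative resources.

```ruby
# Illustrative Chef recipe fragment (hypothetical package names): plain
# Ruby constructs such as iteration and conditionals sit alongside the
# declarative resources.

# Install a list of packages by looping over an ordinary Ruby array.
%w(git curl ntp).each do |pkg|
  package pkg do
    action :install
  end
end

# Branch on node data with a normal Ruby conditional.
web_server = node['platform_family'] == 'debian' ? 'apache2' : 'httpd'

package web_server

service web_server do
  action [:enable, :start]
end
```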
Thus a new paradigm was introduced: the paradigm of Infrastructure as Code. The key concept is that we can model our infrastructure as if it were code—abstracting, designing, implementing, and deploying the infrastructure upon which we run our web applications in the same way, and with the same tools, as we would any other modern software project. The code that models, builds, and manages the infrastructure is committed into source code management alongside the application code. The mind shift is in starting to think of our infrastructure as redeployable from a code base: a code base we can work with using the software development methodologies that have evolved over the last dozen or more years as the business of writing and delivering software has matured.
This approach brings with it a series of benefits that help the small, developer-led company to solve some of the scalability and management problems that accompany rapid and overwhelming commercial success:
- Repeatability
Because we’re building systems in a high-level programming language and committing our code, we start to become more confident that our systems are ordered and repeatable: given the same inputs, the same code should produce the same outputs. This means we can now be confident (and verify on a regular basis) that what we believe will recreate our environment really will do so. A short sketch of what this looks like in practice follows this list.
- Automation
We already have mature tools for deploying applications written in modern programming languages, and the very act of abstracting out infrastructures brings us the benefits of automation.
- Agility
The discipline of source code management and version control means we have the ability to roll forwards or backwards to a known state. In the event of a problem, we can go to the commit logs and identify what changed and who changed it. This brings down the average time to fix problems, and encourages root cause analysis.
- Scalability
Repeatability plus automation makes scalability much easier, especially when combined with the kind of rapid hardware provisioning that the cloud provides.
- Reassurance
Because the architecture, design, and implementation of our infrastructure are modeled in code, we automatically have documentation: any programmer can look at the source code and see at a glance how the systems work. This is a welcome change from the common scenario in which only a single sysadmin or architect understands how the system hangs together. That position is risky: such a person can effectively hold the organization to ransom, and should they leave or fall ill, the company is endangered.
- Disaster recovery
In the event of a catastrophic event that wipes out the production systems, if your entire infrastructure has been broken down into modular components and described as code, recovery is as simple as provisioning new compute power, restoring from backup, and deploying the infrastructure and application code. What might have been a business-ending event under the old paradigm of custom-built, partially automated infrastructure becomes a manageable outage of a few hours, and potentially even a source of competitive advantage over organizations hit by the same external event but lacking the power and flexibility that Infrastructure as Code brings.
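To make the Repeatability point above concrete, here is a hedged sketch (the resource names are hypothetical, not from the original text) of why the same code and the same inputs converge to the same state: Chef resources describe desired state rather than actions, so a second run makes no further changes, and arbitrary shell commands can be guarded so that they behave the same way.

```ruby
# Illustrative sketch (hypothetical names): declarative resources are
# idempotent, so running this recipe once or a hundred times leaves the
# node in the same state.

# Chef creates the user only if it does not already exist.
user 'deploy' do
  shell       '/bin/bash'
  home        '/home/deploy'
  manage_home true
end

# Arbitrary shell commands need an explicit guard to stay repeatable.
execute 'create application directory' do
  command 'install -d -o deploy /srv/app'
  not_if  { ::File.directory?('/srv/app') }
end
```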
Infrastructure as Code is a powerful concept and approach that promises to help repair the split-brain witnessed so frequently in organizations where developers and system administrators view each other as enemies, and don’t work together. By giving operational responsibilities to developers, and liberating system administrators to start thinking at the higher levels of abstraction that are necessary if we’re to succeed in building robust scaled architectures, we open up a new way of cooperating, a new way of working—which is fundamental to the emerging Devops movement.
[1] Joseph L. Bower and Clayton M. Christensen, “Disruptive Technologies: Catching the Wave,” Harvard Business Review, January–February 1995.
[2] Although open source configuration management tools such as CFEngine already existed, frustration with those tools contributed to the creation of Puppet.