Building a strong foundation for effective devops requires discussing some key terms and concepts. Some of these concepts may be familiar to readers; many have been mentioned in the preceding history of software engineering or will be well known to readers who have experience with various software development methodologies.
Throughout the history of computer engineering, a number of methodologies have been described to improve and ease the process of software development and operations. Each methodology splits work into phases, each with a distinct set of activities. One issue with many methodologies is a focus on the development process as something separate from operations work, leading to conflicting goals between teams. Additionally, forcing other teams to follow particular methodologies can cause resentment and frustration if the work doesn’t fit their processes and goals. Understanding how these different methodologies work and what benefits each might bring can help increase understanding and reduce this friction.
Devops is not so rigidly defined as to prohibit any particular methodology. While devops arose from practitioners who were advocating for Agile system administration and cooperation between development and operations teams, the details of its practice are unique per environment. Throughout this book, we will reiterate that a key part of devops is being able to assess and evaluate different tools and processes to find the most effective ones for your environment.
The phases of work that a methodology defines may include:
Specification of deliverables or artifacts
Development and verification of the code with respect to the specification
Deployment of the code to its final customers or production environment
Covering all methodologies is far beyond the scope of this chapter, but we will touch on a few that have in one way or another impacted the ideas behind devops.
The waterfall methodology or model is a project management process with an emphasis on a sequential progression from one stage of the process to the next. Originating in the manufacturing and construction industries and adopted later by hardware engineering, the waterfall model was adapted to software in the early 1980s.1
The original stages were requirements specification, design, implementation, integration, testing, installation, and maintenance, and progress was visualized as flowing from one stage to another (hence the name), as shown in Figure 4-1.
Software development under the waterfall model tended to be very highly structured, based on a large amount of time being spent in the requirements and design phases, with the idea that if both of those were completed correctly to begin with it would cut down on the number of mistakes found later.
In waterfall’s heyday, there was a high cost to delivering software on CD-ROMs or floppy disks, not including the cost to customers for manual installation. Fixing a bug required manufacturing and distributing new floppies or CD-ROMs. Because of these costs, it made sense to spend more time and effort specifying requirements up front rather than trying to fix mistakes later.
Agile is the name given to a group of software development methodologies that are designed to be more lightweight and flexible than previous methods such as waterfall. The Agile Manifesto, written in 2001 and described in the previous chapter, outlines its main principles as follows:
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
- individuals and interactions over processes and tools
- working software over comprehensive documentation
- customer collaboration over contract negotiation
- responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
Agile methodologies include processes such as Scrum, which we will define next, and other methods that place a heavy emphasis on collaboration, flexibility, and the end result of working software.
Devops shares many characteristics with the Agile movement, especially with the focus on individuals, interactions, and collaboration. You might wonder if devops is just “rebranded” Agile. While devops has certainly grown around Agile methodology, it is a separate cultural movement steeped in the history of the computing industry with a broad reach that includes more than just developers. Devops adopts and extends Agile principles and applies them to the entire organization, not only the development process. As we will see in detail in later chapters, devops has cultural implications beyond Agile and a focus that is broader than speed of delivery.
In the mid-1990s, Ken Schwaber and Dr. Jeff Sutherland, two of the original creators of the Agile Manifesto, merged individual efforts to present a new software development process called Scrum. Scrum is a software development methodology that focuses on maximizing a development team’s ability to quickly respond to changes in both project and customer requirements. It uses predefined development cycles called sprints, usually between one week and one month long, beginning with a sprint planning meeting to define goals and ending with a sprint review and sprint retrospective to discuss progress and any issues that arose during that sprint.
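During each sprint, the team also holds a short daily meeting, often called the daily Scrum or standup, in which each member answers three questions:
What did I do yesterday that helped the team meet its sprint goals?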
What am I planning to do today to help the team meet those goals?
What, if anything, do I see that is blocking either me or the team from reaching their goals?
These meetings, which take place in the morning in order to help people align with what they are planning to do that day and help each other with any blocking issues, are often facilitated by the Scrum master. The Scrum master is an important role that also includes responsibilities such as helping the team self-organize and coordinate work efforts, helping remove blockers so the team will continue making progress, and involving project owners and stakeholders so there is a shared understanding of what “done” means and what progress is being made. The principles of Scrum are often seen applied less formally in many software development practices today.
Similar to how software development methodologies split up software development work into different phases or otherwise try to bring more order to those processes, IT or operations work can be split up or organized as well. As with the software methodologies, covering all methodologies is far beyond the scope of this chapter.
ITIL, formerly known as Information Technology Infrastructure Library, is a set of practices defined for managing IT services. It is published as a series of five volumes that describe its processes, procedures, tasks, and checklists, and is used to demonstrate compliance as well as measure improvement toward that end. ITIL grew out of a trend that saw the growing number of IT organizations in the 1980s using an increasingly diverse set of practices.
The British Central Computer and Telecommunications Agency developed the set of recommendations as a way to try to standardize these practices. First published in 1989, the books and practices have grown over the years, with the five core sections in the most recent (2011) version being service strategy, service design, service transition, service operation, and continual service improvement.
IT analyst and consultant Stephen Mann notes that while there are many benefits that come with ITIL’s standardization and there are over 1.5 million ITIL-certified people worldwide, it has some areas where practitioners might want to put additional focus. Mann has said that ITIL is often more on the side of being reactive rather than proactive, so we suggest that organizations that have been using ITIL take note of ways that they can try to add more proactive planning and customer focus to their practices.
Control Objectives for Information and Related Technology (COBIT) is an ISACA framework for governance and management of information and technology first released in 1996. A core principle of COBIT is to align business goals with IT goals.
COBIT is based on five principles:
meeting stakeholder needs;
covering the enterprise from end to end;
applying a single integrated framework;
enabling a holistic approach; and
separating governance from management.
Some methodologies focus on thinking about systems as a whole, rather than limiting focus to more specific areas such as software development or IT operations. Systems thinking skills are crucial for anyone working with complex systems like many of the software products that are created today; readers interested in learning more about systems thinking in general would do well to read Thinking in Systems by Donella Meadows and How Complex Systems Fail by Dr. Richard Cook.
After a five-year study on the future of automobile production and the Toyota Production System (TPS), James P. Womack, Daniel T. Jones, and Daniel Roos coined the term Lean Production.2 Womack and Jones defined the five principles of Lean Thinking as follows:3
Value
Value stream
Flow
Pull
Perfection
These ideas, especially the pursuit of perfection through systemic identification and elimination of waste, drove the definition of Lean as the maximization of customer value and minimization of waste.
Lean systems focus on the parts of the system that add value by eliminating waste everywhere else, whether that be overproduction of some parts, defective products that have to be rebuilt, or time spent waiting on some other part of the system. Stemming from this are the concepts of Lean IT and Lean software development, which apply these same concepts to software engineering and IT operations.
Waste to be eliminated in these areas can include:
Unnecessary software features
Slow application response times
Overbearing bureaucratic processes
Waste in the context of Lean is the opposite of value. Mary Poppendieck and Thomas Poppendieck have mapped Lean manufacturing waste to software development waste as follows:4
Partially done work
As with devops, there is no one way to do Lean software development. There are two main approaches to Lean: a focus on waste elimination through a set of tools, and a focus on improving the flow of work, also known as The Toyota Way.5 Both approaches have the same goal, but due to the differing approaches may result in different outcomes.
There are several terms related to the development, release, and deployment of software that have not previously been covered in the definitions of the methodologies discussed so far in this chapter. These are concepts that describe the hows of developing and deploying software, and understanding what they are and how they relate will give readers a more mature understanding of how tools can be used to facilitate these practices down the line.
A version control system records changes to files or sets of files stored within the system. This can be source code, assets, and other documents that may be part of a software development project. Developers make changes in groups called commits or revisions. Each revision, along with metadata such as who made the change and when, is stored within the system in one way or another.
The ability to commit, compare, merge, and restore past revisions of objects in the repository allows for richer cooperation and collaboration within and between teams. It also minimizes risk by establishing a way to revert objects in production to previous versions.
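As a rough illustration of these ideas, and not of how any particular version control system is implemented, the sketch below keeps a list of revisions with author and timestamp metadata, compares two revisions, and restores an earlier one. The `Repository` and `Revision` names, and the sample file and authors, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One committed snapshot plus metadata about who made it and when."""
    number: int
    author: str
    message: str
    timestamp: datetime
    files: dict  # path -> content at this revision


@dataclass
class Repository:
    """Toy in-memory history; real systems store changes far more compactly."""
    revisions: list = field(default_factory=list)

    def commit(self, files: dict, author: str, message: str) -> Revision:
        revision = Revision(
            number=len(self.revisions) + 1,
            author=author,
            message=message,
            timestamp=datetime.now(timezone.utc),
            files=dict(files),  # copy so later edits do not alter history
        )
        self.revisions.append(revision)
        return revision

    def restore(self, number: int) -> dict:
        """Return a copy of the files as they were at the given revision."""
        return dict(self.revisions[number - 1].files)

    def changed_paths(self, old: int, new: int) -> set:
        """Paths whose content differs between two revisions."""
        before = self.revisions[old - 1].files
        after = self.revisions[new - 1].files
        return {path for path in before.keys() | after.keys()
                if before.get(path) != after.get(path)}


# Hypothetical usage: two commits to one file, then a rollback to the first.
repo = Repository()
repo.commit({"app.conf": "workers=2"}, "alice", "Initial config")
repo.commit({"app.conf": "workers=8"}, "bob", "Raise worker count")
rolled_back = repo.restore(1)
```

Storing whole snapshots keeps the sketch short; it is a simplification chosen for readability rather than a claim about how production tools store history.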
In test-driven development, the code developer starts by writing a failing test for the new code functionality, then writes the code itself, and finally ensures that the test passes when the code is complete. The test is a way of defining the new functionality clearly, making more explicit what the code should be doing.
Having developers write these tests themselves not only greatly shortens feedback loops but also encourages developers to take more responsibility for the quality of the code they are writing. This sharing of responsibility and shorter development cycle time are themes that continue to be important parts of a devops culture.
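The cycle might look something like the following sketch, which uses Python's standard `unittest` module. The `slugify` function and its expected behavior are invented for illustration; the point is that the two tests were the starting point, and the function body was written afterward to make them pass.

```python
import unittest


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    # Written second: just enough code to satisfy the tests below.
    words = title.lower().split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    # Written first: before slugify existed these tests failed, and they
    # spell out what the new function is expected to do.
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Effective Devops"), "effective-devops")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(
            slugify("  blameless   postmortems "), "blameless-postmortems"
        )
```

Saved in a file, the tests can be run with `python -m unittest`, which gives the developer feedback within seconds of making a change.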
Application deployment is the process of planning, maintaining, and executing on the delivery of a software release. In the general sense, the craft of application deployment needs to consider the changes that are taking place underneath the system. Having infrastructure automation build the dependencies required to run a specific application—whether they be compute, operating system, or other dependencies—minimizes the impact of inconsistencies on the released software.
Depending on the application type, different engineering concerns may be important. For example, databases may have strict guarantees in terms of consistency. If a transaction occurs, it must be reflected in the data. Application deployment is a critical aspect of engineering quality software.
Continuous integration (CI) is the process of integrating new code written by developers with a mainline or “master” branch frequently throughout the day. This is in contrast to having developers working on independent feature branches for weeks or months at a time, merging their code back to the master branch only when it is completely finished. Long periods of time between merges mean that much more has changed, increasing the likelihood that some of those changes are breaking ones. With bigger changesets, it is much more difficult to isolate and identify what caused something to break. With small, frequently merged changesets, finding the specific change that caused a regression is much easier. The goal is to avoid the kinds of integration problems that come from large, infrequent merges.
To make sure that each integration is successful, CI systems usually run a series of tests automatically when new changes are merged. Starting the tests automatically on commit and merge avoids the overhead of people having to remember to run them; the more overhead an activity requires, the less likely it is to get done, especially when people are in a hurry. The outcome of these tests is often visualized: “green” means the tests passed and the newly integrated build is considered clean, while failing or “red” tests mean the build is broken and needs to be fixed. With this kind of workflow, problems can be identified and fixed much more quickly.
Continuous delivery (CD) is a set of general software engineering principles that allow for frequent releases of new software through the use of automated testing and continuous integration. It is closely related to CI and is often thought of as taking CI one step further: beyond simply making sure that new changes can be integrated without causing regressions in automated tests, continuous delivery means that these changes can be deployed.
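The green and red outcome of a build can be sketched in a few lines. This is a simplified model, not the behavior of any specific CI product: the `run_build` function and the three named checks are hypothetical, and each check is just a function that returns whether it passed.

```python
from typing import Callable, Dict


def run_build(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run every check for a newly merged change and summarize the result."""
    failures = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False  # a check that crashes counts as a failure
        if not passed:
            failures.append(name)
    status = "green" if not failures else "red"
    return {"status": status, "failures": failures}


# Hypothetical checks that would be triggered automatically on each merge.
result = run_build({
    "unit tests": lambda: 2 + 2 == 4,
    "config parses": lambda: int("8") == 8,
    "lint": lambda: False,  # simulate a failing check
})
```

Here `result["status"]` is `"red"` and `result["failures"]` names the `lint` check, so the team knows both that the build is broken and where to look first.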
Continuous deployment (also referred to as CD) is the process of deploying changes to production by defining tests and validations to minimize risk. While continuous delivery makes sure that new changes can be deployed, continuous deployment means that they get deployed into production.
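One way to picture the distinction is as a single decision at the end of a pipeline. The sketch below is an illustrative model only; the `pipeline` function and its `auto_deploy` and `approved` parameters are hypothetical names, not part of any real tool.

```python
def pipeline(build_is_green: bool, auto_deploy: bool, approved: bool = False) -> str:
    """Decide what happens to a change after CI has run.

    auto_deploy=False models continuous delivery: a green build can be
    deployed, but a person decides when it ships (approved=True).
    auto_deploy=True models continuous deployment: a green build goes to
    production without a manual step.
    """
    if not build_is_green:
        return "rejected"
    if auto_deploy or approved:
        return "deployed"
    return "ready to deploy"
```

Under continuous delivery, `pipeline(True, auto_deploy=False)` returns `"ready to deploy"` until someone approves the release; under continuous deployment, `pipeline(True, auto_deploy=True)` returns `"deployed"` straight away. In both cases a red build is rejected.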
The more quickly software changes make it into production, the sooner individuals see their work in effect. Visibility of work impact increases job satisfaction and overall happiness with work, leading to higher performance. It also provides opportunities to learn more quickly. If something is fundamentally wrong with a design or feature, the context of work is more recent and easier to reason about and change.
Continuous deployment also gets the product out to the customer faster, which can mean increased customer satisfaction (though it should be noted that this is not a panacea—customers won’t appreciate getting an updated product if that update doesn’t solve any of their problems, so you have to make sure through other methods that you are building the right thing). This can mean validating its success or failure faster as well, allowing teams and organizations to iterate and change more rapidly as needed.
The difference between continuous delivery and continuous deployment has been discussed a great deal since these terms became more widely used. Jez Humble, coauthor of Continuous Delivery, defines continuous delivery as a general set of principles that can be applied to any software development project, including the internet of things (IoT) and embedded software, while continuous deployment is specific to web software. For more information on the differences between these two concepts, see the Further Resources for this chapter.
One theme that has become apparent especially in recent years is the idea of reducing both development costs and waste associated with creating products. If an organization were to spend years bringing a new product to market only to realize after the fact that this new product didn’t meet the needs of either new or existing customers, that would have been an incredible waste of time, energy, and money.
The idea of the minimum viable product (MVP) is to create a prototype of a proposed product with the minimum amount of effort required to determine if the idea is a good one. Rather than developing something to 100 percent completion before getting it into users’ hands, the MVP aims to drastically reduce that amount, so that if significant changes are needed, less time and effort has already been spent. This might mean cutting down on features or advanced settings in order to evaluate the core concept, or focusing on features rather than design or performance. As with ideas such as Lean and continuous delivery, MVPs allow organizations to iterate and improve more quickly while reducing cost and waste.
All computer software runs on infrastructure of some sort, whether that be hardware that an organization owns and manages itself, leased equipment that is managed and maintained by someone else, or on-demand compute resources that can easily scale up or down as needed. These concepts, once solely the realm of operations engineers, are important for anyone involved with a software product to understand in environments where the lines between development and operations are starting to blur.
Started in the 1950s by the United States Department of Defense as a technical management discipline, configuration management (CM) has been adopted in many industries. Configuration management is the process of establishing and maintaining the consistency of something’s functional and physical attributes as well as performance throughout its lifecycle. This includes the policies, processes, documentation, and tools required to implement this system of consistent performance, functionality, and attributes.
Specifically within the software engineering industry, various organizations and standards bodies such as ITIL, IEEE (the Institute of Electrical and Electronics Engineers), ISO (the International Organization for Standardization), and SEI (the Software Engineering Institute) have all proposed a standard for configuration management. As with other folk models, this has led to some confusion in the industry about a common definition for the term.
Often this term is conflated with various forms of infrastructure automation, version control, or provisioning, which creates a divide with other disciplines’ usage of the term. To ensure a common understanding for this book’s audience, we define configuration management as the process of identifying, managing, monitoring, and auditing a product through its entire lifecycle, including the processes, documentation, people, tools, software, and systems involved.
Cloud computing, often referred to as just “the cloud,” refers to a type of shared, internet-based computing where customers can purchase and use shared computing resources offered by various cloud providers as needed. Cloud computing and storage solutions can spare organizations the overhead of having to purchase, install, and maintain their own hardware.
The combination of high performance, cost savings, and the flexibility and convenience that many cloud solutions offer has made the cloud an ideal choice for organizations that are looking to both minimize costs and increase the speed at which they can iterate. Iteration and decreased development cycle time are key factors in creating a devops culture.
While some see the cloud as being synonymous with devops, this is not universally the case. A key part of devops is being able to assess and evaluate different tools and processes to find the most effective one for your environment, and it is absolutely possible to do that without moving to cloud-based infrastructure.
Infrastructure automation is a way of creating systems that reduces the burden on people to manage the systems and their associated services, as well as increasing the quality, accuracy, and precision of a service to its consumers. Indeed, automation in general is a way to cut down on repetitious work in order to minimize mistakes and save time and energy for human operators.
For example, instead of running the same shell commands by hand on every server in an organization’s infrastructure, a system administrator might put those commands into a shell script that can be executed by itself in one step rather than many smaller ones.
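The text describes a shell script; the same idea is sketched here in Python to keep the examples in one language. The host names and commands are placeholders, and the sketch assumes an `ssh` client is installed and that key-based access to each host is already set up. It defaults to a dry run that only reports what would be executed.

```python
import subprocess

HOSTS = ["web1.example.com", "web2.example.com"]  # hypothetical hosts
COMMANDS = ["uptime", "df -h /"]  # the commands formerly typed by hand


def plan(hosts, commands):
    """Build the full list of ssh invocations, one per host and command."""
    return [["ssh", host, command] for host in hosts for command in commands]


def run_everywhere(hosts, commands, dry_run=True):
    """Run each command on each host; with dry_run, only report the plan."""
    results = []
    for argv in plan(hosts, commands):
        if dry_run:
            results.append((argv, None))  # nothing executed, no exit code
            continue
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, completed.returncode))
    return results
```

Calling `run_everywhere(HOSTS, COMMANDS)` lists the four invocations without touching any server; passing `dry_run=False` executes them and records each exit code, so a failure on one host does not go unnoticed.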
An artifact is the output of any step in the software development process. Depending on the language, artifacts can be a number of things, including JARs (Java archive files), WARs (web application archive files), libraries, assets, and applications. Artifact management can be as simple as a web server with access controls that allow file management internal to your environment, or it can be a more complex managed service with a variety of extended features. Much like early version control for source code, artifact management can be handled in a variety of ways based on your budgetary concerns.
Generally, an artifact repository can serve as:
a central point for management of binaries and dependencies;
a configurable proxy between organization and public repositories; and
an integrated depot for build promotions of internally developed software.
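The first of these roles, a central point for managing binaries, can be sketched as a small in-memory store. The `ArtifactRepository` class is hypothetical, and two of its behaviors are assumptions added for illustration rather than requirements stated above: published versions are treated as immutable, and each artifact is verified against a SHA-256 checksum when fetched.

```python
import hashlib


class ArtifactRepository:
    """Toy in-memory store; a real repository adds access control and proxying."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> (content, sha256 hex digest)

    def publish(self, name: str, version: str, content: bytes) -> str:
        """Store an artifact and return its checksum."""
        key = (name, version)
        if key in self._artifacts:
            # Assumption: released versions are immutable in this sketch.
            raise ValueError(f"{name} {version} is already published")
        digest = hashlib.sha256(content).hexdigest()
        self._artifacts[key] = (content, digest)
        return digest

    def fetch(self, name: str, version: str, expected_digest: str) -> bytes:
        """Return the artifact only if it matches the checksum the caller expects."""
        content, digest = self._artifacts[(name, version)]
        if digest != expected_digest:
            raise ValueError("checksum mismatch")
        return content
```

A build job would call `publish` once per release and record the returned digest; a deployment job would pass that digest to `fetch`, so both sides can be confident they are handling the same binary.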
One of the bigger pain points that has traditionally existed between development and operations teams is how to make changes rapidly enough to support effective development but without risking the stability of the production environment and infrastructure. A relatively new technology that helps alleviate some of this friction is the idea of software containers—isolated structures that can be developed and deployed relatively independently from the underlying operating system or hardware.
Similar to virtual machines, containers provide a way of sandboxing the code that runs in them, but unlike virtual machines, they generally have less overhead and less dependence on the operating system and hardware that support them. This makes it easier for developers to develop an application in a container in their local environment and deploy that same container into production, minimizing risk and development overhead while also cutting down on the amount of deployment effort required of operations engineers.
The final concepts we define in this chapter are cultural ones. While some software development methodologies, such as Agile, define ways in which people will interact while developing software, there are more interactions and related cultural concepts that are important to cover here, as these ideas will come up later in this book.
A retrospective is a discussion of a project that takes place after it has been completed, where topics such as what went well and what could be improved in future projects are considered. Retrospectives usually take place on a regular (if not necessarily frequent) basis, either after fixed periods of time have elapsed (every quarter, for example) or at the end of projects. A big goal is local learning—that is, how the successes and failures of this project can be applied to similar projects in the future. Retrospective styles may vary, but usually include topics of discussion such as:
What the scope of the project was and what ended up being completed.
Ways in which the project succeeded, features that the team is especially proud of, and what should be used in future projects.
Things that went wrong, bugs that were encountered, deadlines that were missed, and things to be avoided in future projects.
Unlike the planned, regular nature of a retrospective, a postmortem occurs after an unplanned incident or outage, for cases where an event’s outcome was surprising to those involved and at least one failure of the system or organization was revealed. Whereas retrospectives occur at the end of projects and are planned in advance, postmortems cannot be scheduled ahead of time, because the events they discuss are unexpected. Here the goal is organizational learning, and there are benefits to taking a systemic and consistent approach to the postmortem by including topics such as:
A timeline of the incident from start to finish, often including communication or system error logs.
The perspective of every person involved in the incident, including their thinking during the events.
Things that should be changed to increase system safety and avoid repeating this type of incident.
In the devops community, there is a big emphasis placed on postmortems and retrospectives being blameless. While it is certainly possible to have a blameful postmortem that looks for the person or people “responsible” for an incident in order to call them out, that runs counter to the focus on learning that is central to the devops movement.
Blamelessness is a concept that arose in contrast to the idea of blame culture. Though it had been discussed for years previously by Sidney Dekker and others, this idea really came to prominence with John Allspaw’s post on blameless postmortems, with the idea that incident retrospectives would be more effective if they focused on learning rather than punishment.
A culture of blamelessness exists not as a way of letting people off the hook, but to ensure that people feel comfortable coming forward with details of an incident, even if their actions directly contributed to a negative outcome. Only with all the details of how something happened can learning begin to occur.
Karen E. Watkins and Victoria J. Marsick, Partners for Learning
Organizational learning is the process of collecting, growing, and sharing an organization’s body of knowledge. A learning organization is one that has made their learning more deliberate, setting it as a specific goal and taking actionable steps to increase their collective learning over time.
Organizational learning as a goal is part of what separates blameful cultures from blameless ones, as blameful cultures are often much more focused on punishment than on learning, whereas a blameless or learning organization takes value from experiences and looks for lessons learned and knowledge gained, even from negative experiences. Learning can happen at many different levels, including individual and group as well as organization, but organizational learning has a higher impact on companies as a whole, and companies that practice organizational learning are often more successful than those that don’t.
We have discussed a variety of methodologies relating to the development, deployment, and operation of both software and the infrastructure that underlies it, as well as cultural concepts addressing how individuals and organizations deal with and learn from incidents and failures.
This is far from an exhaustive list, and new methodologies and technologies will be developed in the future. The underlying themes of development, deployment, operations, and learning will continue to be core to the industry for years to come.
2 James P. Womack, Daniel T. Jones, and Daniel Roos, The Machine That Changed the World (New York: Rawson Associates, 1990).
3 James P. Womack and Daniel T. Jones, Lean Thinking (New York: Simon & Schuster, 1996).
4 Mary Poppendieck and Thomas David Poppendieck, Implementing Lean Software Development (Upper Saddle River, NJ: Addison-Wesley, 2007).
5 Jeffrey K. Liker, The Toyota Way: 14 Management Principles from the World’s Greatest Manufacturer (New York: McGraw-Hill, 2004).