Chapter 4. Process

As you’ve seen throughout the book so far, legacy processes can stifle agility and innovation because they were designed in a different era, with different constraints, when the infrastructure you ran your software on could be seen and touched.

Cloud practitioners need to rethink how we work. As you learned in Chapter 1, we also need to build trust into our systems so we can get rid of all those low-value, time-consuming review gates blocking our way.

But where do we start?

A company I’ll call NCC Enterprises outsourced its infrastructure provisioning and management to a third party. In the datacenter, the SLA for procuring and provisioning new infrastructure was three months. The process required several forms and approvals before an order was placed. Once the infrastructure equipment the vendor had ordered arrived, NCC put the provisioning process in a backlog and completed all of its higher-priority provisioning jobs first. Thus, the SLA contained a lot of padding to account for large queues and emergency requests that might take a higher priority. Sometimes, if you got lucky, you might get your infrastructure installed in two months, but often it was three or more.

When NCC decided to start leveraging the cloud, the provisioning process did not change. The requesters still had to fill out forms and get approvals. The ticket was still processed by the third party and fell into the same queue as the physical infrastructure, even though installing the infrastructure was only a matter of running a script. Several months later, some of NCC’s business leaders started questioning the value of cloud computing because they had not seen an improvement in turnaround time.

The Software Development Life Cycle

To understand the breadth of processes that make up the software development life cycle (SDLC), let’s look at an Information Technology Infrastructure Library (ITIL) framework diagram (Figure 4-1).1 It’s a great visual representation of what goes into building and running software.

Figure 4-1. ITIL framework, courtesy of ITIL

As you can see, there are lots of steps in the SDLC. Going through them all would be well outside the scope of this book, so in this chapter I’ll focus on service transition and service operations, two of the SDLC steps that require the most change to optimize for cloud. These services can particularly hinder cloud adoption if the legacy processes are not reengineered for the cloud, and they have an enormous impact on the ability to deliver software to cloud endpoints.

In the legacy model, each process in Figure 4-1 is usually owned by a group. Each group has a process flow, or a sequence of processes, for receiving, processing, and completing requests. If you add up all the boxes, it becomes evident that there is a lot of process to navigate to get software out the door. Figure 4-2 shows a suboptimal process flow for building and deploying software. You can see plenty of manual processes, handoffs, and review gates.

Figure 4-2. Suboptimal software-deployment process flow

Automation can greatly reduce the time it takes to deploy software. In Figure 4-3, you can see that many of those manual processes, handoffs, and review gates have been replaced by high levels of automation in the CI/CD pipeline. Where companies often go wrong is that they put little thought into redesigning those existing processes to optimize the flow of work. They try to simply automate their existing processes, but end up automating all of the bottlenecks and waste within them.

Figure 4-3. An automated CI/CD pipeline

Automating your existing processes wholesale, bottlenecks and all, ensures that deploying software in the cloud will not go well. It’s the equivalent of the VP of Electricity from Chapter 1 making everyone fill out forms and obtain permissions instead of just plugging into an outlet. Sure, the VP of Electricity still needs to provide high SLAs for the electricity service and make sure there are redundant sources of power, but all of that should be abstracted from the power consumers, who only care that the outlet in the wall works.

Just as consuming electricity as a service requires a different model than producing it with your own generators and turbines, consuming computing as a service requires a different model. If you don’t acknowledge and embrace the need to rethink processes, you’re setting up your enterprise’s cloud adoption for failure.

Under service transition, there are a few processes that can easily become huge bottlenecks if they are not optimized for the cloud. The first is change management, the process of identifying changes to both infrastructure and software. The point of change management is to understand the impacts, dependencies, and risks associated with the changes and address them to minimize disruption to services. Release and deployment management is the process of ensuring the integrity of the production environment and confirming that the correct components are released. This includes planning, building, testing, deploying, and accessing the software and production environments. Service validation and testing assesses the impact and benefits of the changes, to ensure operations can support the new services.

Under service operations, the potential major bottleneck areas are access management (determining permissions and limiting access to authorized users only), incident management (restoring services to the users as quickly as possible when systems become unreliable), and problem management (analyzing incidents to prevent new ones). I’ll discuss the kinds of optimization required in Part II of the book.

This chapter will discuss methods of reengineering processes to improve the flow of work and reduce process bottlenecks. I’ll introduce the concept of value stream mapping (VSM), which is a method used to analyze existing processes and redesign them for optimal flow. VSM is a Lean management tool commonly used by companies that are far along in their DevOps maturity.

Process optimization, also called process reengineering, can be a daunting task. Changing processes can challenge cultural norms, impact existing roles and responsibilities, and require structural changes to the organization to optimize flow. Changes in process can be met with resistance, especially in workplace cultures where change is infrequent or even unwelcome.

My advice is to start at the biggest pain point. If your organization is early in its cloud journey, that’s usually one of four areas: environment provisioning, deployment processes, incident management, and security processes.

I’ll look at each of these four process areas in turn, then examine how value stream mapping can help you optimize them for cloud.

Environment Provisioning

In the public cloud, infrastructure is code, not physical machines or appliances. Provisioning infrastructure and environments in the public cloud can be accomplished in minutes. Obviously, you need to make sure infrastructure is provisioned in a cost-effective, secure, and compliant manner—but that shouldn’t add days, weeks, or months to the timeline. So how do you balance control with agility? Let me illustrate with the experience of one of my consulting clients. Here’s how a legacy process almost killed a large media company’s cloud initiative.

MediaCo used to be a traditional waterfall shop: it progressed through the SDLC one phase at a time. All test environments were run on-premises on shared physical infrastructure in the datacenter. Two system administrators were responsible for maintaining the test environments. With the start of each new release cycle, they had to wipe all test environments clean and refresh the databases. They scheduled this process for the end of each month, which forced project teams to schedule their sprints and releases around the refresh events. If there were any unplanned outages or if emergency fixes or patches were needed, none of the development teams could test until the test environments were brought back online.

This process worked well for MediaCo for many years because its development teams released biannually or quarterly. There was plenty of time to plan and coordinate the refresh process. But as they became more mature in practicing Agile, development teams were moving to more frequent releases, putting more strain on the two system administrators who had to keep up with all of their requests.

At the same time, MediaCo was looking at ways to leverage the public cloud, and the dynamic nature of the test environments was an attractive use case. So it decided to migrate its test environments to the public cloud. The problem was, the infrastructure team viewed the public cloud as just “someone else’s datacenter” and tried to apply the exact same tools and processes they’d used in their own datacenter.

To make matters worse, the business units declared that they no longer wanted to work in a shared environment because they were tired of being delayed by other project teams’ schedules and testing cycles. This created even more work for the system administrators, who would now have to manage multiple infrastructure environments.

MediaCo was heading for a failed cloud migration—until my team challenged them to rethink their business processes.

Processes don’t come from nowhere. They are created to fulfill a need: a set of requirements, as set forth in company policies. When we asked the MediaCo administrators what the actual policies were that drove their process, they kept referring to the process itself. They were focused on the “how,” not the “why.” Once we got them to stop thinking about the existing process and start defining their actual requirements, which were mostly driven by the security and compliance teams, things started getting easier.

The real requirements were:

  • Test environments must be refreshed on a defined interval.

  • All personally identifiable information (PII) must be masked in the test environment.

  • Access to the environment must be granted on a “need-to-know” basis.

  • Test environments must not allow public access.

There were other requirements, but those were the four major ones. Unfortunately, the way MediaCo was fulfilling those requirements was creating huge inefficiencies. More than a dozen development teams were sitting idle for three to five days out of every month, waiting for the refresh to be executed. That is an extraordinary amount of waste and lost productivity. MediaCo was preparing to duplicate that same process in the cloud, which would have created zero value for its customers, the developers. (I’ll talk more about why you should view the developers as your customers in Part II of this book.)

Here’s what wasn’t on that list of requirements: that the two system administrators who ran the test environment had to be the ones to do the refresh. There was no reason they couldn’t create the code to automate the tasks, then allow the development teams to run the refresh themselves once they obtained the proper approvals.
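To make that idea concrete, here is a minimal sketch of what a self-service refresh script could look like. It assumes a PII-masked snapshot already exists, and the approval_granted function is a hypothetical stand-in for whatever approval record the compliance team requires; the boto3 calls for restoring a snapshot and tagging the instance are real RDS operations, but all names are placeholders rather than MediaCo’s actual implementation.

    """Sketch of a self-service test-database refresh (hypothetical)."""
    import boto3

    rds = boto3.client("rds")

    def approval_granted(team: str) -> bool:
        # Hypothetical stand-in for the real approval check (ticket, group membership, etc.).
        return True

    def refresh_test_db(team: str, masked_snapshot_id: str) -> None:
        if not approval_granted(team):
            raise PermissionError(f"{team} has no refresh approval on record")

        # In practice the previous test instance would be deleted or renamed first.
        instance_id = f"{team}-test-db"  # placeholder naming convention
        response = rds.restore_db_instance_from_db_snapshot(
            DBInstanceIdentifier=instance_id,
            DBSnapshotIdentifier=masked_snapshot_id,  # snapshot already has PII masked
            PubliclyAccessible=False,  # requirement: no public access to test environments
        )
        rds.add_tags_to_resource(
            ResourceName=response["DBInstance"]["DBInstanceArn"],
            Tags=[{"Key": "environment", "Value": "test"},
                  {"Key": "owner", "Value": team}],
        )

    if __name__ == "__main__":
        refresh_test_db("checkout-team", "masked-snapshot-2024-01")

A script along these lines satisfies the same four requirements, but any approved team can run it in minutes instead of waiting for the monthly refresh window.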

It took a while to get everyone to look past their current processes and see that there was a better way. To make this point clear, we conducted a value stream mapping workshop that allowed them to see the bottlenecks in the existing process. We then worked with them to redesign the entire process in a way that was optimized for the cloud and for self-provisioning.

The result was that development teams no longer had to work their sprint plans around an artificial refresh date each month, and they no longer lost days of development and testing time. The system administrators were no longer a bottleneck; they didn’t have to work 80 hours a week trying to keep up with all of the requests. And MediaCo could now leverage the cloud to spin up environments on demand and turn them off when not in use (off hours, weekends, holidays, etc.), which ended up saving the company over a million dollars.

This was a tremendous success for MediaCo, and it opened the door to more opportunities in the public cloud. Had MediaCo not redesigned its existing processes, its cloud implementation might have cost even more than its datacenter did. Another win for the company was that the new design improved morale for the two system administrators, who were no longer overworked and underappreciated; for the developers, who were more productive; and for the product owners, who were now getting their new features on time.

As we move to the cloud and embrace a more iterative approach to software development, many of our former “best practices” are now nothing more than process bottlenecks. In an age where speed to market is a competitive advantage, we need to take a step back and rethink our value streams.

Once again, we need to focus on the requirements that drove the existing processes, not on the process implementation itself. Focus on the why, not the how. Why do we have so many review meetings? Because we need to enforce coding standards and architecture best practices. We need to assess risks from a security, compliance, and impact standpoint. We need to ensure that proper performance and regression testing are performed. The list goes on.

Deployment Processes

A common problem I see play out at many companies is that they take their existing deployment “best practices” with them to the cloud. Often, those deployment processes were designed in an era when large monolithic applications were deployed on physical infrastructure biannually or quarterly, and they consist of manual steps: numerous review meetings, approvals, forms to fill out, and checklists.

A better way to approach this is to understand what the required policies, controls, and standards are and automate the review process by adding tools like code scans into the CI/CD process. CI/CD, as you’ll recall from Chapter 3, allows us to automate the software deployment process from end to end, with no human intervention. These tools are configured to look for coding standards, security policies, and audit controls, and can fail the build process if a certain threshold of security, quality, and compliance is not achieved. The build process can also be configured to automatically update any enterprise asset management system or configuration management database (CMDB).
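As a simple illustration of that kind of gate, the sketch below reads a scan report and fails the build when thresholds are exceeded. The report format, file name, and thresholds are assumptions for illustration rather than any particular scanning tool’s output; what matters is that the script exits nonzero, which fails the pipeline stage with no human in the loop.

    """Hypothetical CI/CD quality gate: fail the build if scan results
    exceed the thresholds agreed on with security and compliance."""
    import json
    import sys

    # Thresholds are illustrative; in practice they come from the policy owners.
    MAX_CRITICAL_VULNS = 0
    MAX_HIGH_VULNS = 3
    MIN_TEST_COVERAGE = 80.0

    def main(report_path: str = "scan-report.json") -> int:
        with open(report_path) as f:
            # Assumed report shape: {"critical": int, "high": int, "coverage": float}
            report = json.load(f)

        failures = []
        if report["critical"] > MAX_CRITICAL_VULNS:
            failures.append(f"critical vulnerabilities: {report['critical']}")
        if report["high"] > MAX_HIGH_VULNS:
            failures.append(f"high vulnerabilities: {report['high']} (max {MAX_HIGH_VULNS})")
        if report["coverage"] < MIN_TEST_COVERAGE:
            failures.append(f"test coverage {report['coverage']}% (min {MIN_TEST_COVERAGE}%)")

        if failures:
            print("Quality gate failed:", "; ".join(failures))
            return 1  # nonzero exit code fails the pipeline stage
        print("Quality gate passed.")
        return 0

    if __name__ == "__main__":
        sys.exit(main(*sys.argv[1:]))

The same pattern works for coding standards, security policies, and compliance controls: the policy lives in code, and the pipeline enforces it on every build.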

The power of CI/CD lies in automation. If you are allowed to rethink how your company delivers software, you can get extremely creative on how to leverage CI/CD to streamline your IT processes. CI/CD pipelines can eliminate a lot of manual reviews and checkpoints that are primarily in place because of a lack of trust.

Why this lack of trust in automation? One of the main reasons is that deployment is traditionally full of manual, nonrepeatable processes. People have been burned so often by botched deployments that they have inserted numerous process steps to try to add more trust in the deployment process. With a good CI/CD pipeline that is fully automated, auditable, and repeatable, we can start trusting our automation and removing some of the process obstacles that prevent us from delivering faster.

A few of the aspects of pipelines that can be fully automated are:

  • Builds

  • Code scans for enforcing best practices, including:

    • Coding standards

    • Cloud architecture best practices

    • Security policy enforcement

    • Compliance controls enforcement

  • Creating a cloud environment with mandated guardrails for security and compliance

  • Updating the configuration management database for asset tracking

  • Creating metadata and tagging for infrastructure (see the sketch following this list)

  • Code deployment and rollback, as needed
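For the tagging and CMDB items above, the sketch below shows how small these automations can be. The EC2 create_tags call is a real API; the CMDB endpoint, payload, and tag keys are hypothetical placeholders for whatever asset system and tagging standard your enterprise uses.

    """Hypothetical pipeline step: tag new infrastructure and record it in the CMDB."""
    import json
    import urllib.request

    import boto3

    CMDB_URL = "https://cmdb.example.internal/api/assets"  # placeholder endpoint

    def register_instance(instance_id: str, app: str, owner: str, cost_center: str) -> None:
        ec2 = boto3.client("ec2")
        tags = [
            {"Key": "application", "Value": app},
            {"Key": "owner", "Value": owner},
            {"Key": "cost-center", "Value": cost_center},
        ]
        # Real EC2 API: attach the mandated metadata to the new instance.
        ec2.create_tags(Resources=[instance_id], Tags=tags)

        # Hypothetical CMDB update so asset tracking happens automatically, not via a form.
        payload = json.dumps({"resource_id": instance_id, "tags": tags}).encode()
        request = urllib.request.Request(
            CMDB_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(request)

    if __name__ == "__main__":
        register_instance("i-0123456789abcdef0", app="payments",
                          owner="team-checkout", cost_center="CC-1234")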

These automated processes can supply all of the necessary documentation required for auditing. But automation giveth, and automation taketh away: it gives us a standard and repeatable process that can increase quality, security hygiene, compliance, and delivery velocity—but takes away the tasks, committees, meetings, checklists, and other manual things that are part of people’s jobs. What often happens is the team that builds out the CI/CD pipeline only automates the steps that are in their control; they still have to “prove” the deployment is ready by attending meetings and filling out forms and checklists.

A large financial company I’ll call Prosperity embraced the concepts of CI/CD within its development process and found that its team could perform a build and create the supporting infrastructure in minutes with the push of a button. The problem was, the rest of the company was not ready to embrace this level of automation because they didn’t trust it.

In response, Prosperity decided to perform a value stream mapping (VSM) exercise. In a VSM exercise, you interview all of the stakeholders and participants in a particular unit of work so that you can visualize all of the steps required to complete it. This makes it easier to find lag time or waste. Once you identify the problems, the goal is to redesign the process to optimize the flow and reduce waste.

The unit of work Prosperity analyzed was “build a brand-new application.” Prosperity interviewed numerous stakeholders and mapped out the current state process, which filled one entire wall of a conference room. What it revealed was that, before a developer could launch that lightning-fast CI/CD process, they had to navigate 60 to 90 days of manual steps: tickets, emails, and verbal communication for approvals, access requests, account creation, accounting code creations, and so on. Much of that 60 to 90 days was wait time: the process stalled as everyone waited for the next step to be completed.

After that initial setup and approval stage, the development team could create and build the new software quickly. They were allowed to use CI/CD to push the code to a nonproduction environment. The CI/CD process included automated tests, code scans for standards and best practices, and code scans for security policies. If the code did not pass the requirements for quality, standards, and security, the build would fail. If it did pass, the code was put in staging and ready for production.

The VSM showed, however, that after this fast and efficient stage, there was another 60 to 90 days of process focusing on reviews and approvals—even though the automation had already proved that the code met all of the requirements to deploy to production. In other words, regardless of the product or feature requirements and no matter how good their CI/CD process was, any change at all would take four to six months!

Prosperity’s use case reveals why it is so important to look at how technology changes affect people and processes. Most manual review gates and approvals can be automated and result in better security, compliance, and quality than the manual review process. At the same time, the time to market can be drastically improved, providing more business value to customers.

Incident Management

Incident management for software in the public cloud must be redesigned, because the underlying infrastructure, from the hypervisor down, is now the responsibility of the CSPs. (Refer to the shared responsibility model from Figure 1-2.) Your responsibility is to monitor the CSP and work with them to resolve any issues that stem from the infrastructure layer.

You can see in Figure 4-4 that the cloud platform team is responsible for the SLAs of the cloud platform, and the developers are responsible for the operations of their applications built on top of the platform. (In Chapters 5 and 6, I’ll look more closely at the cloud platform as an internal CSP that delivers cloud services to the development teams.) If you don’t optimize your incident management process for the cloud, you could run into problems.

Figure 4-4. The platform’s role as an internal cloud provider within the shared responsibility model

I worked with a Fortune 100 company (we’ll call it MegaCorp) that was an early adopter of both cloud computing and DevOps. The infrastructure team built a cloud platform on top of a private cloud and worked with the security, compliance, and development teams to embed policies and controls in the cloud platform while adding scanning capabilities to the CI/CD process. They did a great job of rethinking and automating their provisioning and deployment processes—but they did not redesign any of their operational processes.

MegaCorp’s cloud journey started out successfully. One of the largest companies on the planet was now able to take a new business requirement and deploy it to production in a matter of days. The security and compliance teams trusted the automation because they knew that the build would fail if it did not meet the stringent requirements of their heavily regulated industry. The feedback was glowing. Business partners who were used to waiting months to get new features were now seeing daily releases of mobile and web capabilities. So far, so good!

It was nirvana—until the first incident. Then things got ugly. The incident management process was still working with a legacy design, in which all support and operations were centrally managed within domain silos. The help desk provided tier 1 support. There were ops teams for networking, databases, infrastructure, and so on. When an incident occurred, a ticket was created and routed to tier 1 support—but they were only capable of solving basic problems like password resets, so they would route the ticket to the development team.

The development team had no access to servers or logs, so when a ticket came in, they’d create a new ticket to get log data for the timeframe in which the event had occurred. This took several hours. Once they had a snapshot of the logs, they could start solving the mystery. If someone had a hunch that there was something wrong with the database performance, they’d open up yet another ticket and route it to the database team. And then they’d wait. Eventually the database team would come back and recommend that the network team look into the issue. Another ticket was created and more wait time ensued. Incidents bounced around the organization like a hot potato while business users waited impatiently. After a couple of weeks of long resolution times, the excitement went away and business partners started asking to go back to the old model.

This was not a problem with the cloud; this was an ineffective process. We helped the client build a logging and monitoring framework into their cloud platform that gave developers self-service access to logs and application performance monitoring tools, without having to log on to servers. After fine-tuning the new incident management process, the product team quickly got their mean time to repair back to an acceptable level and eventually won back the trust of their customers.
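Here is a minimal sketch of what self-service log access can look like, using CloudWatch Logs Insights as an example. The start_query and get_query_results calls are real CloudWatch Logs APIs; the log group name and query string are placeholders, and this illustrates the pattern rather than the client’s actual tooling.

    """Sketch of a self-service log query for incident triage (CloudWatch Logs Insights)."""
    import time
    import boto3

    logs = boto3.client("logs")

    def recent_errors(log_group: str, minutes: int = 60):
        end = int(time.time())
        start = end - minutes * 60
        query_id = logs.start_query(
            logGroupName=log_group,
            startTime=start,
            endTime=end,
            queryString=("fields @timestamp, @message "
                         "| filter @message like /ERROR/ "
                         "| sort @timestamp desc | limit 50"),
        )["queryId"]

        # Poll until the query finishes; fine for a sketch, add a timeout in real use.
        while True:
            result = logs.get_query_results(queryId=query_id)
            if result["status"] in ("Complete", "Failed", "Cancelled"):
                return result["results"]
            time.sleep(1)

    if __name__ == "__main__":
        for row in recent_errors("/my-app/production"):  # placeholder log group
            print(row)

The point is that a developer triaging an incident can pull the relevant logs in seconds, without a ticket and without logging on to a server.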

The biggest lesson I learned on this project was that, even though the client did everything right by redesigning the provisioning and deployment processes, leveraging their existing operations processes almost killed their cloud project. Had they chosen to pilot with a mission-critical application, they might not have recovered so easily—or ever.

Security Processes

Security teams spend years mitigating risks and monitoring for threats to protect their employers’ crown jewels. But the tooling and processes that work well to secure a datacenter don’t translate well to securing cloud services. The security policies and requirements are still valid, but implementing them in the cloud is very different.

I worked with a large enterprise in a highly regulated industry; we’ll call it Nightingale Health. Before this company’s architects could migrate any applications to the cloud, they had to prove to the CISO that the cloud platform they were building was at least as secure as their existing datacenter. Nightingale had recently completed a network segmentation project on-prem and the network team was demanding that the cloud platform team leverage the exact same design for network segmentation in the public cloud. This would be much harder than leveraging the CSP’s cloud native security APIs. For example, on AWS, platform teams use a virtual private cloud (VPC) with public and private subnets to accomplish the equivalent of network segmentation on-prem. Unfortunately, Nightingale’s network team was insisting on a more complex design, including buying software appliances to be installed on the public cloud.
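For readers who have not worked with these building blocks, here is a minimal sketch of cloud native segmentation on AWS: one VPC, a public subnet with a route to an internet gateway, and a private subnet with no such route. The boto3 calls are real EC2 APIs, but the CIDR ranges are placeholders and a production design would add NAT gateways, network ACLs, and security groups; the point is that segmentation here is a handful of API calls, not an appliance purchase.

    """Minimal sketch: a VPC with one public and one private subnet."""
    import boto3

    ec2 = boto3.client("ec2")

    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    public_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]
    private_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")["Subnet"]["SubnetId"]

    # Only the public subnet gets a route to an internet gateway; the private subnet does not.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=public_rt, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public_subnet)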

The platform architects did not feel empowered to challenge the network architects’ decisions. They tried their best to architect a public cloud solution that closely mimicked the on-prem implementation. When I challenged the platform team’s design, one architect responded, “I know it’s the wrong thing to do, but we don’t have a choice.”

I asked the network team for their requirements, but they sent me a document that looked more like a design, complete with solutions and vendor names. I was asking them about the “why,” but they were giving me the “how.” Eventually, we broke through and got to the real requirements that drove their network segmentation strategy. Then my team proposed a cloud native design to meet those requirements. Eventually that design was accepted and implemented, which saved Nightingale from implementing a very expensive, complex, and inefficient solution.

Security teams, like other teams, need to separate requirements from implementation—the “why” from the “how.” Then they need to be open to satisfying their requirements in new ways that are more optimal for the cloud.

The entire approach to security should change in the cloud. Many security policies can be baked into the underlying infrastructure. Others can be enforced through code scans in the CI/CD process. Continuous security monitoring can raise alerts immediately when someone isn’t adhering to security policies, so that quick action can minimize the exposure window. By now you’re familiar with the term infrastructure as code, but you should also be thinking of security as code. The more you automate and continuously monitor your security policies, the less you’ll need to deal with forms, meetings, and manual approvals, so you can bring products, features, and fixes to market much faster.
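As one small example of security as code and continuous monitoring, the sketch below looks for security groups that allow SSH from the entire internet. The describe_security_groups call is a real EC2 API; the rest is illustrative, and in practice the check would run on a schedule or on every configuration change and feed your existing alerting or trigger automatic remediation.

    """Illustrative continuous security check: flag security groups with SSH open to the internet."""
    import boto3

    def open_ssh_violations():
        ec2 = boto3.client("ec2")
        violations = []
        for group in ec2.describe_security_groups()["SecurityGroups"]:
            for rule in group.get("IpPermissions", []):
                # A fuller check would also catch port ranges that include 22 and IPv6 ranges.
                if rule.get("FromPort") == 22 and rule.get("ToPort") == 22:
                    for ip_range in rule.get("IpRanges", []):
                        if ip_range.get("CidrIp") == "0.0.0.0/0":
                            violations.append(group["GroupId"])
        return violations

    if __name__ == "__main__":
        for group_id in open_ssh_violations():
            # In a real pipeline this would raise an alert or auto-remediate, not just print.
            print(f"Policy violation: {group_id} allows SSH from 0.0.0.0/0")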

The most important advice I can give about security processes is this: people who design and mandate security-related processes should first pass a cloud certification on their cloud provider of choice, or they should already have extensive experience on a successful cloud transformation. If neither of these is true, the odds of their processes being well suited for building, deploying, and running software in the cloud are very low. People with no cloud experience and without enough training to pass a certification often tend to think of the cloud as “somebody else’s datacenter.” When people say that phrase, which I have heard many times, there is a good chance that they don’t understand the differences between software architected for the cloud versus software architected to run on physical infrastructure in a datacenter. In their mind, it’s all the same except for the location of the “datacenter.” If there is no difference, why change the way we deliver software when we go to the cloud?

Value Stream Mapping

Large enterprises embark on their cloud journey with decades of baggage. They know that their legacy processes are less than ideal (for the cloud and in general), but most have never actually visualized the entire end-to-end process for building, deploying, and operating software, so they don’t know how unproductive their processes really are.

This is where value stream mapping shines. Value stream mapping (VSM), as I’ve mentioned briefly a few times, is a pragmatic approach for visualizing all of the tasks of any process designed to create value, from beginning to end. It’s a method that arose from the Lean management movement in the 1990s and has gained popularity ever since. In the software world, value is typically delivered as a service: a new capability, product, or asset. The scope of a value stream can be as simple as the help desk process for resetting a password or as complex as upgrading all laptops to the next version of Windows. For the purposes of this discussion, I will focus on value streams related to delivering infrastructure and software services.

Applying VSM best practices helps all stakeholders and participants in the process identify waste and inefficiencies in how work gets done. This is valuable information that can be used to redesign the processes to be faster and more reliable, to create better value, and to raise morale across the enterprise. In my experience, companies that don’t employ process-reengineering techniques like VSM as they move to the cloud often underperform. It’s hard to redesign your processes when you don’t know exactly what they are.

Consultants and authors Karen Martin and Mike Osterling remind us that VSM is more than a tool; it’s “a methodology to transform leadership thinking, define strategy and priorities, and assure that customers are receiving high levels of value (versus focusing merely on reducing operational waste).”2 The message here is clear: value stream mapping is an integral part of DevOps because it transforms the way our culture thinks about delivering software.

One of the main goals of VSM is to make all work visible. This is important because much of what we define as “waste” within the process may not be visible to all stakeholders. If nobody knows about a wasteful process, the odds of it getting optimized are slim to none. Kanban flow expert Dominica DeGrandis, in her book Making Work Visible (IT Revolution Press), highlights five “thieves of time” that create waste in work processes: too much work in process, unknown dependencies, unplanned work, conflicting priorities, and neglected work. There’s a large and long-standing body of literature on waste in work processes, with many different tools and methodologies. VSM is well suited for the software delivery process and is a popular choice among DevOps practitioners. While I am not a trained expert on the topic, I have a great deal of practical experience using it. What follows is a high-level overview to whet your appetite. There are many great books on VSM, a few of which I’ve quoted here, and I recommend reading at least one before jumping in.

The VSM Methodology

Value stream mapping is a way to visually represent a process from a customer’s point of view. The customer’s point of view is their perception of how the overall process delivers value—and perception is reality. It’s easy for a process owner to design a process to satisfy their needs, but this often comes at the expense of the needs of the customer. In fact, many times the process owners are so far removed that they don’t even know what the customer’s experience is.

There are two ways to visualize a value stream (walking the process and holding workshops), and I recommend you use both:

Walk the process
The first is called “walking the process.” I like to call it spending a day in the customer’s shoes. The analyst watches the customer participate in the process throughout the day and records what they witness. There are pros and cons to this method. It can take a long time to observe enough people to come to sound conclusions, and in some cases observation itself disrupts the flow of work (the observer effect). The benefit is that the eyes don’t lie: you often witness steps that are invisible to the process owners.
Workshops
A value stream mapping workshop gets everyone involved in the process together to document all of the steps involved. For each step, the workshop moderators look for variations in a process, items that block the flow of work, waste, and steps that don’t add value.

From this information, the analyst can document the end-to-end process and identify opportunities for improvement. Figure 4-5 shows an example of the map created during a VSM workshop.

Figure 4-6 shows a redesigned process that drastically reduces the overall lead time of the value stream.

Figure 4-5. A value stream map
Figure 4-6. Redesigned process after the VSM workshop

Holding a VSM Workshop

While I strongly recommend researching the VSM method yourself before beginning, I’ll outline the basic steps of the process here:

Step 1: Choose your event method(s)
Decide whether you are going to hold an observation event or a workshop. In a perfect world, you would do both. The workshop is where you can collect the most information; observation can supplement that data with real-world execution of the process.
Step 2: Define your scope
Pick the scope of the process you want to map. For example, a company that’s recently moved to the cloud might struggle with resolving incidents, which leads to an increase in mean time to repair. That company might want its workshop to focus on the incident management value stream.
Step 3: Plan the event
Schedule the event. This includes identifying all of the process stakeholders and finding a time where they can all participate in the workshop, in person or virtually. If key stakeholders cannot attend the event, it is better to postpone it than to risk not collecting all of the pertinent tasks and data points, which could skew your value analysis.
Step 4: Hold the VSM workshop
Perform the workshop and/or observation event. Workshops are often half-day or full-day events, depending on the size and scope of the value streams. In some cases, you might need multiple workshops to accommodate people in far-flung locations, but it is highly preferred that all stakeholders participate in the same session and hear the same information at the same time.
Step 5: Validate your map
After the workshop, the analyst documents the process map and shows it to stakeholders to validate that the data was correctly captured. This can be done with another scheduled meeting with key stakeholders, in person or online.
Step 6: Analyze your map
The analyst applies the VSM methodology to the map and highlights the problem areas, such as invisible work, bottlenecks, and waste. The analyst might design the future state process in this step or schedule another workshop with stakeholders to collaborate on the new process design.
Step 7: Report your findings
The analyst reports the findings to the key stakeholders. This includes collecting all action items and identifying next steps.

Once you’ve completed the VSM process, you’ll need to design the future state process, plan how to implement it, and execute on your plan.

I have seen VSM workshops take a month-long process down to days or even hours. It is critical to record key productivity metrics from your findings, so that once you’ve implemented the new process, you’ll have concrete numbers to show the difference. Many workplace cultures are resistant to changing their processes. It’s powerful to be able to show that the change reduced processing time by 23 days or reduced the error rate by 25% or improved the company’s Net Promoter score by 10%. Factual statements like these can drive more change throughout the organization.
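Capturing those numbers does not require anything fancy. The sketch below computes lead time, process (touch) time, and the activity ratio from the step timings collected in a workshop; the steps and hours are made up for illustration, not taken from any of the case studies in this chapter.

    """Illustrative value stream metrics from workshop data (hours are made-up examples)."""

    # Each step: (name, process_time_hours, wait_time_hours) as captured in the workshop.
    steps = [
        ("Submit request form",        1,   40),
        ("Approvals",                  2,  160),
        ("Create accounts and access", 4,   80),
        ("Build and automated tests",  3,    1),
        ("Release review board",       2,  120),
        ("Deploy to production",       1,    4),
    ]

    process_time = sum(p for _, p, _ in steps)
    wait_time = sum(w for _, _, w in steps)
    lead_time = process_time + wait_time

    # Activity ratio: fraction of the total lead time spent doing value-adding work.
    activity_ratio = process_time / lead_time

    print(f"Lead time:      {lead_time} hours")
    print(f"Process time:   {process_time} hours")
    print(f"Activity ratio: {activity_ratio:.1%}")

Run the same calculation on the current state and on the redesigned future state, and the before-and-after comparison writes itself.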

Conclusion: A Golden Opportunity

When incidents happen in the datacenter, no matter how good or bad the existing processes are, people know what they are and how to restore services. Even if the process is totally inefficient and requires 50 people to get on a call at 4 a.m., those teams are equipped with history, institutional knowledge, and procedural awareness.

When you move to the cloud for the first time, you are moving to a greenfield virtual datacenter. There are no processes in place. This is a one-time opportunity to design processes from the ground up, optimized for the cloud and its new ways of thinking. You’ll never have a better chance to get this right. Don’t simply bring your legacy processes and mindsets along for the ride. Your company’s needs—all those security policies, compliance controls, and operational requirements—are still valid; it’s just how you satisfy them that needs to change.

It’s critical to redesign processes at all levels of the company for the cloud. To summarize, follow these guidelines as you look for opportunities for improvement:

  • Focus first on the requirements or goals of the service (the “why,” not the “how”).

  • Look for opportunities to remove waste from the existing process.

  • Redesign your process with the shared responsibility model in mind.

  • Automate as much as possible.

  • Build trust into the system through automation and continuous monitoring.

  • Move review processes from preproduction to postmortem.

  • Continuously reevaluate and improve processes over time.

Process change is a key component of cloud adoption. Failing to acknowledge that legacy processes designed for another era aren’t the best way to deliver software in the cloud will most likely result in low performance. This error in judgement will compound as more workloads move to the cloud, which can result in catastrophic consequences, such as risk exposure, missed SLAs, and cost overruns. Transforming a culture to be more DevOps-centric starts with good process hygiene. Pick a process pain point and optimize it for the cloud. All the technology in the world can’t fix bad processes.

1 I am neither endorsing nor criticizing ITIL, which is a library of best practices for managing IT services and improving IT support and service levels. One of its main goals is to ensure that IT services align with business objectives, even as business objectives change.

2 Karen Martin and Mike Osterling, Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation (McGraw-Hill).
