« Continued from The Rise of Cloud-Native

Changes Needed

All we are doing is looking at the timeline from the moment a customer gives us an order to the point when we collect the cash. And we are reducing that timeline by removing the nonvalue-added wastes.

Taichi Ohno

Taichi Ohno is widely recognized as the Father of Lean Manufacturing. Although the practices of lean manufacturing often don’t translate perfectly into the world of software development, the principles normally do. These principles can guide us well in seeking out the changes necessary for a typical enterprise IT organization to adopt cloud-native application architectures, and to embrace the cultural and organizational transformations that are part of this shift.

Cultural Change

A great deal of the changes necessary for enterprise IT shops to adopt cloud-native architectures will not be technical at all. They will be cultural and organizational changes that revolve around eliminating structures, processes, and activities that create waste. In this section we’ll examine the necessary cultural shifts.

From Silos to DevOps

Enterprise IT has typically been organized into many of the following silos:

  • Software development

  • Quality assurance

  • Database administration

  • System administration

  • IT operations

  • Release management

  • Project management

These silos were created in order to allow those that understand a given specialty to manage and direct those that perform the work of that specialty. These silos often have different management hierarchies, toolsets, communication styles, vocabularies, and incentive structures. These differences inspire very different paradigms of the purpose of enterprise IT and how that purpose should be accomplished.

An often cited example of these conflicting paradigms is the view of change possessed by the development and operations organizations. Development’s mission is usually viewed as delivering additional value to the organization through the development of software features. These features, by their very nature, introduce change into the IT ecosystem. So development’s mission can be described as “delivering change,” and is very often incentivized around how much change it delivers.

Conversely, IT operations’ mission can be described as that of “preventing change.” How? IT operations is usually tasked with maintaining the desired levels of availability, resiliency, performance, and durability of IT systems. Therefore they are very often incentivized to maintain key perfomance indicators (KPIs) such as mean time between failures (MTBF) and mean time to recovery (MTTR). One of the primary risk factors associated with any of these measures is the introduction of any type of change into the system. So, rather than find ways to safely introduce development’s desired changes into the IT ecosystem, the knee-jerk reaction is often to put processes in place that make change painful, and thereby reduce the rate of change.

These differing paradigms obviously lead to many additional suboptimal collaborations. Collaboration, communication, and simple handoff of work product becomes tedious and painful at best, and absolutely chaotic (even dangerous) at worst. Enterprise IT often tries to “fix” the situation by creating heavyweight processes driven by ticket-based systems and committee meetings. And the enterprise IT value stream slows to a crawl under the weight of all of the nonvalue-adding waste.

Environments like these are diametrically opposed to the cloud-native idea of speed. Specialized silos and process are often motivated by the desire to create a safe environment. However they usually offer very little additional safety, and in some cases, make things worse!

At its heart, DevOps represents the idea of tearing down these silos and building shared toolsets, vocabularies, and communication structures in service of a culture focused on a single goal: delivering value rapidly and safely. Incentive structures are then created that reinforce and award behaviors that lead the organization in the direction of that goal. Bureaucracy and process are replaced by trust and accountability.

In this new world, development and IT operations report to the same immediate leadership and collaborate to find practices that support both the continuous delivery of value and the desired levels of availability, resiliency, performance, and durability. Today these context-sensitive practices increasingly include the adoption of cloud-native application architectures that provide the technological support needed to accomplish the organization’s new shared goals.

From Punctuated Equilibrium to Continuous Delivery

Enterprises have often adopted agile processes such as Scrum, but only as local optimizations within development teams.

As an industry we’ve actually become fairly successful in transitioning individual development teams to a more agile way of working. We can begin projects with an inception, write user stories, and carry out all the routines of agile development such as iteration planning meetings, daily standups, retrospectives, and customer showcase demos. The adventurous among us might even venture into engineering practices like pair programming and test-driven development. Continuous integration, which used to be a fairly radical concept, has now become a standard part of the enterprise software lexicon. In fact, I’ve been a part of several enterprise software teams that have established highly optimized “story to demo” cycles, with the result of each development iteration being enthusiastically accepted during a customer demo.

But then these teams would receive that dreaded question:

When can we see these features in our production environment?

This question is the most difficult for us to answer, as it forces us to consider forces that are beyond our control:

  • How long will it take for us to navigate the independent quality assurance process?

  • When will we be able to join a production release train?

  • Can we get IT operations to provision a production environment for us in time?

It’s at this point that we realize we’re embedded in what Dave West has called the waterscrumfall. Our team has moved on to embrace agile principles, but our organization has not. So, rather than each iteration resulting in a production deployment (this was the original intent behind the Agile Manifesto value of working software), the code is actually batched up to participate in a more traditional downstream release cycle.

This operating style has direct consequences. Rather than each iteration resulting in value delivered to the customer and valuable feedback pouring back into the development team, we continue a “punctuated equilibrium” style of delivery. Punctuated equilibrium actually short-circuits two of the key benefits of agile delivery:

  • Customers will likely go several weeks without seeing new value in the software. They perceive that this new agile way of working is just “business as usual,” and do not develop the promised increased trust relationship with the development team. Because they don’t see a reliable delivery cadence, they revert to their old practices of piling as many requirements as possible into releases. Why? Because they have little confidence that any software delivery will happen soon, they want as much value as possible to be included when it finally does occur.

  • Teams may go several weeks without real feedback. Demos are great, but any seasoned developer knows that the best feedback comes only after real users engage with production software. That feedback provides valuable course corrections that enable teams to “build the right thing.” By delaying this feedback, the likelihood that the wrong thing gets built only increases, along with the associated costly rework.

Gaining the benefits of cloud-native application architectures requires a shift to continuous delivery. Rather than punctuated equilibrium driven by a waterscrumfall organization, we embrace the principles of value from end to end. A useful model for envisioning such a lifecycle is the idea of “Concept to Cash” described by Mary and Tom Poppendieck in their book Implementing Lean Software Development (Addison-Wesley). This approach considers all of the activities necessary to carry a business idea from its conception to the point where it generates profit, and constructs a value stream aligning people and process toward the optimal achievement of that goal.

We technically support this way of working with the engineering practices of continuous delivery, where every iteration (in fact, every source code commit!) is proven to be deployable in an automated fashion. We construct deployment pipelines which automate every test which would prevent a production deployment should that test fail. The only remaining decision to make is a business decision: does it make good business sense to deploy the available new features now? We already know they work as advertised, so do we want to give them to our customers? And because the deployment pipeline is fully automated, the business is able to act on that decision with the click of a button.

Centralized Governance to Decentralized Autonomy

One portion of the waterscrumfall culture merits a special mention, as I have seen it become a real sticking point in cloud-native adoption.

Enterprises normally adopt centralized governance structures around application architecture and data management, with committees responsible for maintaining guidelines and standards, as well as approving individual designs and changes. Centralized governance is intended to help with a few issues:

  • It can prevent widespread inconsistencies in technology stacks, decreasing the overall maintenance burden for the organization.

  • It can prevent widespread inconsistencies in architectural choices, allowing for a common view of application development across the organization.

  • Cross-cutting concerns like regulatory compliance can be handled in a consistent way for the entire organization.

  • Ownership of data can be determined by those who have a broad view of all organizational concerns.

These structures are created with the belief that they will result in higher quality, lower costs, or both. However, these structures rarely result in the quality improvements or cost savings desired, and further prevent the speed of delivery sought from cloud-native application architectures. Just as monolithic application architectures can create bottlenecks which limit the speed of technical innovation, monolithic governance structures can do the same. Architectural committees often only assemble periodically, and long waiting queues of work often ensue. Even small data model changes—changes that could be implemented in minutes or hours, and that would be readily approved by the committee—lay wasting in an ever-growing stack of to-do items.

Adoption of cloud-native application architectures is almost always coupled with a move to decentralized governance. The teams building cloud-native applications (Business Capability Teams) own all facets of the capability they’re charged with delivering. They own and govern the data, the technology stack, the application architecture, the design of individual components, and the API contract delivered to the remainder of the organization. If a decision needs to be made, it’s made and executed upon autonomously by the team.

The decentralization and autonomy of individual teams is balanced by minimal, lightweight structures that are imposed on the integration patterns used between independently developed and deployed services (e.g., they prefer HTTP REST JSON APIs rather than many different styles of RPC). These structures often emerge through grassroots adoption of solutions to cross-cutting problems like fault tolerance. Teams are encouraged to devise solutions to these problems locally, and then self-organize with other teams to establish common patterns and frameworks. As a preferred solution for the entire organization emerges, ownership of that solution is very often transfered to a cloud frameworks/tools team, which may or may not be embedded in the platform operations team (The Platform Operations Team). This cloud frameworks/tools team will often pioneer solutions as well while the organization is reforming around a shared understanding of the architecture.

Organizational Change

In this section we’ll examine the necessary changes to how organizations create teams when adopting cloud-native application architectures. The theory behind this reorganization is the famous observation known as Conway’s Law. Our solution is to create a team combining staff with many disciplines around each long-term product, instead of segregating staff that have a single discipline in each own team, such as testing.

Business Capability Teams

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.

Melvyn Conway

We’ve already discussed in From Silos to DevOps the practice of organizing IT into specialized silos. Quite naturally, having created these silos, we have also placed individuals into teams aligned with these silos. But what happens when we need to build a new piece of software?

A very common practice is to commission a project team. The team is assigned a project manager, and the project manager then collaborates with various silos to obtain “resources” for each specialty needed to staff the project. Part of what we learn from Conway’s Law, quoted above, is that these teams will then very naturally produce in their system design the very silos from which they hail. And so we end up with siloed architectures having modules aligned with the silos themselves:

  • Data access tier

  • Services tier

  • Web MVC tier

  • Messaging tier

  • Etc.

Each of these tiers spans multiple identifiable business capabilities, making it very difficult to innovate and deploy features related to one business capability independently from the others.

Companies seeking to move to cloud-native architectures like microservices segregated by business capability have often employed what Thoughtworks has called the Inverse Conway Maneuver. Rather than building an architecture that matches their org chart, they determine the architecture they want and restructure their organization to match that architecture. If you do that, according to Conway, the architecture that you desire will eventually emerge.

So, as part of the shift to a DevOps culture, teams are organized as cross-functional, business capability teams that develop products rather than projects. Products are long-lived efforts that continue until they no longer provide value to the business. (You’re done when your code is no longer in production!) All of the roles necessary to build, test, deliver, and operate the service delivering a business capability are present on a team, which doesn’t hand off code to other parts of the organization. These teams are often organized as "two-pizza teams", meaning that the team is too big if it cannot be fed with two pizzas.

What remains then is to determine what teams to create. If we follow the Inverse Conway Maneuver, we’ll start with the domain model for the organization, and seek to identify business capabilities that can be encapsulated within bounded contexts (which we’ll cover in Decomposing Data). Once we identify these capabilities, we create business capability teams to own them throughout their useful lifecycle. Business capability teams own the entire development-to-operations lifecycle for their applications.

The Platform Operations Team

The business capability teams need to rely on the self-service agile infrastructure described earlier in Self-Service Agile Infrastructure. In fact, we can express a special business capability defined as “the ability to develop, deploy, and operate business capabilities.” This capability is owned by the platform operations team.

The platform operations team operates the self-service agile infrastructure platform leveraged by the business capability teams. This team typically includes the traditional system, network, and storage administrator roles. If the company is operating the cloud platform on premises, this team also either owns or collaborates closely with teams managing the data centers themselves, and understands the hardware capabilities necessary to provide an infrastructure platform.

IT operations has traditionally interacted with its customers via a variety of ticket-based systems. Because the platform operations team operates a self-service platform, it must interact differently. Just as the business capability teams collaborate with one another around defined API contracts, the platform operations team presents an API contract for the platform. Rather than queuing up requests for application environments and data services to be provisioned, business capability teams are able to take the leaner approach of building automated release pipelines that provision environments and services on-demand.

Technical Change

Now we can turn to some implementation issues in moving to a DevOps platform in the cloud.

Decomposing Monoliths

Traditional n-tier, monolithic enterprise applications rarely operate well when deployed to cloud infrastructure, as they often make unsupportable assumptions about their deployment environment that cloud infrastructures simply cannot provide. A few examples include:

  • Access to mounted, shared filesystems

  • Peer-to-peer application server clustering

  • Shared libraries

  • Configuration files sitting in well-known locations

Most of these assumptions are coupled with the fact that monoliths are typically deployed to long-lived infrastructure. Unfortunately, they are not very compatible with the idea of elastic and ephemeral infrastructure.

But let’s assume that we could build a monolith that does not make any of these assumptions. We still have trouble:

  • Monoliths couple change cycles together such that independent business capabilities cannot be deployed as required, preventing speed of innovation.

  • Services embedded in monoliths cannot be scaled independently of other services, so load is far more difficult to account for efficiently.

  • Developers new to the organization must acclimate to a new team, often learn a new business domain, and become familiar with an extremely large codebase all at once. This only adds to the typical 3–6 month ramp up time before achieving real productivity.

  • Attempting to scale the development organization by adding more people further crowds the sandbox, adding expensive coordination and communication overhead.

  • Technical stacks are committed to for the long term. Introducing new technology is considered too risky, as it can adversely affect the entire monolith.

The observant reader will notice that this list is the inverse of the list from Microservices. The decomposition of the organization into business capability teams also requires that we decompose applications into microservices. Only then can we harness the maximum benefit from our move to cloud infrastructure.

Decomposing Data

It’s not enough to decompose monolithic applications into microservices. Data models must also be decoupled. If business capability teams are supposedly autonomous but are forced to collaborate via a single data store, the monolithic barrier to innovation is simply relocated.

In fact, it’s arguable that product architecture must start with the data. The principles found in Domain-Driven Design (DDD), by Eric Evans (Addison-Wesley), argue that our success is largely governed by the quality of our domain model (and the ubiquitous language that underpins it). For a domain model to be effective, it must also be internally consistent—we should not find terms or concepts with inconsistent definitions within the same model.

It is incredibly difficult and costly (and arguably impossible) to create a federated domain model that does not suffer from such inconsistencies. Evans refers to internally consistent subsets of the overall domain model of the business as bounded contexts.

When working with an airline customer recently, we were discussing the concepts most central to their business. Naturally the topic of “airline reservation” came up. The group could count seventeen different logical definitions of reservation within its business, with little to no hope of reconciling them into one. Instead, all of the nuance of each definition was carefully baked into a single concept, which became a huge bottleneck for the organization.

Bounded contexts allow you to keep inconsistent definitions of a single concept across the organization, as long as they are defined consistently within the contexts themselves.

So we begin by identifying the segments of the domain model that can be made internally consistent. We draw fixed boundaries around these segments, which become our bounded contexts. We’re then able to align business capability teams with these contexts, and those teams build microservices providing those capabilities.

This definition of microservice provides a useful rubric for defining what your twelve-factor apps ought to be. Twelve-factor is primarily a technical specification, whereas microservices are primarily a business specification. We define our bounded contexts, assign them a set of business capabilities, commission capability teams to own those business capabilities, and have them build twelve-factor applications. The fact that these applications are independently deployable provides business capability teams with a useful set of technical tools for operation.

We couple bounded contexts with the database per service pattern, where each microservice encapsulates, governs, and protects its own domain model and persistent store. In the database per service pattern, only one application service is allowed to gain access to a logical data store, which could exist as a single schema within a multitenant cluster or a dedicated physical database. Any external access to the concepts is made through a well-defined business contract implemented as an API (often REST but potentially any protocol).

This decomposition allows for the application of polyglot persistence, or choosing different data stores based on data shape and read/write access patterns. However, data must often be recomposed via event-driven techniques in order to ask cross-context questions. Techniques such as command query responsibility segregation (CQRS) and event sourcing, beyond the scope of this report, are often helpful when synchronizing similar concepts across contexts.

Containerization

Container images, such as those prepared via the LXC, Docker, or Rocket projects, are rapidly becoming the unit of deployment for cloud-native application architectures. Such container images are then instantiated by various scheduling solutions such as Kubernetes, Marathon, or Lattice. Public cloud providers such as Amazon and Google also provide first-class solutions for container scheduling and deployment. Containers leverage modern Linux kernel primitives such as control groups (cgroups) and namespaces to provide similar resource allocation and isolation features as those provided by virtual machines with much less overhead and much greater portability. Application developers will need to become comfortable packaging applications as container images to take full advantage of the features of modern cloud infrastructure.

From Orchestration to Choreography

Not only must service delivery, data modeling, and governance be decentralized, but also service integration. Enterprise integration of services has traditionally been accomplished via the enterprise service bus (ESB). The ESB becomes the owner of all routing, transformation, policy, security, and other decisions governing the interaction between services. We call this orchestration, analogous to the conductor who determines the course of the music performed by an orchestra during its performance. ESBs and orchestration make for very simple and pleasing architecture diagrams, but their simplicity is deceiving. Often hiding within the ESB is a tangled web of complexity. Managing this complexity becomes a full-time occupation, and working with it becomes a continual bottleneck for the application development team. Just as we saw with a federated data model, a federated integration solution like the ESB becomes a monolithic hazard to speed.

Cloud-native architectures, such as microservices, tend to prefer choreography, akin to dancers in a ballet. Rather than placing the smarts in the integration mechanism, they are placed in the endpoints, akin to the dumb pipes and smart filters of the Unix architecture. When circumstances on stage differ from the original plan, there’s no conductor present to tell the dancers what to do. Instead, they simply adapt. In the same way, services adapt to changing circumstances in their environment via patterns such as client-side load balancing (Routing and Load Balancing) and circuit breakers (Fault-Tolerance).

While the architecture diagrams tend to look like a tangled web, their complexity is no greater than a traditional SOA. Choreography simply acknowledges and exposes the essential complexity of the system. Once again this shift is in support of the autonomy required to enable the speed sought from cloud-native architectures. Teams are able to adapt to their changing circumstances without the difficult overhead of coordinating with other teams, and avoid the overhead of coordinating changes with a centrally-managed ESB.

Summary

In this chapter we’ve examined a few of the changes that most enterprises will need to make in order to adopt cloud-native application architectures. Culturally the overall theme is one of decentralization and autonomy:

DevOps

Decentralization of skill sets into cross-functional teams.

Continuous delivery

Decentralization of the release schedule and process.

Autonomy

Decentralization of decision making.

We codify this decentralization into two primary team structures:

Business capability teams

Cross-functional teams that make their own decisions about design, process, and release schedule.

Platform operations teams

Teams that provide the cross-functional teams with the platform they need to operate.

And technically, we also decentralize control:

Monoliths to microservices

Control of individual business capabilities is distributed to individual autonomous services.

Bounded contexts

Control of internally consistent subsets of the business domain model is distributed to microservices.

Containerization

Control of application packaging is distributed to business capability teams.

Choreography

Control of service integration is distributed to the service endpoints.

All of these changes create autonomous units that are able to safely move at the desired rate of innovation.

In the final chapter, we’ll delve into technical specifics of migrating to cloud-native application architectures through a set of cookbook-style recipes.

Article image: atelier bow-wow, sectional model of house tower, tokyo 2006 (source: SEIER+SEIER).