Chapter 2. The Always On Strategy

Simply put, Always On mission-critical services are applications purposefully designed and deployed to meet a demanding requirement: zero tolerance for recovery time. These are applications that cannot afford the time needed to recover from a failure.

Unlike the other resiliency methods presented in Figure 2-1, such as disaster recovery and warm/standby, Always On aims to provide zero downtime as perceived and experienced by the end user. Achieving zero downtime is therefore not simply a technical issue but a customer-centric one.

Figure 2-1. Recovery and zero downtime strategies

Cloud SLAs Are Often Misunderstood

Cloud service-level agreements (SLAs) address only a few aspects of the overall user experience. The confusion stems from their limited scope: SLAs offered by cloud providers usually apply only to the infrastructure and managed services delivered “as a service,” and that simply isn’t enough. For example, despite being designed with high availability and fault tolerance in mind, cloud Kubernetes platforms often experience failures, and enterprises have little to no control over how quickly the service will recover.

Reliability is impacted by multiple layers that all play a role in delivering a service. It is essential to understand the varying impact each layer has on the overall customer ...
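To make the layering point concrete, here is a minimal sketch of how per-layer availability compounds. It assumes the layers form a serial chain (the service is up only when every layer is up) and that failures are independent. The layer names and availability figures are illustrative assumptions, not values from any provider's SLA.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def composite_availability(layers):
    """Availability of a serial chain of layers.

    Assumes every layer must be up for the service to be up and that
    layer failures are independent, so the availabilities multiply.
    """
    return prod(layers.values())


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1 - availability) * MINUTES_PER_YEAR


# Hypothetical figures chosen for illustration only.
layers = {
    "cloud infrastructure (covered by provider SLA)": 0.9995,
    "managed Kubernetes service (covered by provider SLA)": 0.9995,
    "application code (not covered)": 0.999,
    "deployment and operations (not covered)": 0.999,
}

total = composite_availability(layers)
print(f"end-to-end availability: {total:.4%}")
print(f"expected downtime: {downtime_minutes_per_year(total):.0f} minutes/year")
```

Under these assumed numbers the end-to-end figure comes out at roughly 99.7%, lower than any single layer, which is why an SLA that covers only the bottom layers says little about what the customer actually experiences.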
