Chapter 7

Designing for Resiliency and Scalability


  • Achieving resiliency with fault domains and upgrade domains
  • Designing for continuity
  • Planning for disaster recovery
  • Understanding hybrid cloud scalability

To design a resilient hybrid cloud architecture, you must first understand the nature and cause of every failure point that can lead to an application outage. This has always been true of traditional on-premises architectures, and it is equally true of private and public cloud architectures. It matters far more in a hybrid cloud architecture, however, because every hybrid design inevitably carries a large number of potential failure points and failure modes.

NOTE Understanding the failure points and failure modes for a hybrid cloud architecture and its related workload services enables you to make informed, targeted decisions regarding strategies for resiliency and availability.

A failure point can mean many things, such as network, power, or Internet connectivity. In architecture design, however, a failure point usually means a design element that can cause an outage. Examples of such design elements in a hybrid cloud architecture include the following:

  • DNS name resolution (especially when a DNS server is used to resolve the names of VMs hosted in dispersed network environments, i.e., private and public clouds)
  • Database connections
  • Website connections
  • Web service connections
  • External interfaces connectivity ...
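
One way to make such a list actionable is to keep it as an explicit inventory in which each failure point is paired with a probe. The following is a minimal sketch in Python, not a method prescribed by this chapter: the names FailurePoint, probe, and failing are hypothetical, the two helper checks use only the standard socket module, and the hostnames and ports in the usage example are placeholders.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


def dns_resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass(frozen=True)
class FailurePoint:
    """A named design element plus a zero-argument probe (hypothetical type)."""
    name: str
    check: Callable[[], bool]


def probe(points: Iterable[FailurePoint]) -> Dict[str, bool]:
    """Run every probe; a probe that raises is recorded as a failure."""
    results: Dict[str, bool] = {}
    for point in points:
        try:
            results[point.name] = bool(point.check())
        except Exception:
            results[point.name] = False
    return results


def failing(results: Dict[str, bool]) -> List[str]:
    """Return the names of the failure points whose probe did not succeed."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    # Placeholder hostnames and ports; substitute the real endpoints.
    inventory = [
        FailurePoint("dns: app VM name", lambda: dns_resolves("app01.example.internal")),
        FailurePoint("database", lambda: tcp_reachable("db01.example.internal", 1433)),
        FailurePoint("website", lambda: tcp_reachable("www.example.com", 443)),
    ]
    print(failing(probe(inventory)))
```

Because each probe is an ordinary callable, the inventory can grow as new failure points are identified, and a probe that opens a TCP connection only shows that the endpoint is reachable, not that the service behind it is healthy.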
