Architecting for Scale, 2nd Edition

Book description

Every day, companies struggle to scale critical applications. As traffic volume and data demands increase, these applications become more complicated and brittle, exposing risks and compromising availability. With the popularity of software as a service, scaling has never been more important.

Updated with an expanded focus on modern architecture paradigms such as microservices and cloud computing, this practical guide provides techniques for building systems that can handle huge quantities of traffic, data, and demand—without affecting the quality your customers expect. Architects, managers, and directors in engineering and operations organizations will learn how to build applications at scale that run more smoothly and reliably to meet the needs of customers.

  • Learn how scaling affects the availability of your services, why that matters, and how to improve it
  • Dive into a modern service-based application architecture that ensures high availability and reduces the effects of service failures
  • Explore the Single Team Owned Service Architecture paradigm (STOSA)—a model for scaling your development organization in tandem with your application
  • Understand, measure, and mitigate risk in your systems
  • Use the cloud to build highly scalable applications

Table of contents

  1. Forewords
    1. Foreword for Second Edition
    2. Foreword for First Edition
  2. Preface
    1. Who Should Read This Book
    2. Why I Wrote This Book
    3. A Word on Scale Today
    4. What’s New in the Second Edition
    5. Using the Cloud
    6. Services Versus Microservices
    7. Modern Digital Customer Experiences
    8. Navigating This Book
      1. Tenet 1. Availability: Maintaining Availability in Modern Applications
      2. Tenet 2. Modern Application Architecture: Using Services
      3. Tenet 3. Organization: Scaling Your Organization for Modern Applications
      4. Tenet 4. Risk: Risk Management for Modern Applications
      5. Tenet 5. Cloud: Utilizing the Cloud
    9. Online Resources
    10. Conventions Used in This Book
    11. O’Reilly Online Learning
    12. How to Contact Us
    13. Acknowledgments
  3. I. Tenet 1. Availability: Maintaining Availability in Modern Applications
  4. 1. Understanding, Measuring, and Improving Your Availability
    1. Availability Versus Reliability
    2. What Causes Poor Availability?
    3. Measuring Availability
      1. The Nines
      2. Planned Outages Are Still Outages
      3. Availability by the Numbers
    4. Improving Your Availability When It Slips
      1. Measure and Track Your Current Availability
      2. Automate Your Manual Processes
      3. Improve Your Systems
      4. Keep on Top of Availability in Your Changing and Growing Application
    5. Five Focuses to Improve Application Availability
      1. Focus #1: Build with Failure in Mind
      2. Focus #2: Always Think About Scaling
      3. Focus #3: Mitigate Risk
      4. Focus #4: Monitor Availability
      5. Focus #5: Respond to Availability Issues in a Predictable and Defined Way
    6. Being Prepared
  5. 2. Two Mistakes High—Having Room to Recover from Mistakes
    1. Two Mistakes High
      1. Scenario #1: Losing a Node
      2. Scenario #2: Problems During Upgrades
      3. Scenario #3: Data Center Resiliency
      4. Scenario #4: Hidden Shared Failure Types
      5. Scenario #5: Failure Loops
    2. Managing Your Applications
    3. The Space Shuttle
  6. II. Tenet 2. Modern Application Architecture: Using Services
  7. 3. Using Services
    1. The Monolith Application Versus the Service-Based Application
      1. The Ownership Benefit
      2. The Scaling Benefit
    2. Splitting into Services
      1. What Should Be a Service?
    3. Dividing into Services
      1. Guideline #1: Specific Business Requirements
      2. Guideline #2: Distinct and Separable Team Ownership
      3. Guideline #3: Naturally Separable Data
      4. Guideline #4: Shared Capabilities/Data
      5. Mixed Reasons
    4. Going Too Far
    5. Finding the Right Balance
  8. 4. Services and Data
    1. Stateless Services—Services Without Data
    2. Stateful Services—Services with Data
    3. Data Partitioning
    4. Timely Handling of Growing Pains
  9. 5. Dealing with Service Failures
    1. Cascading Service Failures
    2. Responding to a Service Failure
      1. Predictable Response
      2. Understandable Response
      3. Reasonable Response
    3. Determining Failures
    4. Appropriate Action
      1. Graceful Degradation
      2. Graceful Backoff
      3. Fail as Early as Possible
      4. Customer-Caused Problems
    5. Summary
  10. III. Tenet 3. Organization: Scaling Your Organization for Modern Applications
  11. 6. Service Ownership—STOSA
    1. Single Team Owned Service Architecture
    2. Advantages of a STOSA Application and Organization
    3. What Does It Mean to “Own” a Service?
    4. Using Core Teams and Services
    5. Summary
  12. 7. Service Tiers
    1. Application Complexity
    2. What Are Service Tiers?
      1. Assigning Service Tier Labels to Services
    3. Example: Online Store
    4. Using Service Tiers
      1. Expectations
      2. Responsiveness
      3. Dependencies
    5. Summary
  13. 8. Service-Level Agreements
    1. What Are SLAs?
    2. External Versus Internal SLAs
      1. Why Are Internal SLAs Important?
    3. SLAs for Problem Diagnosis
    4. Performance Measurements for SLAs
      1. Limit SLAs
      2. Top Percentile SLAs
      3. SLA Conditionals
    5. How Many and Which Internal SLAs?
    6. Why Internal SLAs Are Important
  14. IV. Tenet 4. Risk: Risk Management for Modern Applications
  15. 9. Using Risk Management When Architecting for Scale
    1. Identify Risk
      1. Remove Worst Offenders
      2. Mitigate
      3. Review Regularly
      4. Managing Risk Summary
    2. Likelihood Versus Severity
      1. The Top 10 List: Low Likelihood, Low Severity Risk
      2. The Order Database: Low Likelihood, High Severity Risk
      3. Custom Fonts: High Likelihood, Low Severity Risk
      4. T-Shirt Photos: High Likelihood, High Severity Risk
    3. The Risk Matrix
      1. Scope of the Risk Matrix
      2. Creating the Risk Matrix
      3. Using the Risk Matrix for Planning
      4. Maintaining the Risk Matrix
    4. Risk Mitigation
    5. Recovery Plans
    6. Disaster Recovery Plans
    7. Improving Our Risk Situation
  16. 10. Game Days
    1. Staging Versus Production Environments
      1. Staging/Test Environments
      2. Production Environments
    2. Concerns with Running Game Days in Production
    3. Summary
  17. 11. Building Systems with Reduced Risk
    1. Technique #1: Introduce Redundancy
      1. Idempotent Interfaces
      2. Redundancy Improvements That Increase Complexity
    2. Technique #2: Understand Independence
    3. Technique #3: Manage Security
    4. Technique #4: Encourage Simplicity
    5. Technique #5: Build in Self-Repair
    6. Technique #6: Standardize on Operational Processes
    7. Summary
  18. V. Tenet 5. Cloud: Utilizing the Cloud
  19. 12. Getting Started Architecting for Scale with the Cloud
    1. Six Levels of Cloud Maturity
      1. Level 1: Experimenting with the Cloud
      2. Level 2: Securing the Cloud
      3. Level 3: Using Servers and Applications in the Cloud
      4. Level 4: Enabling Value-Added Managed Services
      5. Level 5: Enabling Cloud-Unique Services
      6. Level 6: Cloud All In
      7. Organization Versus Application Maturity Level
    2. Cloud Adoption Mistakes
      1. Trap #1: Not Trusting Cloud Security
      2. Trap #2: Performing Cloud Migration via Lift-and-Shift
      3. Trap #3: The Lure of Serverless—Depending Too Much on the Hype
    3. When and How to Use Multiple Clouds
      1. Defining What We Mean by Multiple Clouds
      2. Which Model? Which Cloud?
    4. The Cloud in Summary
  20. 13. Five Industry Trends Changed by the Cloud
    1. What Has Changed in the Cloud?
      1. Change #1: Acceptance of Microservice-Based Architectures
      2. Change #2: Smaller, More Specialized Cloud Services
      3. Change #3: Greater Focus on the Application
      4. Change #4: The Micro Startup
      5. Change #5: Security and Compliance Has Matured
    2. Change Continues
  21. 14. Types of SaaS and Tenancy
    1. Comparing Managed Hosting and Different Types of SaaS
      1. Managed Hosting
      2. Multi-Tenant SaaS
      3. Single-Tenant SaaS
    2. Mixing Different Types of SaaS
    3. Common SaaS Characteristics
    4. SaaS Versus Managed Hosting
    5. Summary
  22. 15. Distributing Your Application in the AWS Cloud
    1. AWS Architecture
      1. AWS Region
      2. AWS Availability Zone
      3. Data Center
    2. Architecture Overview
    3. Availability Zones Are Not Data Centers
    4. Maintaining Location Diversity for Availability Reasons
      1. AWS—Mapping Availability Zones in Multiple Accounts
    5. Distributing Your Application
  23. 16. Managed Infrastructure
    1. Structure of Cloud-Based Services
      1. Raw Resource
      2. Server-Based Managed Resource
      3. Serverless Managed Resource
    2. Implications of Using Managed Versus Non-Managed Resources
    3. Summary
  24. 17. Cloud Resource Allocation
    1. Usage-Based Resources Allocation
    2. Allocated-Capacity Resource Allocation
      1. Changing Allocations
      2. Automated Allocation of Resource Capacity
      3. Issues with Automatic Allocation
      4. Dynamic Allocation, Dynamic Cost
    3. Pros and Cons of Usage-Based Versus Allocated-Capacity
  25. 18. Serverless and Functions as a Service
    1. Example Application #1: Event Processing
    2. Example Application #2: Mobile Backend
    3. Example Application #3: Internet of Things Data Intake
    4. Advantages and Disadvantages of FaaS
    5. Serverless Hype and the Future of FaaS
  26. 19. Edge Computing
    1. Edge Computing Today
    2. Why We Care
    3. What Should Be in the Edge Versus the Cloud?
      1. How Do We Decide? The Driverless Car
    4. Edge Scaling Isn’t the Same as Cloud Scaling
      1. Criteria for Using Edge Versus Cloud
    5. Eight Keys to Success in the Edge
      1. #1: Be Smart About What Goes on the Edge
      2. #2: Don’t Ignore DevOps Principles in the Edge
      3. #3: Nail a Highly Distributed Deployment Strategy
      4. #4: Reduce Versioning as Much as Possible
      5. #5: Reduce Per Node Provisioning and Configuration Options
      6. #6: Scaling Is an Edge Issue, Not Just a Cloud Issue
      7. #7: Nail Monitoring and Analytics
      8. #8: The Edge Is Not Magic
    6. Edge Computing Overall
  27. 20. Geographic Impact on Using the Cloud
    1. Cloud Matters Everywhere, But at Different Levels
    2. Replacement Mentality Impacts How You Adopt Cloud
    3. Which Cloud Is Most Important?
    4. Important Technologies Differ
    5. Data Sovereignty Is Universal
    6. My Take
  28. VI. Conclusion
  29. 21. Putting It All Together
    1. Tenet #1—Availability
    2. Tenet #2—Architecture
    3. Tenet #3—Organization
    4. Tenet #4—Risk
    5. Tenet #5—Cloud
    6. Architecting for Scale
  30. Index

Product information

  • Title: Architecting for Scale, 2nd Edition
  • Author(s): Lee Atchison
  • Release date: March 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492057178