The Art of Capacity Planning, 2nd Edition

Book Description

In their early days, Twitter, Flickr, Etsy, and many other companies experienced sudden spikes in activity that took their web services down in minutes. Today, determining how much capacity you need for handling traffic surges is still a common frustration of operations engineers and software developers. This hands-on guide provides the knowledge and tools you need to measure, deploy, and manage your web application infrastructure before you experience explosive growth.

In this thoroughly updated edition, authors Arun Kejariwal (MZ) and John Allspaw provide a systematic, robust, and practical approach to capacity planning, based on their own experiences and those of many colleagues in the industry rather than on theoretical models. They address the sea change in web operations, especially the shift to cloud computing.

  • Understand issues that arise on heavily trafficked websites or mobile apps
  • Explore how capacity fits into web/mobile app availability and performance
  • Use tools for measuring and monitoring computer performance and usage
  • Turn measurement data into robust forecasts and learn how trending fits into the planning process
  • Examine related deployment concepts: installation, configuration, and management automation
  • Learn how cloud autoscaling enables you to scale your app’s capacity up or down

Table of Contents

  1. Preface
    1. Why We Wrote and Revised This Book
    2. Focus and Topics
    3. Audience for This Book
    4. Organization of the Material
    5. Conventions Used in This Book
    6. O’Reilly Safari
    7. Using Code Examples
    8. We’d Like to Hear from You
    9. Acknowledgments
  2. 1. Goals, Issues, and Processes in Capacity Planning
    1. Background
    2. Preliminaries
    3. Quick and Dirty Math
    4. Predicting When Systems Will Fail
    5. Make System Stats Tell Stories
    6. Buying Stuff
    7. Performance and Capacity: Two Different Animals
    8. The Effects of Social Websites and Open APIs
    9. Readings
    10. Resources
  3. 2. Setting Goals for Capacity
    1. Different Kinds of Requirements and Measurements
      1. External Service Monitoring
      2. SLAs
      3. Business Capacity Requirements
      4. User Expectations
    2. Architecture Decisions
      1. Providing Measurement Points
      2. Resource Ceilings
      3. Hardware Decisions (Vertical, Horizontal, and Diagonal Scaling)
      4. Disaster Recovery
    3. Readings
    4. Resources
  4. 3. Measurement: Units of Capacity
    1. Capacity Tracking Tools
      1. Fundamentals and Elements of Metric Collection Systems
      2. Round-Robin Database and RRDTool
      3. Ganglia
      4. Simple Network Management Protocol
      5. Treating Logs as Past Metrics
      6. Monitoring as a Tool for Urgent Problem Identification
      7. Network Measurement and Planning
      8. Load Balancing
    2. Applications of Monitoring
      1. Application-Level Measurement
      2. Storage Capacity
      3. Database Capacity
      4. Caching Systems
      5. Establishing Caching System Ceilings
      6. Special Use and Multiple Use Servers
    3. API Usage and Its Effect on Capacity
    4. Examples and Reality
    5. Summary
    6. Readings
    7. Resources
  5. 4. Predicting Trends
    1. Riding the Waves
      1. Trends, Curves, and Time
      2. Tying Application Level Metrics to System Statistics: Database Example
      3. Forecasting Peak-Driven Resource Usage: Web Server Example
      4. Caveats Concerning Small Datasets
      5. Automating the Forecasting
      6. Safety Factors
    2. Procurement
      1. Procurement Time: The Killer Metric
      2. Just-in-Time Inventory
    3. The Effects of Increasing Capacity
    4. Long-Term Trends
      1. Traffic Pattern Changes
      2. Application Usage Changes and Product Planning
    5. Iteration and Calibration
      1. Best Guesses
      2. Diagonal Scaling Opportunities
    6. Summary
    7. Readings
    8. Resources
  6. 5. Deployment
    1. Automated Deployment Philosophies
      1. Goal 1: Minimize Time to Provision New Capacity
      2. Goal 2: All Changes Happen in One Place
      3. Goal 3: Never Log in to an Individual Server (for Management)
      4. Goal 4: Have New Servers Start Working Automatically
      5. Goal 5: Maintain Consistency for Easier Troubleshooting
    2. Automated Installation Tools
      1. Preparing the OS Image
      2. The Installation Process
    3. Automated Configuration
      1. Defining Roles and Services
      2. An Example: Splitting Off Static Web Content
      3. User Management and Access Control
      4. Ad Hockery
      5. Example 2: Multiple Datacenters
    4. Summary
    5. Readings
    6. Resources
  7. 6. Autoscaling
    1. The Challenge
      1. Autoscaling on Amazon EC2
      2. Design Guidelines
      3. Scalability Analysis
      4. Properties
      5. Autoscaling by Fixed Amount
      6. Scaling by Percentage
      7. Startup Time Aware Scaling
      8. Potpourri
      9. Advanced Approaches
    2. Summary
    3. Readings
    4. Resources
  8. A. Virtualization
    1. Overview
      1. Looking Back and Moving Forward
  9. B. Dealing with Instantaneous Growth
    1. Mitigating Failure
      1. Graceful Degradation and Disabling Heavy Features
      2. Baked Static Pages and Beyond
      3. Cache but Serve Stale
    2. Handling Outages
  10. C. Capacity Tools
    1. Monitoring
      1. Metric Collection and Event Notification Systems
      2. Ad Hoc Measurement and Graphing Tools
    2. Deployment Tools
      1. Automated OS Installation
      2. Configuration Management
      3. Cluster Management/Container Orchestration
      4. Inventory Management
      5. Trend Analysis and Curve Fitting
      6. Books on Queuing Theory and the Mathematics of Capacity Planning
  11. Index