Sometimes events occur outside of your control, your foresight, and your budget. An unexpected incident, technological or otherwise, can wipe out all your future projections. There are no magic theories or formulas to banish your capacity woes in these situations, but you may be able to lessen the pain.
Besides catastrophes, like a tornado destroying your data center, the biggest problem you’re likely to face is too much traffic. Ironically, becoming more popular than you can handle could be the worst web operations nightmare you’ve ever experienced. You might be fortunate enough to have a popular piece of content that is the target of links from all over the planet, or to launch a killer new feature that draws more attention than you ever planned for. This can be as exciting as having your name in lights, but you might not feel so fortunate while it’s all happening.
From a capacity point of view, not much can be done instantaneously. If you’re hosted in a utility computing or virtualized environment, you may be able to add capacity relatively quickly, depending on how it will be used, but this approach has limits. Adding servers can only solve the “I need more servers” problem. It can’t solve the harder architectural problems that can pop up when you least expect them.
At Flickr, we have found that edge use cases arise (probably more often than routine capacity issues!) and tax the infrastructure in ways we hadn’t expected. For example, some ...