Chapter 2. Two Mistakes High—Having Room to Recover from Mistakes

Consider the following anecdote I once overheard:

We were wondering how changing a setting on our MySQL database might impact our performance, but we were worried that the change might cause our production database to fail. Because we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead. After all, it wasn’t being used for anything at the moment.

Makes sense, right? Have you ever heard this rationale before?

Well, the problem here is that the database was being used for something. It was being used to provide a backup for production. Except it couldn’t be used that way anymore.

You see, the backup database was essentially being used as an experimental playground for trying different types of settings. The net result was that the backup database drifted away from the primary production database as its settings changed over time.

Then, one day, the inevitable happened.

The production database failed.

The backup database initially did what it was supposed to do. It took over the job of the primary database. Except it really couldn’t. The settings on the backup database had wandered so far away from those required by the primary database that it could no longer reliably handle the same traffic load that the primary database handled.

The backup database slowly failed, and the site went down.

This is a true story. It’s a story about best intentions. ...
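
The kind of drift in this story can be caught mechanically before a failover exposes it. What follows is a minimal sketch, not the method used by the team in the anecdote. It assumes the settings of each server have already been captured as name/value pairs (for MySQL, one plausible source is the output of SHOW GLOBAL VARIABLES run on each server). The function name find_drift, the sample values, and the choice to ignore server_id are all illustrative assumptions.

```python
# Hypothetical helper: compare settings captured from a primary database
# and its replica. The dictionaries are hard-coded here so the sketch runs
# on its own; in practice they might be built from SHOW GLOBAL VARIABLES.

def find_drift(primary, replica, ignore=frozenset()):
    """Return {name: (primary_value, replica_value)} for each differing setting."""
    drift = {}
    for name in sorted(set(primary) | set(replica)):
        if name in ignore:
            continue  # settings that are expected to differ per server
        p_val = primary.get(name)  # None if the setting is missing
        r_val = replica.get(name)
        if p_val != r_val:
            drift[name] = (p_val, r_val)
    return drift


# Illustrative values only, not taken from the story.
primary_settings = {
    "max_connections": "500",
    "innodb_buffer_pool_size": "8589934592",
    "server_id": "1",
}
replica_settings = {
    "max_connections": "151",
    "innodb_buffer_pool_size": "8589934592",
    "server_id": "2",
}

# server_id must differ between replication peers, so it is excluded.
drift = find_drift(primary_settings, replica_settings, ignore={"server_id"})
for name, (p_val, r_val) in drift.items():
    print(f"{name}: primary={p_val} replica={r_val}")
```

Run on a schedule, a check like this would flag the replica as soon as an experiment left it configured differently from the primary, while there is still time to correct it.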
