Chapter 1. The Philosophy of Backup
I back up; therefore, I will be.
When I look at the title of this chapter, I think about the old Steve Martin stand-up routine in which he said that in philosophy class, “you learned just enough to screw you up for the rest of your life.” (Steve studied the important questions, like “Is it OK to yell ‘movie’ in a crowded fire house?” I promise not to do that.) However, “The Philosophy of Backup” did seem like an appropriate name for this chapter, since we’re going to talk about the why of backup. (We’ll also talk a little about the how, of course.)
Champagne Backup on a Beer Budget
A good backup and recovery system is essential for a company of any size. Unfortunately, IT doesn’t always get the budget it needs, and the backup system almost never gets the money that it needs. Well, if you agree that you need a very good backup system, but you don’t have enough money to pull that off, know that this book was written with you in mind. You need champagne backup on a beer budget. Welcome to the club.
Just because you have a small budget doesn’t mean you have to do without backup. Most of the backup systems in this book can be implemented in small environments for a few hundred dollars—including hardware.
Don’t worry, enterprise customers—there’s plenty in here for you as well. The more you use the techniques taught in this book, the more money you can save for other IT projects. By the time you’re done implementing all the ideas in this book, hopefully my next book will be done, which will be right up your alley. It will cover nothing but commercial data protection solutions, including multiplatform commercial backup and recovery systems, continuous data protection, near continuous data protection, data de-duplication backup systems, replication, and the like.
Now that you’ve read this far, you may find yourself asking questions like these:
Why should I read this book?
Can I really back up with open-source backup software?
Why should I be using disk?
Why should I back up at all?
How do I find a balanced way to back up (wax on/wax off)?
Let’s get started answering these questions.
Why Should I Read This Book?
If you’ve been doing system administration for some time, you may be asking yourself this question. There are many answers. Perhaps self-preservation is your primary motivator. You’d like to make sure you don’t lose your job the next time a disk drive dies. Perhaps you’ve already got a decent backup system and you’d just like to make it better. Maybe you are looking for some new ideas about how to deal with upcoming backup and recovery needs. What follows are some of the reasons I think you should read this book.
Schadenfreude is a German word that means to take joy in the misfortunes of others. It’s why we watch those weird videos on the Internet where some idiot tries to do something stupid and ends up hurting himself. Each of the sidebars in this book is a true horror story that really happened to someone I know. These are not urban legends or horror stories passed on from admin to admin. These are firsthand encounters with disaster. There’s a schadenfreude element to reading these stories, of course. But each story also makes a point, and it was not just made up to make that point. The things that I warn about in this book really happen. This can be a very tough job if you are not prepared, so read closely. You might want to start by reading the sidebar “The One That Got Away” later in this chapter. It’s the story of the defining moment in my career.
You Never Want to Say These Words
“We lost only a few days’ worth of data.” In the sidebar “The One That Got Away,” I said that we lost only a few days’ worth of data. I swore the day I said these words that I would never say them again. From that day forward, I was convinced of the importance of backups. I never again assumed anything, and I began to study everything I could about backup technology. This book represents my attempt to compile what I have learned about inexpensive backups into a single volume, and it is written so that no one who reads it should ever need to utter the preceding statement. In my opinion, no amount of data loss is acceptable. I would also wager that you would be hard-pressed to find an end user who would feel much different. Whether it’s a spreadsheet that one person created or a customer database representing hours or days of sales invoices and the efforts of hundreds of people, ask the person who needs the data how much data loss they think is acceptable. Every statement, every opinion, every story, and every chapter in this book is based on the premise that any data loss is unacceptable. Let me state that again for emphasis.
With the technology that is now available, there is no reason for any data to be lost—that is, if backups are given the proper attention and priority that they need.
You’re Curious About Open-Source Backup Products
Just a few years ago, you could perform your backups with a few scripts and native utilities. Then demand for midrange computers grew astronomically, and the need for bigger databases, larger drives or filesystems, long filenames, and long pathnames grew proportionally. These large databases and filesystems started shipping, which then created a large market for commercial backup utilities, and one or two such products emerged; scores of others eventually followed.
Some of these early products were just GUIs and volume management built on top of existing native backup utilities to provide enhanced levels of functionality. Other companies felt that these native utilities had many limitations that could not be fixed without abandoning them altogether. Those companies chose to develop custom, even proprietary, backup methods. They attempted to overcome the limitations that products like tar could not overcome.
In recent years, the demand for centralized backup and recovery has also given rise to a number of open-source backup and recovery tools, six of which are covered in this book. The open-source backup market followed a pattern similar to the commercial products mentioned. The original open-source backup product, Amanda, is a wrapper around the native utility of your choice. BackupPC leaves data in its original format, and Bacula uses a custom format designed to overcome the limitations of GNU tar.
There are now a number of choices in the open-source backup market. It’s quite possible that one or more of the open-source products covered in this book can meet your backup and recovery needs. This book is currently the only resource that covers all of these tools in a single place.
You Want to Learn About Disk-Based Backup
If you haven’t heard of disk-based backup or disk-to-disk-to-tape (D2D2T) backup, then it’s time to turn off the digital video recorder (DVR) and pick up a trade magazine or two. (Of course, your DVR is nothing more than disk-based backup of your TV. And if you’re occasionally making VHS tapes of your DVR shows, it’s even a D2D2T system.) The use of disk in backup and recovery systems has exploded in the last few years, and it’s really solving a lot of problems.
Chapter 9 covers backup hardware and goes into much more detail about why disks have become a very attractive backup target. Here is a quick summary of some of those reasons:
The biggest reason that disk has become such an attractive backup target is that the cost of disk has been dramatically reduced in the last few years. The cost of a reasonably priced disk array is now approximately the same price as a similarly sized tape library filled with media. When you consider some of the things you can do with disk, such as eliminating full backups and redundant files, disk becomes even less expensive.
Unlike tapes, disks are closed systems that aren’t susceptible to outside contaminants. In addition, the actual media of a hard drive is, well, hard when compared to a piece of tape media. The result is that an individual disk drive is inherently more reliable than a tape drive. Disk drives become even more reliable when you put them in a RAID array.
Generally speaking, tape drives can only go two speeds: stop and very fast. Yes, some tape drives support variable speeds. However, they can usually only slow down to about 40 percent of the rated speed of the drive. Disk drives, on the other hand, work at whatever speed you need them to go. If you need to go a few hundred megabytes per second, put a few drives in a RAID group, and blast away. Then if you need that same RAID group to write at 10 KB/s, go ahead. Unlike tape drives, disk drives have no problem writing slowly, then quickly, then slowly, then.... You get the picture. This makes disk a perfect match for unpredictable backup streams. Once all that random data has been written in a serial fashion on your disk device, the disks can easily stream backup data to tape, if that’s what you want to do. Some people are forgoing that step altogether and replacing it with replication. Try doing that with a tape drive.
Disk-based backups are also an extremely economical way to bring completely automated backups to small and medium businesses (SMBs). While a large tape library can be very inexpensive (on a dollars-per-gigabyte basis) and very expandable, the same is not always true of smaller libraries aimed at the SMB market. The big challenge is expandability. The less expensive a tape library is, the less expandable it usually is. (There are always exceptions, of course.) By comparison, some of the completely automated open-source backup products mentioned in this book can be used with a single disk drive costing less than $100. If you need to expand beyond that, just buy another disk and add it to your volume manager. You can also buy RAID controllers that allow you to start with one disk and add more as your needs grow. You can use this method to expand from hundreds of gigabytes to many terabytes of capacity.
Why Back Up?
I’ve heard it all. I’ve been accused of caring only about backups. It’s been said that I think the whole world revolves around a cartridge reel. I’ve said that someday the world’s going to crash, and I’m going to have the backup. The question is: how serious are you about protecting your data? To help you come to a decision on this matter, let’s talk about what happens if you don’t have good backups.
What Will Lost Data Cost You?
To answer this question, you need to consider what kind of data you are backing up. This is a perfect time to include people who may not consider themselves computer people. Get input from other departments to answer this question. When all those 1s and 0s come together, just what kind of information are we talking about? Do you use manual accounting methods or are your company’s financial records stored in some accounting software somewhere? When a customer calls in and orders something, do you jot that down on a carbon-copied order form or do you enter it in some sort of order processing program? What about things like budgets, memoranda, inventories, and any other “paperwork” that you throw around from day to day? Do you keep copies of every important memo that you send, or do you depend on the computer for that?
If you’re like most people, you have grown quite dependent on these things we call computers. You forget how much of your work has been saved in the form of little magnetized bits spread out across a bunch of spinning platters. Maybe you work in an environment in which you’ve never lost a disk, so you’ve never had to do a restore. Maybe you’ve never fat-fingered a key and deleted an important file. If that’s the case, remember what my dad used to say: “motorcycle riders come in two types—those who have fallen and those who will fall.” The same is true of disk drives. If you’ve never had a failed disk drive, trust me, your turn is coming!
So what would you lose if you lost data? To quantify this, we need to examine the types of information that may reside in your environment and what would happen if you lost each type of information. Most of what you could lose is very tangible—and quantifiable in monetary terms—and it might surprise you.
- Lost customers
This is quite possibly the most tangible and most devastating of all losses. If your entire customer database is on a computer somewhere, how will you know who they are if that computer dies? You might actually “lose” your customers and never find them again. You could also lose customers who depend on data that is on one or more of your computers; if the customer finds out that you have lost his data, he will undoubtedly be less than impressed with you. The degree to which this data loss affects him may not even be relevant to him; he knows that you lost his data, and he might leave just because he no longer feels your company is competent.
- Lost orders
Whatever service or product your company provides, you have some way to keep track of requests for that product or service. Again, chances are that the method is computer-based. Data loss may mean several hours, days, or even weeks of lost orders. These may be orders that your salespeople worked very hard to get!
- Lost employees
Think about how you would feel if you were one of the salespeople whose orders were lost. You spent days or weeks working on sales, and now they’re gone forever. Maybe you should go somewhere where your hard work doesn’t go to waste. The better the salesperson, the better the chance that she may jump ship if you lose her sales. What about the average employee? If your computers have a reputation for going down and a reputation for losing data, it gives the employees a feeling of helplessness. Maybe they should go somewhere where they have the proper equipment to do their jobs.
- Damaged public reputation
What about your standing in the industry? News of a major data loss undoubtedly spreads. This news may get to competitors, whom you can trust to use it against you at any opportunity. The news may also get to a regulatory agency that is in charge of your type of company. For example, if you work for a U.S. bank, it would be a terrible thing for the Office of the Comptroller of the Currency (OCC) to find out that you had a major data loss. They may decide to take a really close look at your affairs. Nobody wants that kind of attention!
- Damaged internal reputation
It takes only one story of lost data to give your computer department an internal reputation for data loss. Try as you might to get rid of it, that reputation may stay for a while. You’re only as good as your last restore. (A friend of mine said, “You’re only as good as your worst restore.”) If people don’t trust your backups, they will duplicate your backup efforts. Employees will spend time and money backing up their systems locally. Each person may decide to buy his own backup drive and backup software or even to come up with his own in-house script. Their backups will be inefficient and costly at best, and may subject them to further data loss at worst. When everybody takes matters into their own hands, you can lose quite a bit of money in people-hours and extra hardware.
- Lost time and money
How many people are supporting your computers? How much of their efforts will you lose if your development system loses data? I know of many companies that have numerous contract programmers writing code all the time. If the system that houses their work loses their code, how much money will you have wasted? In fact, no matter what department you look at, if they do their work on a computer, and you lose that data, you can lose considerable time and money.
What Will Downtime Cost You?
When planning your backup and recovery program, you may have several options that affect the speed of the recovery. The faster the recovery, the more the backup system will cost you. What you must ask yourself before deciding on these types of options is, “What will downtime cost?” When thinking about this, I’m reminded of a copier machine commercial from a few years ago: “When your copier goes down, do people just say, ‘That’s all right, we’ll just use carbon paper!’?” If one of your main systems goes down, can your people continue working, or does your entire company come to a standstill? If it comes to a standstill, are your people salaried, so that sending them home saves you no money? Here are some additional costs to consider:
- Customer perception
A customer hates to hear, “Please call back; our computers are down,” or “Connection not responding.” Depending on your type of business, they might just decide to go elsewhere. The longer your systems are down, the more customers will hear this message.
- Employee perception
Nobody wants to work at a company where the computers are always going down. The more your employees depend on your systems, the truer this becomes. If you were a salesperson who couldn’t use your contact database for a day or so, how happy would you be?
Wax On, Wax Off: Finding a Balance
Using a system that has no backups is like driving a car 100 miles an hour down a busy road the day after your insurance policy expires. Likewise, having a three-node, highly available cluster for a noncritical application is like having full coverage on your 20-year-old fifth car. Just as insurance plans have different levels of coverage and riders to cover various types of damage, different backup methodologies provide different levels of recoverability.
Don’t Go Overboard
Not all environments need up-to-the-minute data recoverability. For many environments, recovering the systems up to last night’s backups is acceptable. For some environments, recovering the system even up to last week or month is OK. Spending thousands of dollars and hundreds of hours implementing the greatest backup solution in the world is a waste if you don’t need that level of coverage. This usually is not the problem for most sites; on the contrary, most sites don’t spend nearly enough money or effort on their backup and recovery systems. In other cases, however, money may be wasted on unnecessarily elaborate systems.
Recoverability requirements also vary from machine to machine within the same company. The amount of work that would be lost, or the possibility of adversely affecting a customer, may determine these requirements. For example, it may be considered acceptable for an employee or two to lose a day’s work spent on a few word processing documents. That is, unless it was the Senior Vice President’s assistant who was working on the departmental budget, in which case your mileage may vary. And, it would probably be totally unacceptable for you to lose even one hour’s worth of entries into a companywide sales database used by hundreds of people.
The point is that your backup requirements are determined by your recovery requirements. The difficulty comes in finding and using a tool capable of providing the level of recovery that you need. Consider users’ home directories for a minute. If they are local to each user’s workstation, a loss of one user’s disk in the afternoon would mean that one user would lose a few hours of work. However, if user directories are located on an NFS file server that serves thousands of users, you could potentially lose several thousand hours of work if you use only traditional backup tools.
If the loss of a networked file server is unacceptable, you might want to consider snapshot technology. Snapshot software allows you to take a “picture” of your drive or filesystem at a single point in time and then use that picture to back up that drive or filesystem. If the backup references the drive or filesystem via this snapshot, it will back up a consistent picture of the drive or filesystem as it looked at the time the snapshot was taken. If this kind of functionality is interesting to you, you might consider reading Chapter 7, which describes emulating snapshot functionality with rsync and hard links.
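To give a feel for the hard-link idea, here is a minimal Python sketch. It is not the rsync-based method from Chapter 7, and it does not produce a true point-in-time image the way snapshot software does; it only illustrates how a new backup directory can share unchanged files with the previous one. The function name `snapshot` and its arguments are hypothetical, chosen for this illustration, and the sketch assumes both backup directories live on the same filesystem (hard links cannot cross filesystems).

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy the tree `source` into `dest`.

    Files whose contents match the copy in `previous` (an earlier
    backup directory) are hard-linked instead of copied, so every
    backup looks complete but unchanged files are stored only once.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(previous, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a second name for the existing data.
                os.link(old, dst)
            else:
                # New or changed: store a fresh copy with its metadata.
                shutil.copy2(src, dst)
```

Calling `snapshot("/home", "/backup/tuesday", previous="/backup/monday")` (hypothetical paths) would leave a full-looking `tuesday` directory in which only the files that changed since `monday` consume new disk space. Deleting an old backup directory later does not harm newer ones, because a file's data remains until its last hard link is removed.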
Sometimes the tool you need comes with your operating system or database platform, but it’s just not being used properly. Sometimes backup tools aren’t being used at all. For example, if you have a production Oracle database, combining nightly hot backups with archived redo logs provides you up-to-the-minute recoverability. However, if you lose a disk that is part of a database that doesn’t back up its transaction logs, you will lose all work since the last cold backup. See Part V for more information.
If you have a production instance of any kind and are not using the transaction logging feature of your database engine, turn on logging as soon as possible!
Therefore, while it is necessary to find the appropriate utility to give you the degree of recoverability that you require, it is also necessary to use it.
Get the Coverage That You Need
Some environments cannot afford even one minute of downtime, and they should pay for the best backup coverage, whatever it costs. This is because of the great loss that they will incur if they ever lose their systems for even a short period (I know of one company that claims that it loses over $1 million a minute when its systems are down). On the other hand, if you are in an environment that can afford downtime, then spending huge amounts of money for an immediately available hot site is a complete waste of money.
Consider Table 1-1. No one should depend on a car, or a computer, without having at least the basic level of coverage. If the only car that you own is uninsured and a drunk driver runs into you and totals it, how would you recover from such a loss? Similarly, if your computer systems have critical information stored on them, how will you recover when a hard drive crashes and all that data is lost? What some people forget is that the opposite of this equation is true as well. If you have a third car that happens to be a 20-year-old (nonclassic) car, you will probably get only liability coverage on it; you could live without that car if it were destroyed today. Spending hundreds of extra dollars a year to insure a $50 car just doesn’t make sense. Likewise, if the computers that you are managing are in an environment in which you can do without them for a few days, do you really need hot-swappable, mirrored drives? Pick an appropriate level of protection for your environment.
|Types of coverage||Automobile insurance||Computer backups|
|Minimum coverage||Collision and liability (just keeps you from losing your shirt if you run into someone).||Regular nightly backups (keeps you from losing your job when a disk drive dies)|
|Unexpected disasters||Comprehensive coverage (vandalism, acts of God, etc.).||Journaling filesystems; Uninterruptible Power Supplies (UPSs)|
|Get me driving now||Rental car coverage (you get a car if your car is in the shop due to an accident).||RAID; Mirroring; Using hot-swap drives; High-availability (HA) system|
|Major disasters||Another company will pick up your policy and replace your car if both your car and your insurance company are destroyed in an earthquake.||Sending copies of your backup volumes to off-site storage, in case both your computer room and media library are destroyed; Sending your backups via a dedicated network to a large storage system at your off-site storage vendor|
|Maximum protection||The insurance company not only agrees to the conditions listed earlier, but also agrees to store another car of the same model in another state that you can use at any time if all cars in your state are destroyed.||Real-time mirroring to a hot-swappable system at another of your sites; Sending your backups via either network or courier to a hot-site vendor|
You need to balance the cost of a particular backup implementation against the projected monetary loss of the outage from which it protects you. For example, assume that you are evaluating two backup choices. The first option involves sending copies of your backup volumes to an off-site vendor for storage at a cost of $500 a month. The second option is an immediately available standby machine in another city that receives up-to-the-minute replication data from your production machine; let’s say this option costs you $5,000 a month.
Your company is located in Utopia, where no natural disasters ever occur, your disks are all mirrored, and you have determined that a day’s worth of downtime would cost only $500. Do you really want to spend $60,000 a year to protect against something that will probably never occur? If something catastrophic happened to your datacenter, wouldn’t the day-old, off-site copies serve just as well? Your company would suffer an extra day or so of downtime, but you have already determined that this is affordable. The $6,000-a-year solution is probably much more appropriate for this environment.
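The arithmetic behind the Utopia example can be written out as a short Python sketch. The dollar figures come from the example above; the variable names and the break-even calculation are additions for illustration, and the break-even figure assumes that downtime cost scales linearly with the number of days, which a real environment may not do.

```python
def annual_cost(monthly_cost):
    """Convert a monthly fee into a yearly one."""
    return monthly_cost * 12


# Figures from the example: off-site volume storage versus a hot standby.
offsite_storage = annual_cost(500)     # 6000
hot_standby = annual_cost(5000)        # 60000

# What one day of downtime is assumed to cost this company.
downtime_cost_per_day = 500

# Extra money spent each year by choosing the standby machine.
extra_spend = hot_standby - offsite_storage   # 54000

# Days of downtime per year the standby would have to prevent
# before it pays for itself (assumes cost is linear in days).
break_even_days = extra_spend / downtime_cost_per_day   # 108.0

print(f"Off-site storage: ${offsite_storage:,} a year")
print(f"Hot standby:      ${hot_standby:,} a year")
print(f"Break-even:       {break_even_days:.0f} days of avoided downtime a year")
```

Under these assumptions, the standby machine would need to save this company more than 108 days of downtime every year to justify its cost. Plugging in your own monthly fees and downtime estimate gives a rough first filter before a more careful risk analysis.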
However, are you protecting yourself from everything that you should be? Are you in an area that is prone to natural disasters and yet have no protection against that sort of event? Maybe you need to consider a different type of off-site storage. If you have a customer base that needs the data on your computers on a regular basis, have you provided for quick recovery in case of a failure? Perhaps you should be considering a hot site or multiple-site mirroring of your database servers. Table 1-1 provides a good overview of the various levels of coverage.
Why the Word “Volume” Instead of “Tape”?
Most backup utilities were originally written to back up to tape. Therefore, most books and online manuals talk about backing up to tape. However, many people are backing up to CDs, magneto-optical disks, or even disk drives. These media types have many advantages, because they act more like disk drives than tape drives. Random access of backup data is easier and you can read them using any block size you wish, because they do not record interrecord gaps like tape drives do.
Since many people no longer use tape, this book uses the more generic word volume whenever appropriate. You’ll also find the term backup drive instead of tape drive. Again, that is because the backup drive could be a CD burner or a disk drive. The book uses the words tape and tape drive only when they are necessary and appropriate.
BackupCentral.com has a wiki page for every chapter in this book. Read or contribute updated information about this chapter at http://www.backupcentral.com.
 A hot site is a place where you have computers standing by to do an immediate recovery of your environment.