A long time ago in a data center far away, there were servers whose data was small enough to fit on a single tape. That kind of data center led to a backup system design like the one in Figure 1-3. Many, if not most, systems came with their own tape drive, and that drive was big enough to back up the system it was attached to, and possibly a few other systems as well. All that was needed to perform a fully automated backup was to write a few shell scripts and swap out a few tapes in the morning.
Bandwidth was not a problem in those days, for two reasons. First, there simply wasn't much data to back up; even if the environment consisted of a single 10-Mb hub that was chock-full of collisions, there wasn't much data to send across the wire. Second, many of the systems could afford their own tape drives, so much of the data never needed to cross the LAN at all.
Gradually, many companies and individuals began to outgrow these systems. Either they got tired of swapping that many tapes, or they had systems that would no longer fit on a tape. The industry needed to come up with something better.
A few early innovators came up with the concept of a centralized backup server. Combining this with a tape stacker made life manageable again. Now all you had to do was spend $5,000 to $10,000 on backup software and $5,000 to $10,000 on hardware, and your problems were solved. Every one of your systems would be backed up across the network to the central backup server, and all you needed to do was install the appropriate piece of software on each backup "client." These software packages even ported their client software to many different platforms, which meant that all the systems shown in Figure 1-4 could be backed up to the backup server, regardless of what operating system they were running.
Then a different problem appeared. People began to assume that all you had to do was buy a piece of client software, and all your backup problems would be taken care of. As the systems grew larger and the number of systems on a given network increased, it became more and more difficult to back up all the systems across the network in one night. Of course, upgrading from shared networks to switched networks and private VLANs helped a lot, as did Fast Ethernet (100 Mb), followed by Etherchannel and similar technologies (400 Mb), and Gigabit Ethernet. But some companies had systems that were simply too large to back up across the network, especially once they started installing very large database servers that contained 100 GB to 1 TB of records and files.
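A quick back-of-the-envelope calculation shows why such servers outgrew LAN-based backups. The sketch below assumes an idealized 100% link utilization (real shared or even switched networks of that era delivered far less), so the true backup windows were even longer:

```python
# Time to push a backup across the LAN, assuming the link runs at full
# line rate the whole time (an idealized, best-case assumption).
def backup_hours(data_gb, link_mbit_s):
    mb_per_s = link_mbit_s / 8             # megabits/s -> megabytes/s
    return data_gb * 1000 / mb_per_s / 3600

print(round(backup_hours(1000, 100), 1))   # 1 TB over Fast Ethernet: ~22.2 hours
print(round(backup_hours(1000, 1000), 1))  # 1 TB over Gigabit Ethernet: ~2.2 hours
```

Even at the theoretical maximum, a 1-TB database over Fast Ethernet needs most of a day, which is far longer than any overnight backup window.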
A few backup software companies tried to solve this problem by introducing the media server. In Figure 1-5, the central backup server still controlled all the backups and still backed up many clients via the 100-Mb or 1000-Mb network. However, backup software that supported media servers could attach a tape library to each of the large database servers, allowing those servers to back up to their own locally attached tape drives instead of sending their data across the network.
Media servers solved the immediate bandwidth problem but introduced significant costs and inefficiencies. Each server needed a tape library big enough to handle a full backup. Such a library can cost anywhere from $50,000 to more than $500,000, depending on the size of the database server. It is also inefficient because many servers of this size don't need a full backup every night. If the database software can perform incremental backups, you may need a full backup only once a week or even once a month, which means that for the rest of the month most of the tape drives in the library sit idle. Even products that don't perform a traditional full backup have this problem: they create a virtual full backup every so often by reading the appropriate files from scores of incremental backups and writing those files to one set of tapes, and this method also requires quite a few tape drives on an occasional basis.
Another thing to consider is that the size of the library (specifically, the number of drives it contains) is often driven by the restore requirements, not the backup requirements. For example, one company had a 600-GB database that needed to be backed up. Although they did everything in their power to ensure that a tape restore would never be necessary, they knew they might need one in a true disaster, and the requirement was that any restore, even one that read from tape, complete in less than three hours. Based on that, they bought a 10-drive library that cost $150,000. Of course, if they could restore the database in three hours, they could back it up in three hours, which meant this $150,000 library went unused approximately 21 hours per day.
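The arithmetic behind that drive count can be sketched as follows. The roughly 6-MB/s per-drive tape throughput is an assumed, era-appropriate figure rather than one given in the text, but it shows how a three-hour restore requirement translates into a ten-drive library:

```python
import math

db_size_mb = 600 * 1000        # 600-GB database (decimal GB, for illustration)
restore_window_s = 3 * 3600    # three-hour restore requirement
drive_rate_mb_s = 6            # assumed sustained throughput per tape drive

required_rate = db_size_mb / restore_window_s           # aggregate MB/s needed
drives_needed = math.ceil(required_rate / drive_rate_mb_s)

print(round(required_rate, 1))  # ~55.6 MB/s aggregate
print(drives_needed)            # 10 drives
```

The restore window, not the nightly backup load, sets the aggregate throughput, and the library must hold enough drives to supply it in parallel.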
Some backup software vendors attempted to solve the cost problem by allowing a single library to connect to multiple hosts. If you purchased a large library with multiple SCSI connections, you could connect each one to a different host. This allowed you to share the tape library but not the drives. While this ability helped reduce the cost by sharing the robotics, it didn't completely remove the inefficiencies discussed earlier.
What was really needed was a way to share the drives. And as long as the tape drives were shared, disk drives could be shared too. What if:
A large database server could back up to a locally attached tape drive, but that tape drive could also be seen and used by another large server when it needed to back up to a locally attached tape drive?
The large database server's disks could be seen by another server, which could back up those disks without sending the data through the CPU of the server using the database?
The disks and tape drives were connected in such a way that allowed the data to be sent directly from disk to tape without going through any server's CPU?
Fibre Channel and SANs have made all of these "what ifs" possible, along with many others that will be discussed in later chapters. SANs are making backups more manageable than ever, regardless of the size of the servers being backed up. In many cases, SANs are making things possible that weren't conceivable with conventional parallel SCSI or LAN-based backups.