Using SANs and NAS by W. Curtis Preston

LAN-Free Backups

As discussed in Chapter 1, LAN-free backups allow you to use a SAN to share one of the most expensive components of your backup and recovery system—your tape or optical library and the drives within it. Figure 4-2 shows how this is simply the latest evolution of centralized backups. There was a time when most backups were done to locally attached tape drives. This method worked fine when data centers were small and each server could fit on a single tape. Once the management of dozens (or even hundreds) of individual tapes became too much, or when servers would no longer fit on a single tape, data centers started using backup software that allowed them to use a central backup server and back up their servers across the LAN. (The servers are then referred to by the backup system as clients.)

Figure 4-2. The evolution of backup methodologies

This methodology works great as long as you have a LAN that can support the amount of network traffic such backups generate. Even if you have a state-of-the-art LAN, you may find individual backup clients that are too big to back up across the LAN. Also, increasingly large amounts of system resources are required on the backup server and clients to back up large amounts of data across the LAN. Luckily, backup software companies saw this coming and added support for remote devices. This meant that you could again decentralize your backups by placing tape drives on each backup client. Each client would then be told when and what to back up by the central backup server, but the data would be transferred to a locally attached tape drive. Most major software vendors also allowed this to be done within a tape library. As depicted in Figure 4-2, you can connect one or more tape drives from a tape library to each backup client that needs them. The physical movement of the media within the library is then managed centrally—usually by the backup server.

Although the backup data at the bottom of Figure 4-2 isn't going across the LAN, this isn't typically referred to as LAN-free backups. The configuration depicted at the bottom of Figure 4-2 is normally referred to as library sharing, since the library is being shared, but the drives aren't. When people talk about LAN-free backups, they are typically referring to drive sharing, where multiple hosts have shared access to an individual tape drive. The problem with library sharing is that each tape drive is dedicated to the backup client to which it's connected.[2] The result is that the tape drives in a shared library go unused most of the time.

As an example, assume we have three large servers, each with 1.5 TB of data. Five percent of this data changes daily, resulting in 75 GB of incremental backups per day per host.[3] All backups must be completed within an eight-hour window, and the entire host must be backed up within that window, requiring an aggregate transfer rate of 54 MB/s[4] for full backups. If you assume each tape drive is capable of 15 MB/s,[5] each host needs four tape drives to complete its full backup in one night. Therefore, we need to connect four tape drives to each server, resulting in a configuration that looks like the one at the bottom of Figure 4-2. While this configuration allows the servers to complete their full backups within the backup window, that many tape drives will allow them to complete the incremental backup (75 GB) in just 20 minutes!
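
The arithmetic behind these numbers is easy to check. Here's a minimal Python sketch that reproduces them, using the figures assumed in the text (1.5 TB per host plus the 75-GB incremental, an eight-hour window, and 15-MB/s drives):

    import math

    full_backup_mb = 1.575e6    # 1.5 TB of data plus a 75-GB incremental, in MB
    window_sec = 8 * 60 * 60    # the eight-hour backup window, in seconds
    drive_rate_mb = 15          # native transfer rate of one tape drive, MB/s

    required_rate = full_backup_mb / window_sec                  # ~54.7 MB/s
    drives_per_host = math.ceil(required_rate / drive_rate_mb)   # 4 drives

    # With those four drives, the 75-GB nightly incremental finishes quickly.
    incremental_min = 75_000 / (drives_per_host * drive_rate_mb) / 60

    print(f"{required_rate:.1f} MB/s, {drives_per_host} drives, "
          f"{incremental_min:.0f}-minute incremental")
    # 54.7 MB/s, 4 drives, 21-minute incremental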

An eight-hour backup window each night results in 240 possible hours during a month when backups can be performed. However, with a monthly full backup that takes eight hours and a nightly incremental backup that takes 20 minutes, the drives go unused for 228 of their 240 available hours.

However, suppose we take these same three servers and connect them and the tape library to a SAN, as illustrated in Figure 4-3. When we do so, the numbers change drastically. As you can see in Figure 4-3, we'll connect each host to a switched fabric SAN via a single 100-MB/s Fibre Channel connection. Next, we must connect the tape drives to the SAN. (For reasons explained later, however, we will need only five tape drives to meet the requirements discussed earlier.) If the tape drives support Fibre Channel natively, we can connect them to the SAN via the switch. If the tape drives can connect only via standard parallel SCSI cables, we can connect the five tape drives to the SAN via a Fibre Channel router, as shown in Figure 4-3. The routers that are shipping as of this writing can connect one to six SCSI buses to one or two Fibre Channel connections. For this example, we will use a four-to-one model, allowing us to connect our five tape drives to the SAN via a single Fibre Channel connection. (As is common in such configurations, two of the drives will share one of the SCSI buses.)

Figure 4-3. LAN-free backups

With this configuration, we can easily back up the three hosts discussed previously. The reason we need only five tape drives is that in the configuration shown in Figure 4-3, we can use dynamic drive sharing software provided by any of the major backup software companies. This software performs the job of dynamically assigning tape drives to the hosts that need them. When it's time for a given host's full backup, it is assigned four of the five drives. The hosts that aren't performing a full backup share the fifth tape drive. By sharing the tape drives in this manner, we could actually back up 25 hosts this size. Let's take a look at some math to see how this is possible.

Assume we have 25 hosts, each with 1.5 TB of data. (Of course, all 25 hosts need to be connected to the SAN.) With four of the five tape drives, we can perform an eight-hour full backup of a different host each night, resulting in a full backup for each server once a month. With the fifth drive, we can perform a 20-minute incremental backup of the other 24 hosts within the same eight-hour window.[6] If this were done with directly connected SCSI, it would require connecting four drives to each host, for a total of 100 tape drives! This, of course, would require a really huge library. In fact, as of this writing, few libraries are available that can hold that many tape drives. Table 4-1 illustrates the difference in the price of these two solutions.
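
A quick sketch of the schedule arithmetic shows why the fifth drive is enough:

    window_min = 8 * 60    # the eight-hour nightly window, in minutes

    # Four pooled drives give one host its eight-hour full backup each night,
    # so each of the 25 hosts gets a full backup roughly once a month. The
    # fifth drive handles the 20-minute incrementals, one host at a time:
    hosts_per_night = window_min // 20
    print(hosts_per_night)   # 24: exactly the hosts not getting a full backup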

Table 4-1. 25-host LAN-free backup solution

                        Parallel SCSI solution     SAN solution
                        Quantity    Total          Quantity    Total
Tape drives             100         $900K          5           $45K
HBAs                    25          $25K           25          $50K
Switch                  None                       2           $70K
Router                  None                       1           $7K
Device server software  Same                       Same
Tape library            Same                       Same
Total                               $925K                      $172K

Tip

Whether or not a monthly full backup makes sense for your environment is up to you. Unless circumstances require otherwise, I personally prefer monthly full backups, nightly incremental backups, and weekly cumulative incremental backups.

Each solution requires some sort of HBA on each client. Since Fibre Channel HBAs are more expensive, we will use a price of $2,000 for the Fibre Channel HBAs and $1,000 for the SCSI HBAs.[7] We will even assume you can obtain SCSI HBAs that have two SCSI buses per card. The SCSI solution requires 100 tape drives, and the SAN solution requires five tape drives. (We will use Ultrium LTO drives for this example, which have a native transfer speed of 15 MB/s and a discounted price of $9,000.) Since the library needs to store at least one copy of each full backup and 30 copies of each incremental backup from 25 1.5-TB hosts, this is probably going to be a really large library. To keep things as simple as we can, let's assume the tape library for both solutions is the same size and the same cost. (In reality, the SCSI library would have to be much larger and more expensive.) For the SAN solution, we need to buy two 16-port switches and one four-to-one SAN router. Each solution requires purchasing device server licenses from your backup software vendor for each of the 25 hosts, so this will cost the same for each solution. (The SAN solution licenses might cost slightly more than the SCSI solution licenses.) As you can see in Table 4-1, there is a difference of more than $750,000 in the cost of these two solutions.
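
The totals in Table 4-1 follow directly from those assumed prices:

    # Reproducing Table 4-1's totals from the prices assumed above.
    scsi_total = 100 * 9_000 + 25 * 1_000                  # drives + SCSI HBAs
    san_total = 5 * 9_000 + 25 * 2_000 + 70_000 + 7_000    # drives + FC HBAs,
                                                           # two switches, router
    print(scsi_total, san_total, scsi_total - san_total)
    # 925000 172000 753000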

What about the original requirements above, you ask? What if you only need to back up three large hosts? A three-host SAN solution requires the purchase of a much smaller switch and only one router. It requires 12 tape drives for the SCSI solution but only five for the SAN solution. As you can see in Table 4-2, even with these numbers, the SAN solution is over $40,000 cheaper than the SCSI solution.

Table 4-2. Three-host LAN-free backup solution

                        Parallel SCSI solution     SAN solution
                        Quantity    Total          Quantity    Total
Tape drives             12          $108K          5           $45K
HBAs                    3           $3K            3           $6K
Switch                  None                       1           $8K
Router                  None                       1           $7K
Device server software  Same                       Same
Tape library            Same                       Same
Total                               $111K                      $66K

How Does This Work?

The data in the previous section shows that sharing tape drives between servers can be a Good Thing. But how can multiple servers share the same physical tape drive? This is accomplished in one of two ways: either the vendor uses the SCSI reserve and release commands, or it implements its own queuing system.

SCSI reserve/release

Tape drives aren't the first peripherals that needed to be shared between two or more computers. In fact, many pre-SAN high-availability systems were based on disk drives that were connected to multiple systems via standard, parallel SCSI. (This is referred to as a multiple-initiator system.) The reserve and release commands were added to the SCSI-2 specification to make this possible. The following is a description of how they were intended to work.

Each system that wants to access a particular device issues the SCSI reserve command. If no other system has previously reserved the device, the system is granted access to the device. If another system has already reserved the device, the application requesting the device is given a reservation conflict message. Once a reservation is granted, it remains valid until one of the following things happens:

  • The same initiator (SCSI HBA) requests another reservation of the same device.

  • The same initiator issues a SCSI release command for the reserved device.

  • A SCSI bus reset, a hard reset, or a power cycle occurs.

If you combine the idea of the SCSI reserve command with a SCSI bus that is connected to more than one initiator (host HBA), you get a configuration where multiple hosts can share the same device by simply reserving it prior to using it. This device can be a disk drive, tape drive, or SCSI-controlled robotic arm of a tape library. The following is an example of how the configuration works.

Each host that needs to put a tape into a drive attempts to reserve the use of the robotic arm with the SCSI reserve command. If it's given a reservation conflict message, it waits a predetermined period of time and tries again until it successfully reserves the arm. Once it's granted a reservation to the robotic arm, it then attempts to reserve a tape drive using the SCSI reserve command. If it's given a reservation conflict message while attempting to reserve the first drive in the library, it continues trying to reserve each drive until it either successfully reserves one or is given a reservation conflict message for each one. If the latter happens, it issues a SCSI release command for the robotic arm, waits a predetermined period of time, and tries again. It continues to do this until it successfully puts a tape into a reserved drive. Once done, it can then back up to the drive via the SAN.
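
In pseudocode form, the dance looks something like the following sketch. The ScsiDevice class here just simulates a device that can be held by one owner at a time; on a real system, reserve() and release() would issue the SCSI-2 RESERVE and RELEASE commands through a pass-through driver:

    import time

    class ScsiDevice:
        """Simulates a device honoring SCSI-2 reserve/release semantics."""
        def __init__(self, name):
            self.name, self.reserved = name, False

        def reserve(self):          # returns False on RESERVATION CONFLICT
            if self.reserved:
                return False
            self.reserved = True
            return True

        def release(self):
            self.reserved = False

    def acquire_drive(robot, drives, retry_seconds=1):
        """The reserve/release dance described in the text."""
        while True:
            if robot.reserve():                  # step 1: reserve the arm
                for drive in drives:             # step 2: try each drive in turn
                    if drive.reserve():
                        # (a real host would move a tape into the drive here)
                        robot.release()          # free the arm for other hosts
                        return drive
                robot.release()                  # every drive conflicted
            time.sleep(retry_seconds)            # back off, then start over

    robot = ScsiDevice("arm")
    drives = [ScsiDevice(f"drive{i}") for i in range(5)]
    print(acquire_drive(robot, drives).name)     # "drive0"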

This description is how the designers intended it to work. However, shared drive systems that use the SCSI reserve/release method run into a number of issues, so many people consider the SCSI reserve/release command set to be fundamentally flawed. For example, one such flaw is that a SCSI bus reset releases the reservation. Since SCSI bus resets happen under a number of conditions, including system reboots, it's entirely possible to completely confuse your reservation system with the reboot of only one server connected to the shared device. Another issue with the SCSI reserve/release method is that not all platforms support it. Therefore, a shared drive system that uses the SCSI reserve/release method is limited to the platforms on which these commands are supported.

Third-party queuing system

Most backup software vendors use the third-party queuing method for shared devices. When I think of how this works, I think of my two small daughters. Their method of toy sharing is a lot like the SCSI reserve/release method. Whoever gets the toy first gets it as long as she wants it. However, as soon as one of them has a toy, the other one wants it. The one who wants the toy continually asks the other one for it. However, the one that already has the toy issues a reservation conflict message. (It sounds like, "Mine!") The one that wants the toy is usually persistent enough, however, that Mom or Dad have to intervene. Not very elegant, is it?

A third-party queuing system is like having Daddy sit between the children and the toys, where each child has full knowledge of what toys are available, but the only way they can get a toy is to ask Daddy for it. If the toy is in use, Daddy simply says, "You can't have that toy right now. Choose another." Then once the child asks for a toy that's not being used, Daddy hands it over. When the child is done with the toy, Daddy returns it to the toy bin.

There are two main differences between the SCSI reserve/release method and the third-party queuing method:

Reservation attempts are made to a third-party application

In the SCSI reserve/release method, the application that wants a drive has no one to ask whether the drive is already busy—other than the drive itself. If the drive is busy, the application simply gets a reservation conflict message, and there is no queuing of requests. Suppose, for example, that Hosts 1, 2, and 3 are sharing a drive. Host 1 requests and is issued a reservation for the drive and begins using it. Host 2 then attempts to reserve the drive and is given a reservation conflict message. Host 3 then attempts to reserve the drive and is also given a reservation conflict message. Host 2 then waits the predetermined period of time and requests the drive again, but it's still being used. However, just after Host 2 is given its second reservation conflict message, the drive becomes available. Assuming that Host 2's and Host 3's waiting times are the same, Host 3 will ask for the drive next and is granted a reservation. This happens despite the fact that it's actually Host 2's "turn" to use the drive, since it asked for the drive first.

However, consider a third-party queuing system. In the previous example, Host 2 would have asked a third-party application if the drive was available. It would have been told to wait and would be placed in a queue for the drive. Instead of having to continually poll the drive for availability, it's simply notified of the drive's availability by the third-party application. The third-party queuing system can also place multiple requests into a queue, keeping track of which host asked for a drive first. The hosts are then given permission to use the drive in the order the requests were received.
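
Here's a minimal sketch of the idea, using a thread-safe queue as the "third party." Hosts never poll the drive; they block until the broker hands them one, in roughly the order they asked:

    import queue
    import threading

    class DriveBroker:
        """The third party: every request for a drive goes through it."""
        def __init__(self, drive_names):
            self._free = queue.Queue()
            for name in drive_names:
                self._free.put(name)

        def acquire(self):
            return self._free.get()     # blocks until a drive frees up; no polling

        def release(self, name):
            self._free.put(name)

    broker = DriveBroker(["drive0"])    # one shared drive, as in the example

    def backup(host):
        drive = broker.acquire()        # waits in line; no reservation conflicts
        print(f"{host} writing to {drive}")
        broker.release(drive)

    for host in ("Host 1", "Host 2", "Host 3"):
        threading.Thread(target=backup, args=(host,)).start()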

Tape movement is accomplished by the third party

Another major difference between third-party queuing systems and the SCSI reserve/release method is that, while the hosts do share the tape library, they don't typically share the robotic arm. When a host requests a tape and a tape drive, the third-party application grants the request (if a tape and drive are available) and issues the robotic request to one host dedicated as the robotic control host.[8]

Levels of Drive Sharing

In addition to accomplishing drive sharing via different methods, backup software companies also differ in how they allow libraries and their drives to be shared. In order to understand what I mean, I must explain a few terms. A main server is the central backup server in any backup configuration. It contains the schedules and indexes for all backups and acts as a central point of control for a group of backup clients. A device server has only tape drives that receive backup data from clients; the main server controls its actions. This arrangement is known as a three-tiered backup system, where the main server can tell a backup client to back up to the main server's tape drives or to the tape drives on one of its device servers. Some backup software products don't support device servers and require all backups to be transmitted across the LAN to the main server. This type of product is known as a two-tiered backup system. (A single-tiered backup product would support only backups to a server's own tape drives. An example of such a product would be the software that comes with an inexpensive tape drive.)

Sharing drives between device servers

Most LAN-free backup products allow you to share a tape library and its drives only between a single main server and/or device servers under that main server's control. If you have more than one main server in your environment, such a product doesn't allow you to share a tape library and its drives between multiple main servers.

Sharing drives between main servers

Some LAN-free backup products allow you to share a library and its drives between multiple main servers. Many products that support this functionality do so because they are two-tiered products that don't support device servers, but this isn't always the case.

Restores

One of the truly beautiful things about sharing tape drives in a library via a SAN is what happens when it's time to restore. First, consider the parallel SCSI solution illustrated in Table 4-2. Since only four drives are dedicated to each host, only four drives will be available during restores as well. Even if your backup software had the ability to read more than four tapes at a time, you wouldn't be able to do so.

However, if you are sharing the drives dynamically, a restore could be given access to all available drives. If your backups occur at night, and most of your restores occur during the day, you would probably be given access to all drives in the library during most restores. Depending on the capabilities of your backup software package, this can drastically increase the speed of your restore.

Tip

Few backup software products can use more drives during a restore than they did during a backup, but that isn't to say that such products don't exist.

Other Ways to Share Tape Drives

As of this writing, there are at least three other ways to share devices without using a SAN. Although this is a chapter about SAN technology, I thought it appropriate to mention them here.

NDMP libraries

The Network Data Management Protocol (NDMP), originally designed by Network Appliance and PDC/Intelliguard (acquired by Legato) to back up filers, offers another way to share a tape library. There are a few vendors that have tape libraries that can be shared via NDMP.

Here's how it works. First, the vendors designed a small computer system to support the "tape server" functionality of NDMP:

  • It can connect to the LAN and receive backups or transmit restores via NDMP.

  • It can connect to the SCSI bus of a tape library and transmit to (or receive data from) its tape drives.

They then connect the SCSI bus of this computer to a tape drive in the library, and the Ethernet port of this computer to your LAN. (Current implementations have one of these computers for each tape drive, giving each tape drive its own connection to the LAN.) The function of the computer system is to take data coming across the LAN in raw TCP/IP format and write it as SCSI data via a local device driver that understands the drive type. Early implementations use DLT 8000s and 100-Mb Ethernet, because the speed of a 100-Mb network is well matched to a 10-MB/s DLT. By the time you read this, there will probably be newer implementations that use Gigabit Ethernet and faster tape drives.
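
Conceptually, the job of each of those little computers is about this simple. This sketch is drastically simplified (real NDMP adds a control session, mover state, and record framing), and the port number and the Linux tape device name /dev/nst0 are assumptions for illustration:

    import socket

    BLOCK = 64 * 1024            # write to tape in fixed-size blocks

    server = socket.socket()
    server.bind(("", 10000))     # in real NDMP, the data port is negotiated
    server.listen(1)
    conn, _ = server.accept()

    # Everything that arrives over the LAN goes straight to the local tape
    # device; the OS tape driver handles the drive type.
    with open("/dev/nst0", "wb") as tape:
        while data := conn.recv(BLOCK):
            tape.write(data)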

Not only does this give you a way to back up your filers; it also offers another option for backing up your distributed machines. As will be discussed in Chapter 7, NDMP v4 supports backing up a standard (non-NDMP) backup client to an NDMP-capable device. Since this NDMP library looks just like another filer to the backup server, you can use it to perform backups of non-NDMP clients with this feature. Since the tape library does the job of tracking which drives are in use and won't allow more than one backup server to use it at a time, you can effectively share this tape library with any backup server on the LAN.

SCSI over IP

SCSI over IP, also known as iSCSI, is a rather new concept when compared to Fibre Channel, but it's gaining a lot of ground. A SCSI device with an iSCSI adapter is accessible to any host on the LAN that also has an iSCSI interface card. Proponents of iSCSI say that Fibre Channel-based SANs are complex, expensive, and built on standards that are still being finalized, and they believe that SANs based on iSCSI will solve these problems. Ethernet is an old standard that has evolved slowly, in such a way that almost all equipment remains interoperable, and SCSI is likewise a long-established protocol.

The one limitation of iSCSI today is that it's easy for a host to saturate a 100-MB/s Fibre Channel pipe, but it's difficult for the same host to saturate a 100-MB/s (1-Gb) Ethernet pipe. Therefore, although the architectures offer roughly the same speed, they aren't necessarily equivalent. iSCSI vendors realize this and have therefore developed network interface cards (NICs) that can communicate "at line speed." One way to do this is to offload the TCP/IP processing from the host and perform it on the NIC itself. Now that vendors have started shipping gigabit NICs that can communicate at line speed, iSCSI stands to make significant inroads into the SAN marketplace. More information about iSCSI can be found in Appendix A.

Shared SCSI

Some vendors provide shared access to SCSI devices via standard parallel SCSI. Since all the LAN-free implementations I have seen have used Fibre Channel, it's unclear how well such configurations would be supported by drive-sharing software.

A Variation on the Theme

LAN-free backups solve a lot of problems by removing backup traffic from the LAN. When combined with commercial interfaces to database backup APIs, they can provide a fast way to back up your databases. However, what happens if you have an application that is constantly changing data, and for which there is no API? Similarly, what if you can't afford to purchase the interface to that API from your backup software company? Usually, the only answer is to shut down the application during the entire time of the backup.

One solution to this problem is to use a client-free backup solution, which is covered in the next section. However, client-free backups are expensive, and they are meant to solve more problems (and provide more features) than this. If you don't need all the functionality provided by client-free backups, but you still have the problem described earlier, perhaps what you need is a snapshot. In my book Unix Backup and Recovery, I introduced the concept of snapshots as a feature I'd like to see integrated into commercial backup and recovery products, and a number of them now offer it.

What is a snapshot?

A snapshot is a virtual copy of a device or filesystem. Think of it as a Windows shortcut or Unix symbolic link to a device that has been frozen in time. Just as a symbolic link or shortcut isn't really a copy of a file or device that it points to, a snapshot is a symbolic (or virtual) representation of a file or device. The only difference between a snapshot and a shortcut or symbolic link is that the snapshot always mimics the way that file or device looked at the time the snapshot was taken.

In order to take snapshots, you must have snapshot software. This software can be found in any of the following places:

Advanced filesystems

Some filesystems let you create snapshots as part of their list of advanced features. These are usually not the filesystems provided by the operating-system vendor, and they're typically available only at an extra cost.

Standard host-based volume managers

A host-based volume manager is the standard type of volume manager you are used to seeing. It manages disks that are visible to the host and creates virtual devices using various levels of RAID, including RAID 0 (striping), RAID 1 (mirroring), RAID 5, RAID 0+1 (striping and mirroring), and RAID 1+0 (mirroring and striping). (See Appendix B for descriptions of the various levels of RAID.) Snapshots created by a standard volume manager can be seen only on the host where the volume manager is running.

Enterprise storage arrays

A few enterprise storage arrays can create snapshots. These snapshots work virtually the same as any other snapshot, with the additional feature that a snapshot made within the storage array can be made visible to another host that also happens to be connected to the storage array. There is usually software that runs on Unix or NT that communicates with the storage array and tells it when and where to create snapshots.

Enterprise volume managers

Enterprise volume managers are a relatively new type of product that attempts to provide the features of an enterprise storage array for JBOD disks. Instead of buying one large enterprise storage array, you can buy SAN-capable disks and create RAID volumes on the SAN. Some of these products also offer the ability to create snapshots that are visible to any host on the SAN.

Backup software add-on products

Some backup products have recognized the value provided by snapshot software and can create snapshots within their software. These products emulate many other types of snapshots that are available. For example, some can create snapshots only of certain filesystem types, just like the advanced filesystem snapshots discussed earlier. Others can create snapshots of any device that is available to the host, but the data must be backed up via that host, just like the snapshots provided by host-based volume managers. Some, however, can create snapshots that are visible to other hosts, emulating the functionality of an enterprise storage array or enterprise volume manager. This type of snapshot functionality is discussed in more detail later in this chapter.

Let's review why we are looking at snapshots. Suppose you perform LAN-free backups but have an application for which there is no backup API, or for which you can't afford the API interface. You are therefore required to shut down the application during backups. You want something better but aren't yet ready for the cost of client-free backups, nor do you need all the functionality they provide. Therefore, you need the type of snapshots available from an advanced filesystem, a host-based volume manager, or a backup software add-on product that emulates this functionality. An enterprise storage array that can create snapshots (that are visible from the host on which the snapshot was taken) also works fine in this situation, but it isn't necessary. Choose whichever solution works best in your environment. Regardless of how you take the snapshot, most snapshot software works essentially the same way.

When you create a snapshot, the snapshot software records the time at which the snapshot was taken. Once the snapshot is taken, it gives you and your backup utility another name through which you may view the snapshot of the device or filesystem. It looks like any other device or filesystem, but it's really a symbolic representation of the device. Creating the snapshot doesn't actually copy data from diska to diska.snapshot, but it appears as if that's exactly what happened. If you look at diska.snapshot, you'll see diska exactly as it looked at the moment diska.snapshot was created.

Creating the snapshot takes only a few seconds. Sometimes people have a hard time grasping how the software can create a separate view of the device without copying it. This is why it's called a snapshot: it doesn't actually copy the data; it merely takes a "picture" of it.

Once the snapshot has been created, most snapshot software (or firmware in the array) monitors the device for activity. When it sees that a block of data is going to change, it records the "before" image of that block in a special logging area (often called the snapshot device). Even if a particular block changes several times, it only needs to record the way it looked before the first change occurred.[9]

For details on how this works, please consult your vendor's manual. When you view the device or filesystem via the snapshot virtual device or mount point, the snapshot software watches which blocks you request. If you request a block of data that hasn't changed since the snapshot was taken, it retrieves that block from the original device or filesystem. However, if you request a block of data that has changed since the snapshot was taken, it retrieves that block from the snapshot device. This, of course, is completely invisible to the user or application accessing the data. The user or application simply views the device via the snapshot device or mount point, and where the blocks come from is managed by the snapshot software or firmware.
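
Here's a minimal sketch of that copy-on-write behavior, with the "snapshot device" reduced to a dictionary of preserved before-images:

    class Device:
        """A block device that notifies its snapshots before changing a block."""
        def __init__(self, blocks):
            self.blocks = list(blocks)
            self.snapshots = []

        def snapshot(self):
            snap = Snapshot(self)
            self.snapshots.append(snap)
            return snap

        def write(self, n, data):
            for snap in self.snapshots:
                snap.preserve(n, self.blocks[n])   # save the "before" image first
            self.blocks[n] = data

    class Snapshot:
        def __init__(self, device):
            self.device = device
            self.before = {}                       # the "snapshot device"

        def preserve(self, n, data):
            self.before.setdefault(n, data)        # only the first change matters

        def read(self, n):
            # changed block? read the preserved copy; unchanged? read the original
            return self.before.get(n, self.device.blocks[n])

    disk = Device(["a", "b", "c"])
    snap = disk.snapshot()
    disk.write(1, "B")
    disk.write(1, "BB")
    print(snap.read(1), disk.blocks[1])   # "b BB": the frozen view vs. the live device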

Problem solved

Now that you can create a "copy" of your system in just a few seconds, you have a completely different way to back up an unsupported application. Simply stop the application, create the snapshot (which takes only a few seconds), and restart the application. As far as the application is concerned, the backup takes only a few seconds. However, there is a performance hit while the data is being backed up to the locally attached tape library (which is being shared with other hosts via a LAN-free backup setup). The degree to which this affects the performance of your application will, of course, vary. The only way to back up this data without affecting the application during the actual transfer of data from disk to tape is to use client-free backups.
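
In script form, the whole sequence might look something like this sketch. Every command name here (appctl, vol, backupcmd) is a hypothetical stand-in; substitute your application's quiesce commands and your volume manager's snapshot syntax:

    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    run("appctl", "stop")                                        # application down...
    run("vol", "snapshot", "create", "diska.snapshot", "diska")  # ...for only seconds
    run("appctl", "start")                                       # application back up

    # The slow part (streaming the frozen view to the shared tape drives over
    # the SAN) now happens without the application being down.
    run("backupcmd", "--source", "diska.snapshot")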

Tip

One vendor uses the term "snapshot" to refer to snapshots as described here, and to describe additional mirrors created for the purpose of backup, which is discussed in Section 4.3. I'm not sure why they do this, and I find it confusing. In this book, the term snapshot will always refer to a virtual, copy-on-write copy as discussed here.

Problems with LAN-Free Backups

LAN-free backups solve a lot of problems. They allow you to share one or more tape libraries between your critical servers, allowing each to back up much faster than it could across the LAN. They remove the bulk of the data transfer from the LAN, freeing your LAN for other uses. They also reduce the CPU and memory overhead of backups on the clients, because the clients no longer have to transmit their backup data via TCP/IP. However, there are still downsides to LAN-free backups. Let's take a look at a LAN-free backup system to see what these downsides are.

A typical LAN-free backup system is shown in Figure 4-4, where the database resides on disks that are visible to the data server. These disks may be a set of discrete disks inside the data server, a disk array with or without mirroring, or an enterprise storage array. In order to back up the data, it must be read from the disks by the data server (1a) and transferred across the data server's backplane, CPU, and memory by some kind of backup software. It's then sent to tape drives on the data server (1b). This is true even if you use the snapshot technology discussed earlier. Even though you've created a static view of the data to back up, you still must back up the data locally.

Figure 4-4. A typical LAN-free backup system

This, of course, requires using the CPU, memory, and backplane of the data server quite a bit. The application is negatively affected, or even shut down, during the entire time it takes to transfer the database from disk to tape. The larger the data, the greater the impact on the data server and the users. Also, traditional database recovery (including snapshot recovery systems like those described at the end of the previous section) requires transferring the data back across the same path, slowing down a recovery that should go as fast as possible.

With this type of setup, a recovery takes as long as or longer than the backup and is limited by the I/O bandwidth of the data server. And that covers only the recovery of the database files themselves; if it's a database you're recovering, replaying the transaction logs adds a significant amount of time on top of that.

Let's take a look at each of these limitations in more detail.

Application impact

You'd think that backing up data to locally attached tape drives would present a minimal impact to the application. It certainly creates much less of a load than typical LAN-based backups. In reality, however, the amount of throughput required to complete the backup within an acceptable window can sometimes create quite a load on the server, robbing precious resources needed by the application the server is running. The degree to which the application is affected depends on certain factors:

  • Are the size and computing power of the server based only on the needs of the "primary" application, or are the needs of the backup and recovery application also taken into account? It's often possible to build a server that is powerful enough that the primary application isn't affected by the demands of the backup and recovery application, but only if both applications are taken into account when building the server. This is, however, often not done.

  • How much data needs to be transferred from online storage (i.e., disk) to offline storage (i.e., tape) each night? This affects the length of the impact.

  • What are the I/O capabilities of the server's backplane? Some server designs do a good job of computing but a poor job of transferring large amounts of data.

  • How much memory does the backup application require?

  • Can the primary application be backed up online, or does it need to be completely shut down during backups?

  • How busy is the application during the backup window? Is this application being accessed 24 x 7, or is it not needed while backups are running?

Please notice that the last question asked about when the application is needed, not when it's being used. The reason the question is worded this way is that too many businesses have gotten used to systems that aren't available during the backup window. They have grown to expect this, so they simply don't try to use the system at that time. They would use it if they could, but they can't, so they don't. The question is, "Would you like to be able to access your system 24 x 7?" If the answer is yes, you need to design a backup and recovery system that creates minimal impact on the application.

Almost all applications are impacted in some way during LAN-based or LAN-free backups. File servers take longer to process file requests. If you have a database and can perform backups with the database running, your database may take longer to process queries and commits. If your database application requires you to shut down the database to perform a backup, the impact on users is much greater.

Whether you are slowing down file or database services or completely halting all database activity, the disruption lasts for some period of time. The duration of this period is determined by four factors:

  • How much data do you have to back up?

  • How many offline storage devices do you have available?

  • How well can your backup software take advantage of these devices?

  • How well can your server handle the load of moving data from point A to point B?

Recovery speed

This is the only reason you are backing up, right? Many people fail to take recovery speed into consideration when designing a backup and recovery system; they should be doing almost the opposite, designing the system so it can recover within an acceptable window. In almost every case, this also results in a system that can complete its backups within an acceptable window.

If your backup system is based on moving data from disk to tape, and your recovery system is based on moving data from tape to disk, the recovery time is always a function of the questions in the previous section. They boil down to two basic questions: how much data do you have to move, and what resources are available to move it?
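
Those two questions reduce to simple arithmetic. A sketch, reusing the assumed figures from earlier in this chapter (a 1.5-TB host and 15-MB/s drives):

    data_mb = 1.5e6    # a 1.5-TB host, in MB
    drives = 5         # all five shared drives, if restoring during the day
    rate_mb = 15       # per-drive native transfer rate, in MB/s

    recovery_hours = data_mb / (drives * rate_mb) / 3600
    print(f"{recovery_hours:.1f} hours")   # ~5.6 hours, before any log replay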

No other way?

Of course applications are affected during backups. Recovery takes as long, if not longer, than the backup. There's simply no way to get around this, right? That was the correct answer up until just recently; however, client-free backups have changed the rules.

What if there was a way you could back up a given server's data with almost no impact to the application? If there were any impact, it would last for only a few seconds. What if you could recover a multi-terabyte database instantaneously? Wouldn't that be wonderful? That's what client-free backups can do for you.



[2] As mentioned later in this chapter, SCSI devices can be connected to more than one host, but it can be troublesome.

[3] This is actually a high rate of change, but it helps prove the point. Even with a rate of change this high, the drives still go unused the majority of the time.

[4] 1.575 TB ÷ (8 hours × 60 minutes × 60 seconds) = 54.6 MB/s

[5] There are several tape drives capable of these backup speeds, including AIT-3, LTO, Mammoth, Super DLT, 3590, 9840, and DTF.

[6] 20 minutes x 24 hosts = 480 minutes, or 8 hours

[7] These are Unix prices. Obviously, Windows-based cards cost much less.

[8] Although it's possible that some software products have implemented a third-party queuing system for the robotic arm as well, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.

[9] Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.
