Chapter 4. Backing Up

Introduction

When I began gathering contributions for this book, it soon became obvious that there would be an entire chapter on backups. Not only do BSD users follow the mantra “backup, backup, backup,” but every admin seems to have hacked his own solution to take advantage of the tools at hand and the environment that needs to be backed up.

If you’re looking for tutorials on how to use dump and tar, you won’t find them here. However, you will find nonobvious uses for their less well-known counterparts pax and cpio. I’ve also included a hack on backing up over ssh, to introduce the novice user to the art of combining tools over a secure network connection.

You’ll also find scripts that fellow users have created to get the most out of their favorite backup utility. Finally, there are hacks that introduce some very useful open source third-party utilities.

Back Up FreeBSD with SMBFS

A good backup can save the day when things go wrong. A bad—or missing—backup can ruin the whole week.

Regular backups are vital to good administration. You can perform backups with hardware as basic as a SCSI tape drive using 8mm tape cartridges or as advanced as an AIT tape library system using cartridges that can store up to 50 GB of compressed data. But what if you don’t have the luxury of dedicated hardware for each server?

Since most networks are comprised of multiple systems, you can archive data from one server across the network to another. We’ll back up a FreeBSD system using the tar and gzip archiving utilities and the smbutil and mount_smbfs commands to transport that data to network shares. These procedures were tested on FreeBSD 4.6-STABLE and 5.1-RELEASE.

Adding NETSMB Kernel Support

Since SMB is a network-aware filesystem, we need to build SMB support into the kernel. This means adding the proper options lines to the custom kernel configuration file. For information on building a custom kernel, see [Hack #54], the Building and Installing a Custom Kernel section (9.3) of the FreeBSD Handbook, and relevant information contained in /usr/src/sys/i386/conf.

Add the following options under the makeoptions section:

options    NETSMB            # SMB/CIFS requester
options    NETSMBCRYPTO      # encrypted password support for SMB
options    LIBMCHAIN         # mbuf management library
options    LIBICONV
options    SMBFS

Once you’ve saved your changes, use the make buildkernel and make installkernel commands to build and install the new kernel.

Establishing an SMB Connection with a Host System

The next step is to decide which system on the network to connect to. Obviously, the destination server needs to have an active share on the network, as well as enough disk space available to hold your archives. It will also need a valid user account with which you can log in. You’ll probably also want to choose a system that’s backed up regularly to removable media. I’ll use a machine named smbserver1.

Tip

The smbutil and mount_smbfs commands both come standard with the base install of FreeBSD. Their only requirements are the five kernel options listed in the preceding section.

Once you have chosen the proper host, make an SMB connection manually with the smbutil login command. This connection will remain active, allowing you to interact with the SMB server, until you issue the smbutil logout command. So, to log in:

# smbutil login //jwarner@smbserver1
Password:
Connected to smbserver1

And to log out:

# smbutil logout //jwarner@smbserver1
Password:
Connection unmarked as permanent and will
be closed when possible

Mounting a Share

Once you’re sure you can manually initiate a connection with the host system, create a mount point where you can mount the remote share. I’ll create a mount point directory called /backup:

# mkdir /backup

Next, reestablish a connection with the host system and mount its share:

# smbutil login //jwarner@smbserver1
Password:
Connected to smbserver1

# mount_smbfs -N //jwarner@smbserver1/sharename /backup

Note that I used the -N switch to mount_smbfs to avoid having to supply a password a second time. If you prefer to be prompted for a password when mounting the share, simply omit the -N switch.

Archiving and Compressing Data with tar and gzip

After connecting to the host server and mounting its network share, the next step is to back up and copy the necessary files. You can get as complicated as you like, but I’ll create a simple shell script, bkup, inside the mounted share that compresses important files and directories.

This script will make compressed archives of the /boot, /etc, /home, and /usr/local/etc directories. Add to or edit this list as you see fit. At a minimum, I recommend including the /etc and /usr/local/etc directories, as they contain important configuration files. See man hier for a complete description of the FreeBSD directory structure.

#!/bin/sh
# script that backs up the following four directories:
tar cvvpzf boot.tar.gz /boot
tar cvvpzf etc.tar.gz  /etc
tar cvvpzf home.tar.gz /home
tar cvvpzf usr_local_etc.tar.gz /usr/local/etc

Tip

This script is an example to get you started. There are many ways to use tar. Read man 1 tar carefully, and tailor the script to suit your needs.

Be sure to make this file executable:

# chmod 755 bkup

Run the script to create the archives:

# ./bkup
tar: Removing leading / from absolute path names in the archive.
drwxr-xr-x root/wheel        0 Jun 23 18:19 2002 boot/
drwxr-xr-x root/wheel        0 May 11 19:46 2002 boot/defaults/
-r--r--r-- root/wheel    10957 May 11 19:46 2002 boot/defaults/loader.conf
-r--r--r-- root/wheel      512 Jun 23 18:19 2002 boot/mbr
(snip)

After the script finishes running, you’ll have *.tar.gz files of the directories you chose to archive:

# ls | more
bkup
boot.tar.gz
etc.tar.gz
home.tar.gz
usr_local_etc.tar.gz

Once you’ve tested your shell script manually and are happy with your results, add it to the cron scheduler to run on scheduled days and times.
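Before handing the script to cron, it can help to have it stop at the first failure and to stamp each archive with the date, so that one run does not overwrite the last. The following is a minimal sketch along those lines, not the original bkup script; the archive_dirs function name and the date-stamped naming scheme are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: archive_dirs is a hypothetical helper, not part of bkup.
# Usage: archive_dirs DEST DIR...
# Writes DEST/<dir with slashes turned into underscores>.<YYYYMMDD>.tar.gz
archive_dirs() {
    dest=$1
    shift
    stamp=$(date +%Y%m%d)
    for dir in "$@"; do
        # /usr/local/etc becomes usr_local_etc
        name=$(printf '%s\n' "$dir" | sed -e 's:^/*::' -e 's:/*$::' -e 's:/:_:g')
        if ! tar -czpf "$dest/$name.$stamp.tar.gz" "$dir"; then
            echo "archive_dirs: backup of $dir failed" >&2
            return 1
        fi
    done
}
```

Called as archive_dirs /backup /boot /etc /home /usr/local/etc, it would write one archive per directory into the mounted share, each named in the form usr_local_etc.YYYYMMDD.tar.gz.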

Remember, how you choose to implement your backups isn’t important—backing up regularly is. Facing the problem of deleted or corrupted data isn’t a matter of “if” but rather a matter of “when.” This is why good backups are essential.

Hacking the Hack

Things to consider when modifying the script to suit your own purposes:

Add entries to automatically mount and unmount the share (see [Hack #68] for an example).
Use your backup utility of choice. You’re not limited to just tar!

Create Portable POSIX Archives

Create portable tar archives with pax.

Some POSIX operating systems ship with GNU tar as the default tar utility (NetBSD and QNX6, for example). This is problematic because the GNU tar format is not compatible with other vendors’ tar implementations. GNU is an acronym for “GNU’s not UNIX”—in this case, GNU’s not POSIX either.

GNU Versus POSIX tar

For filenames or paths longer than 100 characters, GNU uses its own @LongName tar format extension. Some vendors’ tar utilities will choke on the GNU extensions. Here is what Solaris’s archivers say about such an archive:

% pax -r < gnu-archive.tar
pax: ././@LongLink : Unknown filetype
% tar xf gnu-archive.tar
tar: directory checksum error

Distributing non-POSIX archives is clearly a disadvantage. A solution is to use pax to create your tar archives in the POSIX format. I’ll also provide some tips about using pax’s features to compensate for the loss of some parts of GNU tar’s extended feature set.

Replacing tar with pax

The NetBSD and QNX6 pax utility supports a tar interface and can also read the @LongName GNU tar format extension. You can use pax as your tar replacement, since it can read your existing GNU-format archives and can create POSIX archives for future backups. Here’s how to make the quick conversion.

First, replace /usr/bin/tar. That is, rename GNU tar and save it in another directory, in case you ever need to restore GNU tar to its previous location:

# mv /usr/bin/tar /usr/local/bin/gtar

Next, create a symlink from pax to tar. This will allow the pax utility to emulate the tar interface if invoked with the tar name:

# ln -s /bin/pax /usr/bin/tar

Now when you use the tar utility, your archives will really be created by pax.

Compress Archives Without Using Intermediate Files

Let’s say you’re on a system that doesn’t have issues with tar. Why else would you consider using pax as your backup solution?

For one, you can use pax and pipelines to create compressed archives, without using intermediate files. Here’s an example pipeline:

% find /home/kirk -name '*.[ch]' | pax -w | pgp -c

The pipeline’s first stage uses find to generate the exact list of files to archive. When using tar, you will often create the file list using a subshell. Unfortunately, the subshell approach can be unreliable. For example, this user has so much source code that the complete file list does not fit on the command line:

% tar cf kirksrc.tar $(find /home/kirk -name '*.[ch]')
/bin/ksh: tar: Argument list too long

However, in most cases, the pipeline approach will work as expected.

During the second stage, pax reads the list of files from stdin and writes the archive to stdout. The pax found on all of the BSDs has built-in gzip support, so you can also compress the archive during this stage by adding the -z argument.

When creating archives, invoke pax without the -v (verbose) argument. This way, if there are any pax error messages, they won’t get lost in the extra output.

The third stage compresses and/or encrypts the archive. An intermediate tar archive isn’t required as the utility reads its data from the pipeline. This example uses pgp, the Pretty Good Privacy encryption system, which can be found in the ports collection.
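The same three-stage shape works with any filter in the last stage. The sketch below swaps gzip in for pgp purely for illustration and wraps the pipeline in a function; the archive_sources name and its arguments are hypothetical.

```sh
#!/bin/sh
# Sketch: stage 1 lists files, stage 2 archives them, stage 3 compresses.
# archive_sources is a hypothetical name. Usage: archive_sources DIR OUTFILE
archive_sources() {
    # pax -w reads the names of the files to archive from stdin, one per
    # line, and writes the archive to stdout. Nothing is passed on the
    # command line, so the argument-length limit never comes into play.
    find "$1" -name '*.[ch]' -type f | pax -w | gzip > "$2"
}

# The limit that the subshell approach runs into can be inspected with:
getconf ARG_MAX
```

For example, archive_sources /home/kirk kirksrc.tar.gz would produce a compressed archive without ever writing an uncompressed one to disk.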

Attribute-Preserving Copies

POSIX provides two utilities for copying file hierarchies: cp -R and pax -rw. For regular users, cp -R is the common method. But for administrative use, pax -rw preserves more of the original file attributes, including hard-link counts and file access times. pax -rw also gives you a better copy of the original file hierarchy.

For an example, let’s back up three executables. Note that egrep, fgrep, and grep are all hard links to the same executable. The link count is three, and all have the same inode number. ls -il displays the inode number in column 1 and the link count in column 3:

# ls -il /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep
31888 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 /usr/bin/egrep
31888 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 /usr/bin/fgrep
31888 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 /usr/bin/grep

With pax -rw, we will create one executable with the same date as the original:

# pax -rw /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/
# ls -il /tmp/usr/bin/
47 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 egrep
47 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 fgrep
47 -r-xr-xr-x  3 root  wheel  73784 Sep  8  2002 grep

Can we do the same thing using cp -R? Nope. Instead, we create three new files, each with a unique inode number, a link count of one, and a new date:

# rm /tmp/usr/bin/*
# cp -R /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/usr/bin/
# ls -il /tmp/usr/bin/
49 -r-xr-xr-x  1 root  wheel  73784 Dec 19 11:26 egrep
48 -r-xr-xr-x  1 root  wheel  73784 Dec 19 11:26 fgrep
47 -r-xr-xr-x  1 root  wheel  73784 Dec 19 11:26 grep
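
One way to confirm whether a copy preserved hard links is to compare inode numbers directly instead of reading them off the ls listing. The following is a minimal sketch; the same_inode function name is hypothetical, and the files live in a scratch directory created just for the demonstration.

```sh
#!/bin/sh
# Sketch: same_inode is a hypothetical helper.
# Succeeds when both paths have the same inode number. This is only
# meaningful when the two paths are on the same filesystem.
same_inode() {
    a=$(ls -di "$1" | awk '{print $1}')
    b=$(ls -di "$2" | awk '{print $1}')
    [ "$a" = "$b" ]
}

# Demonstration in a scratch directory: grep and egrep are hard links.
work=$(mktemp -d)
mkdir "$work/src" "$work/copy"
echo data > "$work/src/grep"
ln "$work/src/grep" "$work/src/egrep"

# cp -R copies each name as an independent file, so the link is lost.
cp -R "$work/src/grep" "$work/src/egrep" "$work/copy/"

same_inode "$work/src/grep" "$work/src/egrep" && echo "source: hard linked"
same_inode "$work/copy/grep" "$work/copy/egrep" || echo "cp -R copy: separate files"
```

The same check can be pointed at a hierarchy produced by pax -rw to confirm that the links survived.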

Rooted Archives and the Substitution Argument

If you have ever used GNU tar and received this message:

tar: Removing leading `/' from absolute path names in the archive

then you were using a tar archive that was rooted, where the files all had absolute paths starting with the forward slash (/). It is not a good idea to clobber existing files unintentionally with foreign binaries, which is why the GNU tar utility automatically strips the leading / for you.

To be safe, you want your unarchiver to create files relative to your current working directory. Rooted archives try to violate this rule by creating files relative to the root of the filesystem, ignoring the current working directory. If that archive contained /etc/passwd, unarchiving it could replace your current password file with a foreign copy. You may be surprised when you cannot log into your system anymore!

You can use the pax substitution argument to remove the leading /. This will ensure that the unarchived files will be created relative to your current working directory, instead of at the root of your filesystem:

# pax -A -r -s '-^/--' < rootedarchive.tar

Here, the -A argument requests that pax not strip the leading / automatically, as we want to do this ourselves. This argument is required only to avoid a bug in the NetBSD pax implementation that interferes with the -s argument. We also want pax to unarchive the file, so we pass the -r argument.

The -s argument specifies an ed-style substitution expression to be performed on the destination pathname. In this example, the leading / will be stripped from the destination paths. See man ed for more information.

If we used the traditional / delimiter, the substitution expression would be /^\///. (The second / isn’t a delimiter, so it has to be escaped with a \.) You will find that / is the worst delimiter, because you have to escape all the slashes found in the paths. Fortunately, you can choose another delimiter. Pick one that isn’t present in the paths, to minimize the number of escape characters you have to add. In the example, we used the - character as the delimiter, and therefore no escapes were required.
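
Because the -s argument takes ed-style expressions, an expression can be tried out on sample pathnames with sed, which accepts the same s command syntax, before any archive is touched. This is only a preview technique: it assumes that sed and pax interpret the regular expression the same way, and the sample paths are made up.

```sh
#!/bin/sh
# Both commands strip the leading slash; only the delimiter differs.
# Each prints:
#   etc/passwd
#   usr/local/etc/sudoers
printf '%s\n' /etc/passwd /usr/local/etc/sudoers | sed 's-^/--'
printf '%s\n' /etc/passwd /usr/local/etc/sudoers | sed 's/^\///'
```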

The substitution argument can be used to rename files for a beta software release, for example. Say you develop X11R6 software and have multiple development versions on your box:

/usr/X11R6.saturday
/usr/X11R6.working
/usr/X11R6.notworking
/usr/X11R6.released

and you want to install the /usr/X11R6.working directory as usr/X11R6 on the beta system:

# pax -A -w -s '-^/usr/X11R6.working-usr/X11R6-' /usr/X11R6.working \
  > /tmp/beta.tar

This time, the -s argument specifies a substitution expression that will replace the beginning of the path /usr/X11R6.working with usr/X11R6 in the archive.

Useful Resources for Multiple Volume Archives

POSIX does not specify the format of multivolume archive headers, meaning that every archiver may use a different intervolume header format. If you have a lot of multivolume tar archives and plan to switch to a different tar implementation, you should test whether you can still recover your old multivolume archives.

This practice may have been more common when Minix/QNX4 users archived their 20 MB hard disks to a stack of floppy disks. Minix/QNX4 users had the vol utility to handle multiple volumes; instead of adding the multivolume functionality to the archiver itself, it was handled by a separate utility. You should be able to switch archiver implementations transparently because vol did the splitting, not the archiver.

The vol utility performs the following operations:

At the end-of-media, prompts for the next volume
Verifies the ordering of the volumes
Concatenates the multiple volumes

Unfortunately, the vol utility isn’t part of the NetBSD package collection. If you create a lot of multivolume archives, you may want to look into porting one of the following utilities:

vol: Creates volume headers for tar; developed by Brian Yost and available at http://groups.google.com/groups?selm=80%40mirror.UUCP&output=gplain
multivol: Provides multiple volume support; created by Marc Schaefer and available at http://www.ibiblio.org/pub/Linux/system/backup/multivol-2.1.tar.bz2

Interactive Copy

When cp alone doesn’t quite meet your copy needs.

The cp command is easy to use, but it does have its limitations. For example, have you ever needed to copy a batch of files with the same name? If you’re not careful, they’ll happily overwrite each other.

Finding Your Source Files

I recently had the urge to find all of the scripts on my system that created a menu. I knew that several ports used scripts named configure and that some of those scripts used dialog to provide a menu selection.

It was easy enough to find those scripts using find:

% find /usr/ports -name configure -exec grep -l "dialog" /dev/null {} \;
/usr/ports/audio/mbrolavox/scripts/configure
/usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure
/usr/ports/emulators/vmware2/scripts/configure
(snip)

This command asks find to start in /usr/ports and look for files named configure (-name configure). For each file it finds, -exec runs grep to search for the word dialog. The -l flag tells grep to list only the names of the matching files, without including the lines that match the expression. You may recognize the /dev/null {} \; from [Hack #13].

Normally, I could tell cp to use those found files as the source and to copy them to the specified destination. This is done by enclosing the find command within a set of backticks (`), located at the far top left of your keyboard. Note what happens, though:

% mkdir ~/scripts
% cd ~/scripts
% cp `find /usr/ports -name configure -exec grep -l "dialog" \
               /dev/null {} \;` .
% ls ~/scripts
configure

Although each file that I copied had a different pathname, the filename itself was configure. Since each copied file overwrote the previous one, I ended up with one remaining file.
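
A quick way to see this problem coming is to list the basenames that occur more than once in the set of files to be copied. The sketch below reads pathnames on stdin; the duplicate_basenames function name is hypothetical.

```sh
#!/bin/sh
# Sketch: print every basename that appears more than once on stdin.
duplicate_basenames() {
    # Strip everything up to the last slash, then report repeated names.
    sed 's:.*/::' | sort | uniq -d
}

# Example with three made-up pathnames; two share the name "configure",
# so this prints: configure
printf '%s\n' /a/configure /b/configure /c/Makefile | duplicate_basenames
```

Piping the output of the find command into duplicate_basenames before running cp shows which names would collide; no output means the plain copy is safe.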

Renaming a Batch of Source Files

What’s needed is to rename those source files as they are copied to the destination. One approach is to replace the slash (/) in the original file’s pathname with a different character, resulting in a unique filename that still reflects the source of that file.

As we saw in [Hack #15], sed is designed to do such replacements. Here’s an approach:

% pwd
/usr/home/dru/scripts
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null {} \; \
               -exec sh -c 'cp {} `echo {} | sed s:/:=:g`' \;

% ls
=usr=ports=audio=mbrolavox=scripts=configure
=usr=ports=devel=kdesdk3=work=kdesdk-3.2.0=configure
=usr=ports=emulators=vmware2=scripts=configure
(snip)

This invocation of find starts off the same as my original search. It then adds a second -exec, which uses the -c switch to pass a command string to the sh shell. The shell will cp the source files (specified by {}), but only after sed has replaced each slash in the pathname with an equals sign (=). Note that I changed the sed delimiter from the default slash to the colon (:) so I didn’t have to escape my / string. You don’t have to use = as the new character; choose whatever suits your purposes.
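
Embedding {} inside the sh -c string relies on find substituting the pathname in the middle of an argument, which POSIX leaves up to the implementation, and it breaks on pathnames that contain spaces or quotes. A variation, sketched here, passes the pathname to the shell as a positional parameter instead. The flatten_copy function name and its arguments are hypothetical, and grep -q is used in place of grep -l so that the first -exec acts purely as a filter.

```sh
#!/bin/sh
# Sketch: flatten_copy is a hypothetical helper.
# Usage: flatten_copy SRCDIR NAME PATTERN DESTDIR
# Copies every file called NAME under SRCDIR that contains PATTERN into
# DESTDIR, naming each copy after its full path with / replaced by =.
flatten_copy() {
    find "$1" -name "$2" -type f -exec grep -q "$3" {} \; \
        -exec sh -c 'cp "$1" "$2/$(printf "%s\n" "$1" | sed "s:/:=:g")"' \
            sh {} "$4" \;
}
```

The scenario above would then read flatten_copy /usr/ports configure dialog ~/scripts.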

awk can also perform this renaming feat. The following command is more or less equivalent to the previous command:

% find /usr/ports -name configure -exec grep -l "dialog" /dev/null {} \; \
               | awk '{dst=$0;gsub("/","=",dst); print "cp",$0,dst}' | sh

Renaming Files Interactively

Depending upon how many files you plan on copying over and how picky you are about their destination names, you may prefer to do an interactive copy.

Despite its name, cp’s interactive switch (-i) will fail miserably in my scenario:

% cp -i `find /usr/ports -name configure -exec grep -l "dialog" \
               /dev/null {} \;` .
overwrite ./configure? (y/n [n]) n
not overwritten
overwrite ./configure? (y/n [n])
(snip)

Since each file is still named configure, my only choices are either to overwrite the previous file or to not copy over the new file. However, both cpio and pax are capable of interactive copies. Let’s start with cpio:

% find /usr/ports -name configure -exec grep -l "dialog" /dev/null {} \; \
               | cpio -o > ~/scripts/test.cpio && cpio -ir < ~/scripts/test.cpio

Here I’ve piped my find command to cpio. Normally, I would invoke cpio once in copy-pass mode. Unfortunately, that mode doesn’t support -r, the interactive rename switch. So, I directed cpio to send its output (-o >) to an archive named ~/scripts/test.cpio. Instead of piping that archive, I used && to delay the next cpio operation until the previous one finished. I then used -ir to perform an interactive copy from that archive so I could type in the name of each destination file.

Here are the results:

cpio: /usr/ports/audio/mbrolavox/scripts/configure: truncating inode number
cpio: /usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure: truncating inode number
cpio: /usr/ports/emulators/vmware2/scripts/configure: truncating inode number
(snip other archive messages)
5136 blocks
rename /usr/ports/audio/mbrolavox/scripts/configure -> mbrolavox.configure
rename /usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure -> kdesdk3.configure
rename /usr/ports/emulators/vmware2/scripts/configure -> vmware2.configure
(snip remaining rename operations)
5136 blocks

After creating the archive, cpio showed me the source name so I could rename the destination file. While requiring interaction on my part, it does let me fine-tune exactly what I’d like to call each script. I must admit that my names are much nicer than those containing all of the equals signs.

pax is even more efficient. In the preceding command, the first cpio has to wait until find completes, and the second cpio has to wait until the first cpio finishes. Compare that to this command:

% find /usr/ports -name configure -exec grep -l "dialog" /dev/null {} \; \
               | pax -rwi .

Here, I can pipe the results of find directly to pax, and pax has very user-friendly switches. In this command, I asked to read and write interactively to the current directory. There’s no temporary archive required, and everything happens at once. Even better, pax starts working on the interaction before find finishes. Here’s what it looks like:

ATTENTION: pax interactive file rename operation.
-rwxr-xr-x Nov 11 07:53 /usr/ports/audio/mbrolavox/scripts/configure
Input new name, or a "." to keep the old name, or a "return" to skip this file.
Input > mbrolavox.configure
Processing continues, name changed to: mbrolavox.configure

This repeats for each and every file that matched the find results.

Secure Backups Over a Network

When it comes to backups, Unix systems are extremely flexible. For starters, they come with built-in utilities that are just waiting for an administrator’s imagination to combine their talents into a customized backup solution. Add that to one of Unix’s greatest strengths: its ability to see everything as a file. This means you don’t even need backup hardware. You have the ability to send your backup to a file, to removable media, to another server, or to whatever is available.

As with any customized solution, your success depends upon a little forethought. In this scenario, I don’t have any backup hardware, but I do have a network with a 100 Mbps switch and a system with a large hard drive capable of holding backups.

Initial Preparation

On the system with that large hard drive, I have sshd running. (An alternative to consider is the scponly shell; see [Hack #63].) I’ve also created a user and a group called rembackup:

# pw groupadd rembackup
# pw useradd rembackup -g rembackup -m -s /bin/csh
# passwd rembackup
Changing local password for rembackup
New Password:
Retype New Password:
#

If you’re new to the pw command, the -g switch puts the user in the specified group (which must already exist), the -m switch creates the user’s home directory, and the -s switch sets the default shell. (There’s really no good mnemonic; perhaps no one remembers what, if anything, pw stands for.)

Next, from the system I plan on backing up, I’ll ensure that I can ssh in as the user rembackup. In this scenario, the system with the large hard drive has an IP address of 10.0.0.1:

% ssh -l rembackup 10.0.0.1
The authenticity of host '10.0.0.1 (10.0.0.1)' can't be established.
DSA key fingerprint is e2:75:a7:85:46:04:71:51:db:a8:9e:83:b1:5c:7a:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.1' (DSA) to the list of known hosts.
Password:
%
% exit
logout
Connection to 10.0.0.1 closed.

Excellent. Since I can log in as rembackup, it looks like both systems are ready for a test backup.

The Backup

I’ll start by testing my command at a command line. Once I’m happy with the results, I’ll create a backup script to automate the process.

# tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > genisis_usr_home.tgz" 
usr/home/
usr/home/dru/
usr/home/dru/.cshrc
usr/home/dru/mail/
usr/home/dru/mail/sent-mail
Password:

This tar command creates (c) a compressed (z) backup to a file (f) while showing the results verbosely (v). The minus character (-) represents the specified file, which in this case is stdout. This allows me to pipe stdout to the ssh command. I’ve provided /usr/home, which contains all of my users’ home directories, as the hierarchy to back up.

The results of that backup are then piped (|) to ssh, which will send that output (via cat) to a compressed file called genisis_usr_home.tgz in the rembackup user’s home directory. Since that directory holds the backups for my network, I chose a filename that indicates the name of the host, genisis, and the contents of the backup itself.

Automating the backup

Now that I can securely back up my users’ home directories, I can create a script. It can start out as simple as this:

# more /root/bin/backup
#!/bin/sh
# script to backup /usr/home to backup server
tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > genisis_usr_home.tgz"

However, whenever I run that script, I’ll overwrite the previous backup. If that’s not my intention, I can include the date as part of the backup name:

tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > \
    genisis_usr_home.`date +%d.%m.%y`.tgz"

Notice I inserted the date command into the filename using backticks. Now the backup file will include the day, month, and year separated by dots, resulting in a filename like genisis_usr_home.21.12.03.tgz.

Once you’re happy with your results, your script is an excellent candidate for a cron job.
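
One thing the simple script does not do is notice when something goes wrong: the exit status of a pipeline is that of its last command, so a tar failure goes unreported. The sketch below adds a check that the file on the backup host is not empty after the transfer. The backup_home function and the REMOTE variable are hypothetical; REMOTE defaults to the ssh command used above and can be set to sh -c to rehearse the script locally without a network.

```sh
#!/bin/sh
# Sketch: backup_home and REMOTE are hypothetical names.
# REMOTE is the command that runs a command string on the backup host.
REMOTE=${REMOTE:-"ssh rembackup@10.0.0.1"}

# Usage: backup_home DIR PREFIX
backup_home() {
    file="$2.$(date +%d.%m.%y).tgz"
    # $REMOTE is deliberately unquoted so that it splits into words.
    tar czf - "$1" | $REMOTE "cat > $file" || return 1
    # Fail if the file on the backup host is missing or empty.
    $REMOTE "test -s $file"
}
```

Calling backup_home /usr/home genisis_usr_home reproduces the date-stamped backup shown earlier. Because the function contacts the backup host twice, password authentication will prompt twice.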

Automate Remote Backups

Make remote backups automatic and effortless.

One day, the IDE controller on my web server died, leaving the files on my hard disk hopelessly corrupted. I faced what I had known in the back of my mind all along: I had not been making regular remote backups of my server, and the local backups were of no use to me now that the drive was corrupted.

The reason for this, of course, is that doing remote backups wasn’t automatic and effortless. Admittedly, this was no one’s fault but my own, but my frustration was sufficient that I decided to write a tool that would make automated remote snapshots so easy that I wouldn’t ever have to worry about it again. Enter rsnapshot.

Installing and Configuring rsnapshot

Installation on FreeBSD is a simple matter of:

# cd /usr/ports/sysutils/rsnapshot
# make install

I didn’t include the clean target here, as I’d like to keep the work subdirectory, which includes some useful scripts.

Tip

If you’re not using FreeBSD, see the original HOWTO at the project web site for detailed instructions on installing from source.

The install process neither creates nor installs the config file. This means that there is absolutely no possibility of accidentally overwriting a previously existing config file during an upgrade. Instead, copy the example configuration file and make changes to the copy:

# cp /usr/local/etc/rsnapshot.conf.default /usr/local/etc/rsnapshot.conf

The rsnapshot.conf config file is well commented, and much of it should be fairly self-explanatory. For a full reference of all the various options, please consult man rsnapshot.

rsnapshot uses the /.snapshots/ directory to hold the filesystem snapshots. This is referred to as the snapshot root. This must point to a filesystem where you have lots of free disk space.

Tip

Note that fields are separated by tabs, not spaces. This makes it easier to specify file paths with spaces in them.
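
Because a stray space is easy to type and hard to see, a quick check for it can save a failed run. The following sketch flags configuration lines whose keyword is followed by a space rather than a tab; the check_tabs function name is hypothetical, and the check looks only at the separator after the first field.

```sh
#!/bin/sh
# Sketch: check_tabs is a hypothetical helper, not part of rsnapshot.
# Prints (with line numbers) every non-comment line whose first field is
# followed by a space instead of a tab. Succeeds when no such line exists.
check_tabs() {
    if grep -n '^[A-Za-z_][A-Za-z_]* ' "$1"; then
        return 1
    fi
    return 0
}
```

It would be run as check_tabs /usr/local/etc/rsnapshot.conf after editing the file.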

Specifying backup intervals

rsnapshot has no idea how often you want to take snapshots. In order to specify how much data to save, you need to tell rsnapshot which intervals to keep, and how many of each.

The default configuration keeps six hourly snapshots, which corresponds to one snapshot every four hours (these are the hourly intervals). It also keeps a second set of snapshots, taken once a day and stored for a week (or seven days):

interval    hourly  6
interval    daily   7

Note that the hourly interval is specified first. This is very important, as the first interval line is assumed to be the smallest unit of time, with each additional line getting successively bigger. Thus, if you add a yearly interval, it should go at the bottom, and if you add a minutes interval, it should go before the hourly interval. It’s also worth noting that the snapshots are pulled up from the smallest interval to the largest. In this example, the daily snapshots are pulled from the oldest hourly snapshot, not directly from the main filesystem.

The backup section tells rsnapshot which files you actually want to back up:

backup      /etc/      localhost/etc/

In this example, backup is the backup point, /etc/ is the full path to the directory we want to take snapshots of, and localhost/etc/ is a subdirectory inside the snapshot root where the snapshots are stored. If you are taking snapshots of several machines on one dedicated backup server, it’s a good idea to use hostnames as directories to keep track of which files came from which server.

In addition to full paths on the local filesystem, you can also back up remote systems using rsync over ssh. If you have ssh enabled (via the cmd_ssh parameter), specify a path similar to this:

backup      backup@example.com:/etc/     example.com/etc/

This behaves fundamentally the same way as specifying local pathnames, but you must take a few extra things into account:

The ssh daemon must be running on example.com.
You must have access to the specified account on the remote machine (in this case, the backup user on example.com). See [Hack #38] for instructions on setting this up.
You must have key-based logins enabled for the specified user at example.com, without passphrases.
This backup occurs over the network, so it may be slower. Since this uses rsync, this is most noticeable during the first backup. Depending on how much your data changes, subsequent backups should go much faster.

Tip

One thing you can do to mitigate the potential damage from a backup server breach is to create alternate users on the client machines with their UIDs and GIDs set to 0, but with a more restrictive shell, such as scponly [Hack #63].

Preparing for script automation

With the backup_script parameter, the second column is the full path to an executable backup script, and the third column is the local path in which you want to store its output. For example:

backup_script      /usr/local/bin/backup_pgsql.sh     localhost/postgres/

Tip

You can find the backup_pgsql.sh example script in the utils/ directory of the source distribution. Alternatively, if you didn’t include the clean target when you installed the FreeBSD port, the file will be located in /usr/ports/sysutils/rsnapshot/work/rsnapshot-1.0.9/utils.

Your backup script only needs to dump its output into its current working directory. It can create as many files and directories as necessary, but it should not put its files in any predetermined path. This is because rsnapshot creates a temp directory, changes to that directory, runs the backup script, and then syncs the contents of the temp directory to the local path you specified in the third column. A typical backup script might look like this:

#!/bin/sh

/usr/bin/mysqldump -uroot mydatabase > mydatabase.sql
/bin/chmod 644 mydatabase.sql

There are a couple of example scripts in the utils/ directory of the rsnapshot source distribution to give you more ideas.

Warning

Remember that backup scripts will be invoked as the user running rsnapshot. Make sure your backup scripts are not writable by anyone else.

Testing your config file

After making your changes, verify that the config file is syntactically valid and that all the supporting programs are where you think they are:

# rsnapshot configtest

If all is well, the output should say Syntax OK. If there’s a problem, it should tell you exactly what it is.

The final step to test your configuration is to run rsnapshot with the -t flag, for test mode. This will print out a verbose list of the things it will do, without actually doing them. For example, to simulate an hourly backup:

# rsnapshot -t hourly

Scheduling rsnapshot

Now that you have your config file set up, it’s time to schedule rsnapshot to run from cron. Add the following lines to root’s crontab:

0 */4 * * *       /usr/local/bin/rsnapshot hourly
30 23 * * *       /usr/local/bin/rsnapshot daily

The Snapshot Storage Scheme

All backups are stored within a configurable snapshot root directory. In the beginning it will be empty. rsnapshot creates subdirectories for the various defined intervals. After a week, the directory should look something like this:

# ls -l /.snapshots/
drwxr-xr-x    7 root     root         4096 Dec 28 00:00 daily.0
drwxr-xr-x    7 root     root         4096 Dec 27 00:00 daily.1
drwxr-xr-x    7 root     root         4096 Dec 26 00:00 daily.2
drwxr-xr-x    7 root     root         4096 Dec 25 00:00 daily.3
drwxr-xr-x    7 root     root         4096 Dec 24 00:00 daily.4
drwxr-xr-x    7 root     root         4096 Dec 23 00:00 daily.5
drwxr-xr-x    7 root     root         4096 Dec 22 00:00 daily.6
drwxr-xr-x    7 root     root         4096 Dec 29 00:00 hourly.0
drwxr-xr-x    7 root     root         4096 Dec 28 20:00 hourly.1
drwxr-xr-x    7 root     root         4096 Dec 28 16:00 hourly.2
drwxr-xr-x    7 root     root         4096 Dec 28 12:00 hourly.3
drwxr-xr-x    7 root     root         4096 Dec 28 08:00 hourly.4
drwxr-xr-x    7 root     root         4096 Dec 28 04:00 hourly.5

Each of these directories contains a full backup of that point in time. The destination directory paths you specified as the backup and backup_script parameters are placed directly under these directories. In the example:

backup          /etc/           localhost/etc/

the /etc/ directory will initially back up into /.snapshots/hourly.0/localhost/etc/.

Each subsequent time rsnapshot is run with the hourly command, it will rotate the hourly.X directories, “copying” the contents of the hourly.0 directory (using hard links) into hourly.1.

When rsnapshot daily runs, it will rotate all the daily.X directories, then copy the contents of hourly.5 into daily.0.

hourly.0 will always contain the most recent snapshot, and daily.6 will always contain a snapshot from a week ago. Unless the files change between snapshots, the full backups are really just multiple hard links to the same files. This is how rsnapshot uses space so efficiently. If the file changes at any point, the next backup will unlink the hard link in hourly.0, replacing it with a brand new file. This will now use twice the disk space it did before, but it is still considerably less space than 13 full, unique copies would occupy.
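
To make the hard-link rotation concrete, here is a minimal Python sketch of the same idea. It is an assumption-laden illustration rather than rsnapshot's implementation: the names link_copy and take_snapshot are hypothetical, only regular files are handled, and files deleted from the source are not removed from the newest snapshot.

```python
import filecmp
import os
import shutil


def link_copy(src, dst):
    """Recreate the directory tree of src under dst, hard-linking each file."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target, name))


def take_snapshot(source, root, interval="hourly", keep=6):
    """Rotate interval.0 .. interval.(keep-1) and refresh interval.0."""
    def slot(i):
        return os.path.join(root, "%s.%d" % (interval, i))

    # Drop the oldest snapshot and shift the others up by one.
    shutil.rmtree(slot(keep - 1), ignore_errors=True)
    for i in range(keep - 2, 0, -1):
        if os.path.isdir(slot(i)):
            os.rename(slot(i), slot(i + 1))
    # "Copy" the newest snapshot using hard links only.
    if os.path.isdir(slot(0)):
        link_copy(slot(0), slot(1))

    # Bring interval.0 up to date with the live files.
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        target = os.path.normpath(os.path.join(slot(0), rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            dst_file = os.path.join(target, name)
            if os.path.exists(dst_file):
                if filecmp.cmp(src_file, dst_file, shallow=False):
                    continue  # unchanged: keep sharing the old inode
                os.unlink(dst_file)  # changed: break the hard link first
            shutil.copy2(src_file, dst_file)
```

Unlinking before copying is the important step. Writing through the existing link would change the file in every older snapshot that shares its inode.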

Remember, if you are using different intervals than the ones in this example, the first interval listed is the one that gets updates directly from the main filesystem. All subsequently listed intervals pull from the previous snapshots.

Accessing Snapshots

When rsnapshot first runs, it will create the configured snapshot_root directory. It assigns this directory the permissions 0700 since the snapshots will probably contain files owned by all sorts of users on your system.

The simplest but least flexible solution is to disallow access to the snapshot root altogether. The root user will still have access, of course, and will be the only one who can pull backups. This may or may not be desirable, depending on your situation. For a small setup, this may be sufficient.

If users need to be able to pull their own backups, you will need to do a little extra work up front. The best option seems to be creating a container directory for the snapshot root with 0700 permissions, giving the snapshot root directory 0755 permissions, and mounting the snapshot root for the users as read-only using NFS or Samba.

Let’s explore how to do this using NFS on a single machine. First, set the snapshot_root variable in rsnapshot.conf:

snapshot_root       /usr/.private/.snapshots/

Then, create the container directory, the real snapshot root, and a read-only mount point:

# mkdir /usr/.private/
# mkdir /usr/.private/.snapshots/
# mkdir /.snapshots/

Set the proper permissions on these new directories:

# chmod 0700 /usr/.private/
# chmod 0755 /usr/.private/.snapshots/
# chmod 0755 /.snapshots/

In /etc/exports, add /usr/.private/.snapshots/ as a read-only NFS export:

/usr/.private/.snapshots/  127.0.0.1(ro)

Tip

If your version of NFS supports it, include the no_root_squash option. (Place it within the parentheses after ro, with a comma, not a space, as the separator.) This option allows the root user to see all the files within the read-only export.

In /etc/fstab, mount /usr/.private/.snapshots/ read-only under /.snapshots/:

localhost:/usr/.private/.snapshots/   /.snapshots/   nfs    ro   0 0

Restart your NFS daemon and mount the read-only snapshot root:

# /etc/rc.d/nfsd restart
# mount /.snapshots/

To test this, try adding a file as the superuser:

# touch /.snapshots/testfile

This should fail with insufficient permissions. This is what you want. It means that your users won’t be able to mess with the snapshots either.

Users who wish to recover old files can go into the /.snapshots directory, select the interval they want, and browse through the filesystem until they find the files they are looking for. NFS will prevent them from making modifications, but they can copy anything that they had permission to read in the first place.
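
Because unchanged files are hard links, many snapshot directories hold the very same copy of a file. The following Python sketch groups snapshots by the copy they contain, so a user can see how many distinct versions actually exist. The function name file_versions is hypothetical, and the relative path follows the localhost/etc/ layout used earlier in this hack.

```python
import os


def file_versions(snapshot_root, relative_path):
    """Group snapshot directories by the distinct copy of a file they hold.

    Snapshots whose copies share a device and inode number are hard links
    to one another, so they count as a single version.
    """
    versions = {}
    for snapshot in sorted(os.listdir(snapshot_root)):
        candidate = os.path.join(snapshot_root, snapshot, relative_path)
        if os.path.isfile(candidate):
            info = os.stat(candidate)
            versions.setdefault((info.st_dev, info.st_ino), []).append(snapshot)
    return list(versions.values())
```

For example, file_versions("/.snapshots", "localhost/etc/hosts") returns one list of snapshot names per distinct version of that file.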

Automate Data Dumps for PostgreSQL Databases

Building your own backup utility doesn’t have to be scary.

PostgreSQL is a robust, open source database server. Like most database servers, it provides utilities for creating backups. PostgreSQL’s primary tools for creating backup files are pg_dump and pg_dumpall. However, if you want to automate your database backup processes, these tools have a few limitations:

pg_dump dumps only one database at a time.
pg_dumpall dumps all of the databases into a single file.
pg_dump and pg_dumpall know nothing about multiple backups.

These aren’t criticisms of the backup tools, just an observation that customization will require a little scripting. Our resulting script will back up multiple databases, each to its own backup file.

Creating the Script

This script uses Python and its ability to execute other programs to implement the following backup algorithm:

Change the working directory to a specified database backup directory.
Rename all backup files ending in .gz so that they end in .gz.old. Existing files ending in .gz.old will be overwritten.
Clean up and analyze all PostgreSQL databases using its vacuumdb command.
Get a current list of databases from the PostgreSQL server.
Dump each database, piping the results through gzip, into its own compressed file.

Why Python? My choice is one of personal preference; this task is achievable in just about any scripting language. However, Python is cross-platform and easy to learn, and its scripts are easy to read.

The Code

#!/usr/local/bin/python

# /usr/local/bin/pg2gz.py

# This script lists all PostgreSQL
# databases and pipes them separately
# through gzip into .gz files.

# INSTRUCTIONS
# 1.  Review and edit line 1 to reflect the location
#     of your python command file.
# 2.  Redefine the save_dir variable (on line 22) to
#     your backup directory.
# 3.  To automate the backup process fully, consider
#     scheduling the regular execution of this script
#     using cron.

import os, string

# Redefine this variable to your backup directory.
# Be sure to include the slash at the end.
save_dir = '/mnt/backup/databases/'

# Rename all *.gz backup files to *.gz.old.
curr_files = os.listdir(save_dir)
for n in curr_files:
        if n.endswith('.gz'):
                os.popen('mv ' + save_dir + n + " " + save_dir + n + '.old')
        else:
                pass

# Vacuum all databases
os.popen('vacuumdb -a -f -z')

# 'psql -l' produces a list of PostgreSQL databases.
get_list = os.popen('psql -l').readlines()

# Exclude header and footer lines.
db_list = get_list[3:-2]

# Extract database names from first element of each row.
for n in db_list:
        n_row = n.split()
        n_db = n_row[0]

        # Pipe database dump through gzip
        # into .gz files for all databases
        # except template*.
        if n_db == 'template0':
                pass
        elif n_db == 'template1':
                pass
        else:
                os.popen('pg_dump ' + n_db + ' | gzip -c > ' + save_dir +
                          n_db + '.gz')
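
The slice get_list[3:-2] is easier to follow against a concrete listing. The sketch below runs the same parsing on a sample string instead of a live server. The sample text and the function name database_names are assumptions for illustration; the exact layout of psql -l varies between PostgreSQL versions, so check the header and footer line counts on your own system.

```python
# Illustrative stand-in for the output of 'psql -l'.
SAMPLE = """\
        List of databases
   Name    | Owner | Encoding
-----------+-------+-----------
 accounts  | pgsql | SQL_ASCII
 template0 | pgsql | SQL_ASCII
 template1 | pgsql | SQL_ASCII
 wiki      | pgsql | SQL_ASCII
(4 rows)

"""


def database_names(psql_output):
    """Return the database names to dump, skipping the template databases."""
    lines = psql_output.splitlines()
    # Three header lines (title, column names, rule) and two footer
    # lines (row count, blank line) surround the actual rows.
    rows = lines[3:-2]
    names = [row.split()[0] for row in rows if row.split()]
    return [name for name in names if not name.startswith("template")]


print(database_names(SAMPLE))
```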

Running the Hack

The script assumes that you have a working installation of PostgreSQL. You’ll also need to install Python, which is available through the ports collection or as a binary package. The Python modules used are installed by default.

Double-check the location of your Python executable using:

% which python
/usr/local/bin/python

and ensure the first line of the script reflects your location. Don’t forget to make the script executable using chmod +x.

On line 22 of the script, redefine the save_dir variable to reflect the location of your backup directory. As is, the script assumes a backup directory of /mnt/backup/databases/.

You’ll probably want to add the script to the pgsql user’s crontab for periodic execution. To schedule the script for execution, log in as pgsql or, as the superuser, su to pgsql. Once you’re acting as pgsql, execute:

% crontab -e

to open the crontab file in the default editor.

Given the following crontab file, /usr/local/bin/pg2gz.py will execute at 4 AM every Sunday.

# more /var/cron/tabs/pgsql
SHELL=/bin/sh
PATH=/var/cron/tabs:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

#minute    hour    mday    month    wday     command
0          4       *       *        0        /usr/local/bin/pg2gz.py

Perform Client-Server Cross-Platform Backups with Bacula

Don’t let the campy name fool you. Bacula is a powerful, flexible, open source backup program.

Having problems finding a backup solution that fits all your needs? One that can back up both Unix and Windows systems? That is flexible enough to back up systems with irregular backup needs, such as laptops? That allows you to run scripts before or after the backup job? That provides browsing capabilities so you can decide upon a restore point? Bacula may be what you’re looking for.

Introducing Bacula

Bacula is a client-server solution composed of several distinct parts:

Director: The Director is the most complex part of the system. It keeps track of all clients and files to be backed up. This daemon talks to the clients and to the storage devices.
Client/File Daemon: The Client (or File) Daemon runs on each computer which will be backed up by the Director. Some other backup solutions refer to this as the Agent.
Storage Daemon: The Storage Daemon communicates with the backup device, which may be tape or disk.
Console: The Console is the primary interface between you and the Director. I use the command-line Console, but there is also a GNOME GUI Console.

Each File Daemon will have an entry in the Director configuration file. Other important entries include FileSets and Jobs. A FileSet identifies a set of files to back up. A Job specifies a single FileSet, the type of backup (incremental, full, etc.), when to do the backup, and what Storage Device to use. Backup and restore jobs can be run automatically or manually.

Installation

Bacula stores details of each backup in a database. You can use SQLite or MySQL or, starting with Bacula Version 1.33, PostgreSQL. Before you install Bacula, decide which database you want to use.

Warning

FreeBSD 4.x (prior to 4.10-RELEASE) and FreeBSD 5.x (Version 5.2.1 and earlier) have a pthreads bug that could cause you to lose data. Refer to platform/freebsd/pthreads-fix.txt in your Bacula source directory for full details.

The existing Bacula documentation provides detailed installation instructions if you’re installing from source. To install the SQLite version of the FreeBSD port instead:

# cd /usr/ports/sysutils/bacula
# make install

Or, if you prefer to install the MySQL version:

# cd /usr/ports/sysutils/bacula
# make -DWITH_MYSQL install

Tip

Don’t use the clean target with your make command, because there are some scripts in the work directory you’ll need to use.

Configuration Files

Bacula installs several configuration files that should work for your environment with few modifications.

File Daemon on the backup client

The first configuration file, /usr/local/etc/bacula-fd.conf, is for the File Daemon. This file needs to reside on each machine you want to back up. For security reasons, only the Directors specified in this file will be able to communicate with this File Daemon. The name and password specified in the Director resource must be supplied by any connecting Director.

You can specify more than one Director { } resource. Make sure the password matches the one in the Client resource in the Director’s configuration file.

The FileDaemon { } resource identifies this system and specifies the port on which it will listen for Directors. You may have to create a directory manually to match the one specified by the Working Directory.

Storage Daemon on the backup server

The next configuration file, /usr/local/etc/bacula-sd.conf, is for the Storage Daemon. The default values should work unless you need to specify additional storage devices.

As with the File Daemon, the Director { } resource specifies the Director(s) that may contact this Storage Daemon. The password must match that found in the Storage resource in the Director’s configuration file.
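
Mismatched passwords between these files are a common reason a daemon refuses a connection, so it can help to check the pairs mechanically. The Python sketch below is a deliberately simple reader, not Bacula's own parser: the function names are hypothetical, it assumes no # or braces appear inside quoted values, and it expects directive names to be written exactly as given.

```python
import re


def resources(text):
    """Return (kind, body) pairs for each top-level 'Kind { ... }' block."""
    text = re.sub(r"#.*", "", text)  # drop comments
    found, depth, start, last_end, kind = [], 0, 0, 0, ""
    for match in re.finditer(r"[{}]", text):
        if match.group() == "{":
            if depth == 0:
                words = text[last_end:match.start()].split()
                kind = words[-1] if words else ""
                start = match.end()
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                found.append((kind, text[start:match.start()]))
                last_end = match.end()
    return found


def directive(body, name):
    """Return the value of 'name = value' inside a resource body, or None."""
    pattern = r'^\s*%s\s*=\s*"?([^"\n]+?)"?\s*$' % re.escape(name)
    match = re.search(pattern, body, re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def password_for(text, kind, name):
    """Find the Password of the resource of the given kind and Name."""
    for found_kind, body in resources(text):
        if found_kind.lower() == kind.lower() and directive(body, "Name") == name:
            return directive(body, "Password")
    return None
```

With the contents of bacula-dir.conf and bacula-fd.conf read into strings, comparing password_for(dir_text, "Client", "laptop-fd") with password_for(fd_text, "Director", "laptop-dir") shows whether the Director and that File Daemon agree.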

Director on the backup server

The Director’s configuration is by necessity the largest of the daemons. Each Client, Job, FileSet, and Storage Device is defined in this file.

In the following example configuration, I’ve defined the Job Client1 to back up the files defined by the FileSet Full Set on a laptop. The backup will be performed to the File storage device, which is really a disk located at laptop.example.org.

Tip

This isn’t an optimal solution for a real backup, as I’m just backing up files from the laptop to somewhere else on the laptop. It is sufficient for demonstration and testing, though.

# more /usr/local/etc/bacula-dir.conf

  Director {
    Name                    = laptop-dir
    DIRport                 = 9101
    QueryFile               = "/usr/local/etc/query.sql"
    WorkingDirectory        = "/var/db/bacula"
    PidDirectory            = "/var/run"
    Maximum Concurrent Jobs = 1
    Password                = "lLftflC4QtgZnWEB6vAGcOuSL3T6n+P7jeH+HtQOCWwV"
    Messages                = Standard
  }
   Job {
    Name            = "Client1"
    Type            = Backup
    Client          = laptop-fd
    FileSet         = "Full Set"
    Schedule        = "WeeklyCycle"
    Storage         = File
    Messages        = Standard
    Pool            = Default
    Write Bootstrap = "/var/db/bacula/Client1.bsr"
    Priority        = 10
  }
  FileSet {
    Name = "Full Set"
    Include = signature=MD5 {
      /usr/ports/sysutils/bacula/work/bacula-1.32c
    }

  # If you backup the root directory, the following two excluded
  #   files can be useful
  #
    Exclude = { /proc /tmp /.journal /.fsck }
  }
  Client {
    Name           = laptop-fd
    Address        = laptop.example.org
    FDPort         = 9102
    Catalog        = MyCatalog
    Password       = "laptop-client-password"
    File Retention = 30 days
    Job Retention  = 6 months
    AutoPrune      = yes
  }
  # Definition of file storage device
  Storage {
    Name       = File
    Address    = laptop.example.org
    SDPort     = 9103
    Password   = "TlDGBjTWkjTS/0HNMPF8ROacI3KlgIUZllY6NS7+gyUp"
    Device     = FileStorage
    Media Type = File
  }

Note that the password given by any connecting Console must match the one in the Director resource.

Database Setup

Now that you’ve modified the configuration files to suit your needs, use Bacula’s scripts to create and define the database tables that it will use.

To set up for MySQL:

# cd /usr/ports/sysutils/bacula/work/bacula-1.32c/src/cats
# ./grant_mysql_privileges
# ./create_mysql_database
# ./make_mysql_tables

If you have a password set for the MySQL root account, add -p to these commands and you will be prompted for the password. You now have a working database suitable for use by Bacula.

Testing Your Tape Drive

Some tape drives are not standard. They require their own proprietary software and can be temperamental when used with other software. Regardless of what software it uses, each drive model can have its own little quirks that need to be catered to. Fortunately, Bacula comes with btape, a handy little utility for testing your drive.

My tape drive is at /dev/sa1. Bacula prefers to use the non-rewind variant of the device, but it can handle the raw variant as well. If you use the rewinding device, then only one backup job per tape is possible. This command will test the non-rewind device /dev/nrsa1:

# /usr/local/sbin/btape -c /usr/local/etc/bacula-sd.conf /dev/nrsa1

Running Without Root

It is a good idea to run daemons with the lowest possible privileges. The Storage Daemon and the Director Daemon do not need root permissions. However, the File Daemon does, because it needs to access all files on your system.

In order to run daemons with nonroot accounts, you need to create a user and a group. Here, I used vipw to create the user. I selected a user ID and group ID of 1002, as they were unused on my system.

bacula:*:1002:1002::0:0:Bacula Daemon:/var/db/bacula:/sbin/nologin

I also added this line to /etc/group:

bacula:*:1002:

The bacula user (as opposed to the Bacula daemon) will have a home directory of /var/db/bacula, which is the default location for the Bacula database.

Now that you have both a bacula user and a bacula group, you can secure the bacula home directory by issuing this command:

# chown -R bacula:bacula /var/db/bacula/

Starting the Bacula Daemons

To start the Bacula daemons on a FreeBSD system, issue the following command:

# /usr/local/etc/rc.d/bacula.sh start

To confirm they are all running:

# ps auwx | grep bacula

root 63416 0.0 0.3 2040 1172 ?? Ss 4:09PM 0:00.01
    /usr/local/sbin/bacula-sd -v -c /usr/local/etc/bacula-sd.conf
root 63418 0.0 0.3 1856 1036 ?? Ss 4:09PM 0:00.00
    /usr/local/sbin/bacula-fd -v -c /usr/local/etc/bacula-fd.conf
root 63422 0.0 0.4 2360 1440 ?? Ss 4:09PM 0:00.00
    /usr/local/sbin/bacula-dir -v -c /usr/local/etc/bacula-dir.conf

Using the Bacula Console

The console is the main interface through which you run jobs, query system status, and examine the Catalog contents, as well as label, mount, and unmount tapes. There are two consoles available: one runs from the command line, and the other is a GNOME GUI. I will concentrate on the command-line console.

To start the console, I use this command:

#  /usr/local/sbin/console -c /usr/local/etc/console.conf
Connecting to Director laptop:9101
1000 OK: laptop-dir Version: 1.32c (30 Oct 2003)
*

You can obtain a list of the available commands with the help command. The status all command is a quick and easy way to verify that all components are up and running. To label a Volume, use the label command.

Bacula comes with a preset backup job to get you started. It will back up the directory from which Bacula was installed. Once you get going and have created your own jobs, you can safely remove this job from the Director configuration file.

Not surprisingly, you use the run command to run a job. Once the job runs, the results will be sent to you via email, according to the Messages resource settings within your Director configuration file.

To restore a job, use the restore command. You should choose the restore location carefully and ensure there is sufficient disk space available.

It is easy to verify that the restored files match the original:

# diff -ruN \
               /tmp/bacula-restores/usr/ports/sysutils/bacula/work/bacula-1.32c \
               /usr/ports/sysutils/bacula/work/bacula-1.32c
#
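
If you would rather script the check than read diff output, the same comparison can be sketched in Python with the standard filecmp module. The function name trees_match is hypothetical. Unlike diff -ruN, it reports only whether the two trees are identical, not where they differ.

```python
import filecmp
import os


def trees_match(left, right):
    """Return True if both directory trees hold the same files and contents."""
    # ignore=[] stops dircmp from silently skipping names such as CVS or RCS.
    comparison = filecmp.dircmp(left, right, ignore=[])
    if comparison.left_only or comparison.right_only or comparison.common_funny:
        return False
    # shallow=False compares file contents rather than trusting size and mtime.
    _match, mismatch, errors = filecmp.cmpfiles(
        left, right, comparison.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(
        trees_match(os.path.join(left, name), os.path.join(right, name))
        for name in comparison.common_dirs)
```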

Creating Backup Schedules

For my testing, I wanted to back up files on my Windows XP machine every hour. I created this schedule:

Schedule {
  Name = "HourlyCycle"
  Run  = Full 1st sun at 1:05
  Run  = Differential 2nd-5th sun at 1:05
  Run  = Incremental Hourly
}

Any Job that uses this schedule will be run at the following times:

A full backup will be done on the first Sunday of every month at 1:05 AM.
A differential backup will be run on the 2nd, 3rd, 4th, and 5th Sundays of every month at 1:05 AM.
Every hour, on the hour, an incremental backup will be done.
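
The rules above can be expressed as a small Python function, which is handy for checking what a schedule will do on a given date. This is a sketch of the logic as described here, not of Bacula's scheduler, and the function name scheduled_level is hypothetical.

```python
from datetime import datetime


def scheduled_level(when):
    """Return the backup level HourlyCycle starts at this minute, or None."""
    is_sunday = when.weekday() == 6  # Monday is 0, Sunday is 6
    if is_sunday and (when.hour, when.minute) == (1, 5):
        # The first Sunday of a month always falls on day 1 through 7.
        return "Full" if when.day <= 7 else "Differential"
    if when.minute == 0:
        return "Incremental"
    return None
```

For instance, 1 February 2004 was the first Sunday of that month, so scheduled_level(datetime(2004, 2, 1, 1, 5)) reports a full backup, while the same time a week later reports a differential.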

Creating a Client-only Install

So far we have been testing Bacula on the server. With the FreeBSD port, installing a client-only version of Bacula is easy:

# cd /usr/ports/sysutils/bacula
# make -DWITH_CLIENT_ONLY install

You will also need to tell the Director about this client by adding a new Client resource to the Director configuration file. You will probably want to create a Job and FileSet resource for it as well.

When you change the Bacula configuration files, remember to restart the daemons:

# /usr/local/etc/rc.d/bacula.sh restart
Stopping the Storage daemon
Stopping the File daemon
Stopping the Director daemon
Starting the Storage daemon
Starting the File daemon
Starting the Director daemon
#
