When I began gathering contributions for this book, it soon became obvious that there would be an entire chapter on backups. Not only do BSD users follow the mantra “backup, backup, backup,” but every admin seems to have hacked his own solution to take advantage of the tools at hand and the environment that needs to be backed up.
If you’re looking for tutorials on how to use dump and tar, you won’t find them here. However, you will find nonobvious uses for their less well-known counterparts pax and cpio. I’ve also included a hack on backing up over ssh, to introduce the novice user to the art of combining tools over a secure network connection.
You’ll also find scripts that fellow users have created to get the most out of their favorite backup utility. Finally, there are hacks that introduce some very useful open source third-party utilities.
A good backup can save the day when things go wrong. A bad—or missing—backup can ruin the whole week.
Regular backups are vital to good administration. You can perform backups with hardware as basic as a SCSI tape drive using 8mm tape cartridges or as advanced as an AIT tape library system using cartridges that can store up to 50 GB of compressed data. But what if you don’t have the luxury of dedicated hardware for each server?
Since most networks comprise multiple systems, you can archive data from one server across the network to another. We’ll back up a FreeBSD system using the tar and gzip archiving utilities and the smbutil and mount_smbfs commands to transport that data to network shares. These procedures were tested on FreeBSD 4.6-STABLE and 5.1-RELEASE.
Since SMB is a network-aware filesystem, we need to build SMB support into the kernel. This means adding the proper options lines to the custom kernel configuration file. For information on building a custom kernel, see [Hack #54], the Building and Installing a Custom Kernel section (9.3) of the FreeBSD Handbook, and the relevant information contained in /usr/src/sys/i386/conf.
Add the following options under the makeoptions section:

options NETSMB          # SMB/CIFS requester
options NETSMBCRYPTO    # encrypted password support for SMB
options LIBMCHAIN       # mbuf management library
options LIBICONV
options SMBFS
Once you’ve saved your changes, use the make buildkernel and make installkernel commands to build and install the new kernel.
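For example, assuming your custom configuration file is named CUSTOM (the name is yours to choose), the sequence might look like this:

# cd /usr/src
# make buildkernel KERNCONF=CUSTOM
# make installkernel KERNCONF=CUSTOM

Reboot into the new kernel before continuing.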
The next step is to decide which system on the network to connect to. Obviously, the destination server needs to have an active share on the network, as well as enough disk space available to hold your archives. It will also need a valid user account with which you can log in. You’ll probably also want to choose a system that’s backed up regularly to removable media. I’ll use a machine named smbserver1.
Tip

The smbutil and mount_smbfs commands both come standard with the base install of FreeBSD. Their only requirements are the five kernel options listed in the preceding section.
Once you have chosen the proper host, make an SMB connection manually with the smbutil login command. This connection will remain active, allowing you to interact with the SMB server, until you issue the smbutil logout command. So, to log in:
# smbutil login //jwarner@smbserver1
Password:
Connected to smbserver1
And to log out:
# smbutil logout //jwarner@smbserver1
Password:
Connection unmarked as permanent and will be closed when possible
Once you’re sure you can manually initiate a connection with the host system, create a mount point where you can mount the remote share. I’ll create a mount point directory called /backup:
# mkdir /backup
Next, reestablish a connection with the host system and mount its share:
# smbutil login //jwarner@smbserver1
Password:
Connected to smbserver1
# mount_smbfs -N //jwarner@smbserver1/sharename /backup
Note that I used the -N switch to mount_smbfs to avoid having to supply a password a second time. If you prefer to be prompted for a password when mounting the share, simply omit the -N switch.
After connecting to the host server and mounting its network share, the next step is to back up and copy the necessary files. You can get as complicated as you like, but I’ll create a simple shell script, bkup, inside the mounted share that compresses important files and directories.
This script will make compressed archives of the /boot, /etc, /home, and /usr/local/etc directories. Add to or edit this list as you see fit. At a minimum, I recommend including the /etc and /usr/local/etc directories, as they contain important configuration files. See man hier for a complete description of the FreeBSD directory structure.
#!/bin/sh
# script that backs up the following four directories:
tar cvvpzf boot.tar.gz /boot
tar cvvpzf etc.tar.gz /etc
tar cvvpzf home.tar.gz /home
tar cvvpzf usr_local_etc.tar.gz /usr/local/etc
Tip

This script is an example to get you started. There are many ways to use tar. Read man 1 tar carefully, and tailor the script to suit your needs.
Be sure to make this file executable:
# chmod 755 bkup
Run the script to create the archives:
# ./bkup
tar: Removing leading / from absolute path names in the archive.
drwxr-xr-x root/wheel 0 Jun 23 18:19 2002 boot/
drwxr-xr-x root/wheel 0 May 11 19:46 2002 boot/defaults/
-r--r--r-- root/wheel 10957 May 11 19:46 2002 boot/defaults/loader.conf
-r--r--r-- root/wheel 512 Jun 23 18:19 2002 boot/mbr
(snip)
After the script finishes running, you’ll have *.tar.gz files of the directories you chose to archive:
# ls | more
bkup
boot.tar.gz
etc.tar.gz
home.tar.gz
usr_local_etc.tar.gz
Once you’ve tested your shell script manually and are happy with your results, add it to the cron scheduler to run on scheduled days and times.
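For example, this hypothetical crontab entry runs the script every Sunday at 2 AM; it assumes the share is already mounted on /backup at that time:

#minute hour mday month wday command
0 2 * * 0 cd /backup && ./bkup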
Remember, how you choose to implement your backups isn’t important—backing up regularly is. Facing the problem of deleted or corrupted data isn’t a matter of “if” but rather a matter of “when.” This is why good backups are essential.
Things to consider when modifying the script to suit your own purposes:

Add entries to automatically mount and unmount the share; a sketch follows this list (see [Hack #68] for another example).

Use your backup utility of choice. You’re not limited to just tar!
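Here is a minimal sketch of the first suggestion, reusing the share and script from this hack; the -N switch assumes your password is stored in ~/.nsmbrc so that no prompt is needed:

#!/bin/sh
# mount the share, run the backup, then unmount
mount_smbfs -N //jwarner@smbserver1/sharename /backup
cd /backup && ./bkup
umount /backup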
man 1 smbutil
man 8 mount_smbfs
man 7 hier
man 1 tar
man 1 gzip
The Building and Installing a Custom Kernel section of the FreeBSD Handbook (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-building.html)
Create portable tar archives with pax.
Some POSIX operating systems ship with GNU tar as the default tar utility (NetBSD and QNX6, for example). This is problematic because the GNU tar format is not compatible with other vendors’ tar implementations. GNU is an acronym for “GNU’s Not UNIX”—in this case, GNU’s not POSIX either.
For filenames or paths longer than 100 characters, GNU uses its own @LongName tar format extension. Some vendors’ tar utilities will choke on the GNU extensions. Here is what Solaris’s archivers say about such an archive:

% pax -r < gnu-archive.tar
pax: ././@LongLink : Unknown filetype
% tar xf gnu-archive.tar
tar: directory checksum error
Distributing non-POSIX archives puts you at a definite disadvantage. A solution is to use pax to create your tar archives in the POSIX format. I’ll also provide some tips about using pax’s features to compensate for the loss of some parts of GNU tar’s extended feature set.
The NetBSD and QNX6 pax utility supports a tar interface and can also read the @LongName GNU tar format extension. You can use pax as your tar replacement, since it can read your existing GNU-format archives and can create POSIX archives for future backups. Here’s how to make the quick conversion.
First, replace /usr/bin/tar. That is, rename GNU tar and save it in another directory, in case you ever need to restore GNU tar to its previous location:
# mv /usr/bin/tar /usr/local/bin/gtar
Next, create a symlink from pax to tar. This will allow the pax utility to emulate the tar interface if invoked with the tar name:
# ln -s /bin/pax /usr/bin/tar
Now when you use the tar utility, your archives will really be created by pax.
Let’s say you’re on a system that doesn’t have issues with tar. Why else would you consider using pax as your backup solution?
For one, you can use pax and pipelines to create compressed archives, without using intermediate files. Here’s an example pipeline:
% find /home/kirk -name '*.[ch]' | pax -w | pgp -c
The pipeline’s first stage uses find to generate the exact list of files to archive. When using tar, you will often create the file list using a subshell. Unfortunately, the subshell approach can be unreliable. For example, this user has so much source code that the complete file list does not fit on the command line:
% tar cf kirksrc.tar $(find /home/kirk -name '*.[ch]')
/bin/ksh: tar: Argument list too long
In most cases, however, the pipeline approach will work as expected.
During the second stage, pax reads the list of files from stdin and writes the archive to stdout. The pax found on all of the BSDs has built-in gzip support, so you can also compress the archive during this stage by adding the -z argument.
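For example, this variation on the earlier pipeline writes a gzip-compressed archive straight to a file (the filename is my own):

% find /home/kirk -name '*.[ch]' | pax -wz > kirksrc.tar.gz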
When creating archives, invoke pax without the -v (verbose) argument. This way, if there are any pax error messages, they won’t get lost in the extra output.
The third stage compresses and/or encrypts the archive. An intermediate tar archive isn’t required, as the utility reads its data from the pipeline. This example uses pgp, the Pretty Good Privacy encryption system, which can be found in the ports collection.
POSIX provides two utilities for copying file hierarchies: cp -R and pax -rw. For regular users, cp -R is the common method. But for administrative use, pax -rw preserves more of the original file attributes, including hard-link counts and file access times. pax -rw also gives you a better copy of the original file hierarchy.
For an example, let’s back up three executables. Note that egrep, fgrep, and grep are all hard links to the same executable. The link count is three, and all have the same inode number. ls -li displays the inode number in column 1 and the link count in column 3:
# ls -il /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/egrep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/fgrep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/grep
With pax -rw, we will create one executable with the same date as the original:

# pax -rw /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/
# ls -il /tmp/usr/bin/
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 egrep
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 fgrep
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 grep
Can we do the same thing using cp -R? Nope. Instead, we create three new files, each with a unique inode number, a link count of one, and a new date:

# rm /tmp/usr/bin/*
# cp -R /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/usr/bin/
# ls -il /tmp/usr/bin/
49 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 egrep
48 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 fgrep
47 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 grep
If you have ever used GNU tar and received this message:

tar: Removing leading `/' from absolute path names in the archive

then you were using a tar archive that was rooted, where the files all had absolute paths starting with the forward slash (/). It is not a good idea to clobber existing files unintentionally with foreign binaries, which is why the GNU tar utility automatically strips the leading / for you.
To be safe, you want your unarchiver to create files relative to your current working directory. Rooted archives try to violate this rule by creating files relative to the root of the filesystem, ignoring the current working directory. If that archive contained /etc/passwd, unarchiving it could replace your current password file with a foreign copy. You may be surprised when you cannot log into your system anymore!
You can use the pax substitution argument to remove the leading /. This will ensure that the unarchived files will be created relative to your current working directory, instead of at the root of your filesystem:
# pax -A -r -s '-^/--' < rootedarchive.tar
Here, the -A argument requests that pax not strip the leading / automatically, as we want to do this ourselves. This argument is required only to avoid a bug in the NetBSD pax implementation that interferes with the -s argument. We also want pax to unarchive the file, so we pass the -r argument.
The -s argument specifies an ed-style substitution expression to be performed on the destination pathname. In this example, the leading / will be stripped from the destination paths. See man ed for more information.
If we used the traditional / delimiter, the substitution expression would be /^\///. (The second / isn’t a delimiter, so it has to be escaped with a \.) You will find that / is the worst delimiter, because you have to escape all the slashes found in the paths. Fortunately, you can choose another delimiter. Pick one that isn’t present in the paths, to minimize the number of escape characters you have to add. In the example, we used the - character as the delimiter, and therefore no escapes were required.
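For comparison, here is the same extraction written with the traditional delimiter; note the escape required for the slash inside the pattern:

# pax -A -r -s '/^\///' < rootedarchive.tar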
The substitution argument can be used to rename files for a beta software release, for example. Say you develop X11R6 software and have multiple development versions on your box:
/usr/X11R6.saturday
/usr/X11R6.working
/usr/X11R6.notworking
/usr/X11R6.released
and you want to install the /usr/X11R6.working directory as usr/X11R6 on the beta system:
# pax -A -w -s '-^/usr/X11R6.working-usr/X11R6-' /usr/X11R6.working \
> /tmp/beta.tar
This time, the -s argument specifies a substitution expression that will replace the beginning of the path /usr/X11R6.working with usr/X11R6 in the archive.
POSIX does not specify the format of multivolume archive headers, meaning that every archiver may use a different intervolume header format. If you have a lot of multivolume tar archives and plan to switch to a different tar implementation, you should test whether you can still recover your old multivolume archives.
This practice may have been more common when Minix/QNX4 users archived their 20 MB hard disks to a stack of floppy disks. Minix/QNX4 users had the vol utility to handle multiple volumes; instead of adding the multivolume functionality to the archiver itself, it was handled by a separate utility. You should be able to switch archiver implementations transparently, because vol did the splitting, not the archiver.
The vol utility performs the following operations:
At the end-of-media, prompts for the next volume
Verifies the ordering of the volumes
Concatenates the multiple volumes
Unfortunately, the vol utility isn’t part of the NetBSD package collection. If you create a lot of multivolume archives, you may want to look into porting one of the following utilities:
vol

Creates volume headers for tar; developed by Brian Yost and available at http://groups.google.com/groups?selm=80%40mirror.UUCP&output=gplain

multivol

Provides multiple volume support; created by Marc Schaefer and available at http://www.ibiblio.org/pub/Linux/system/backup/multivol-2.1.tar.bz2
man pax
NetBSD’s PGP package (ftp://ftp.NetBSD.org/pub/NetBSD/packages/pkgsrc/security/pgp2/README.html)
The GNU tar manpage on GNU tar and POSIX tar (http://www.gnu.org/software/tar/manual/html_node/tar_117.html)

The pax -A bug report and fix (http://www.NetBSD.org/cgi-bin/query-pr-single.pl?number=23776)
When cp alone doesn’t quite meet your copy needs.
The cp command is easy to use, but it does have its limitations. For example, have you ever needed to copy a batch of files with the same name? If you’re not careful, they’ll happily overwrite each other.
I recently had the urge to find all of the scripts on my system that created a menu. I knew that several ports used scripts named configure and that some of those scripts used dialog to provide a menu selection.
It was easy enough to find those scripts using find:
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \;
/usr/ports/audio/mbrolavox/scripts/configure
/usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure
/usr/ports/emulators/vmware2/scripts/configure
(snip)
This command asks find to start in /usr/ports, looking for files named (-name) configure. For each found file, it should search for the word dialog using -exec grep. The -l flag tells grep to list only the names of the matching files, without including the lines that match the expression. You may recognize the /dev/null { } \; from [Hack #13].
Normally, I could tell cp to use those found files as the source and to copy them to the specified destination. This is done by enclosing the find command within a set of backticks (`), located at the far top left of your keyboard. Note what happens, though:
% mkdir ~/scripts
% cd ~/scripts
% cp `find /usr/ports -name configure -exec grep -l "dialog" \
/dev/null { } \;` .
% ls ~/scripts
configure
Although each file that I copied had a different pathname, the filename itself was configure. Since each copied file overwrote the previous one, I ended up with one remaining file.
What’s needed is to rename those source files as they are copied to the destination. One approach is to replace the slash (/) in the original file’s pathname with a different character, resulting in a unique filename that still reflects the source of that file.
As we saw in [Hack #15], sed is designed to do such replacements. Here’s an approach:
% pwd
/usr/home/dru/scripts
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \; \
-exec sh -c 'cp { } `echo { } | sed s:/:=:g`' \;
% ls
=usr=ports=audio=mbrolavox=scripts=configure
=usr=ports=devel=kdesdk3=work=kdesdk-3.2.0=configure
=usr=ports=emulators=vmware2=scripts=configure
(snip)
This invocation of find starts off the same as my original search. It then adds a second -exec, which passes a command string to the sh shell via -c. The shell will cp the source files (specified by { }), but only after sed has replaced each slash in the pathname with an equals sign (=). Note that I changed the sed delimiter from the default slash to the colon (:) so I didn’t have to escape my / string. You don’t have to use = as the new character; choose whatever suits your purposes.
awk can also perform this renaming feat. The following command is more or less equivalent to the previous one:
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \; \
| awk '{dst=$0;gsub("/","=",dst); print "cp",$0,dst}' | sh
Depending upon how many files you plan on copying over and how picky you are about their destination names, you may prefer to do an interactive copy.
Despite its name, cp’s interactive switch (-i) will fail miserably in my scenario:

% cp -i `find /usr/ports -name configure -exec grep -l "dialog" \
/dev/null { } \;` .
overwrite ./configure? (y/n [n]) n
not overwritten
overwrite ./configure? (y/n [n])
(snip)
Since each file is still named configure, my only choices are either to overwrite the previous file or to not copy over the new file. However, both cpio and pax are capable of interactive copies. Let’s start with cpio:
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \; \
| cpio -o > ~/scripts/test.cpio && cpio -ir < ~/scripts/test.cpio
Here I’ve piped my find command to cpio. Normally, I would invoke cpio once in copy-pass mode. Unfortunately, that mode doesn’t support -r, the interactive rename switch. So, I directed cpio to send its output (-o >) to an archive named ~/scripts/test.cpio. Instead of piping that archive, I used && to delay the next cpio operation until the previous one finishes. I then used -ir to perform an interactive copy from that archive so I could type in the name of each destination file.
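Incidentally, if you don’t need to choose names at all, a single non-interactive copy-pass run is possible; this sketch uses -p (pass-through) with -d (create directories) so cpio recreates each source path under ~/scripts, avoiding collisions at the cost of friendlier names:

% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \; \
| cpio -pd ~/scripts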
Getting back to the interactive run, here are the results:
cpio: /usr/ports/audio/mbrolavox/scripts/configure: truncating inode number
cpio: /usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure: truncating inode number
cpio: /usr/ports/emulators/vmware2/scripts/configure: truncating inode number
(snip other archive messages)
5136 blocks
rename /usr/ports/audio/mbrolavox/scripts/configure -> mbrolavox.configure
rename /usr/ports/devel/kdesdk3/work/kdesdk-3.2.0/configure -> kdesdk3.configure
rename /usr/ports/emulators/vmware2/scripts/configure -> vmware2.configure
(snip remaining rename operations)
5136 blocks
After creating the archive, cpio showed me the source name so I could rename the destination file. While requiring interaction on my part, it does let me fine-tune exactly what I’d like to call each script. I must admit that my names are much nicer than those containing all of the equals signs.
pax is even more efficient. In the preceding command, the first cpio has to wait until find completes, and the second cpio has to wait until the first cpio finishes. Compare that to this command:
% find /usr/ports -name configure -exec grep -l "dialog" /dev/null { } \; \
| pax -rwi .
Here, I can pipe the results of find directly to pax, and pax has very user-friendly switches. In this command, I asked it to read (r) and write (w) interactively (i) to the current directory. There’s no temporary archive required, and everything happens at once. Even better, pax starts working on the interaction before find finishes. Here’s what it looks like:
ATTENTION: pax interactive file rename operation.
-rwxr-xr-x Nov 11 07:53 /usr/ports/audio/mbrolavox/scripts/configure
Input new name, or a "." to keep the old name, or a "return" to skip
this file.
Input > mbrovalox.configure
Processing continues, name changed to: mbrovalox.configure
This repeats for each and every file that matched the find results.
When it comes to backups, Unix systems are extremely flexible. For starters, they come with built-in utilities that are just waiting for an administrator’s imagination to combine their talents into a customized backup solution. Add that to one of Unix’s greatest strengths: its ability to see everything as a file. This means you don’t even need backup hardware. You can send your backup to a file, to removable media, to another server, or to whatever is available.
As with any customized solution, your success depends upon a little forethought. In this scenario, I don’t have any backup hardware, but I do have a network with a 100 Mbps switch and a system with a large hard drive capable of holding backups.
On the system with that large hard drive, I have sshd running. (An alternative to consider is the scponly shell; see [Hack #63].) I’ve also created a user and a group called rembackup:

# pw groupadd rembackup
# pw useradd rembackup -g rembackup -m -s /bin/csh
# passwd rembackup
Changing local password for rembackup
New Password:
Retype New Password:
#
If you’re new to the pw command, the -g switch puts the user in the specified group (which must already exist), the -m switch creates the user’s home directory, and the -s switch sets the default shell. (There’s really no good mnemonic; perhaps no one remembers what, if anything, pw stands for.)
Next, from the system I plan on backing up, I’ll ensure that I can ssh in as the user rembackup. In this scenario, the system with the large hard drive has an IP address of 10.0.0.1:

% ssh -l rembackup 10.0.0.1
The authenticity of host '10.0.0.1 (10.0.0.1)' can't be established.
DSA key fingerprint is e2:75:a7:85:46:04:71:51:db:a8:9e:83:b1:5c:7a:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.1' (DSA) to the list of known hosts.
Password:
% exit
logout
Connection to 10.0.0.1 closed.
Excellent. Since I can log in as rembackup, it looks like both systems are ready for a test backup.
I’ll start by testing my command at a command line. Once I’m happy with the results, I’ll create a backup script to automate the process.
# tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > genisis_usr_home.tgz"
usr/home/
usr/home/dru/
usr/home/dru/.cshrc
usr/home/dru/mail/
usr/home/dru/mail/sent-mail
Password:
This tar command creates (c) a compressed (z) backup to a file (f) while showing the results verbosely (v). The minus character (-) represents the specified file, which in this case is stdout. This allows me to pipe stdout to the ssh command. I’ve provided /usr/home, which contains all of my users’ home directories, as the hierarchy to back up.
The results of that backup are then piped (|) to ssh, which will send that output (via cat) to a compressed file called genisis_usr_home.tgz in the rembackup user’s home directory. Since that directory holds the backups for my network, I chose a filename that indicates the name of the host, genisis, and the contents of the backup itself.
Now that I can securely back up my users’ home directories, I can create a script. It can start out as simple as this:
# more /root/bin/backup
#!/bin/sh
# script to backup /usr/home to backup server
tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > genisis_usr_home.tgz"
However, whenever I run that script, I’ll overwrite the previous backup. If that’s not my intention, I can include the date as part of the backup name:
tar czvf - /usr/home | ssh rembackup@10.0.0.1 "cat > \
genisis_usr_home.`date +%d.%m.%y`.tgz"
Notice I inserted the date command into the filename using backticks. Now the backup file will include the day, month, and year separated by dots, resulting in a filename like genisis_usr_home.21.12.03.tgz.
Once you’re happy with your results, your script is an excellent candidate for a cron job.
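For example, this hypothetical crontab entry would run the script nightly at 1 AM (for unattended runs, you’ll want key-based ssh authentication in place, since no one will be there to type a password):

#minute hour mday month wday command
0 1 * * * /root/bin/backup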
Make remote backups automatic and effortless.
One day, the IDE controller on my web server died, leaving the files on my hard disk hopelessly corrupted. I faced what I had known in the back of my mind all along: I had not been making regular remote backups of my server, and the local backups were of no use to me now that the drive was corrupted.
The reason for this, of course, is that doing remote backups wasn’t automatic and effortless. Admittedly, this was no one’s fault but my own, but my frustration was sufficient for me to decide to write a tool that would make automated remote snapshots so easy that I wouldn’t ever have to worry about it again. Enter rsnapshot.
Installation on FreeBSD is a simple matter of:
# cd /usr/ports/sysutils/rsnapshot
# make install
I didn’t include the clean target here, as I’d like to keep the work subdirectory, which includes some useful scripts.
Tip
If you’re not using FreeBSD, see the original HOWTO at the project web site for detailed instructions on installing from source.
The install process neither creates nor installs the config file. This means that there is absolutely no possibility of accidentally overwriting a previously existing config file during an upgrade. Instead, copy the example configuration file and make changes to the copy:
# cp /usr/local/etc/rsnapshot.conf.default /usr/local/etc/rsnapshot.conf
The rsnapshot.conf config file is well commented, and much of it should be fairly self-explanatory. For a full reference of all the various options, please consult man rsnapshot.
rsnapshot uses the /.snapshots/ directory to hold the filesystem snapshots. This is referred to as the snapshot root. It must point to a filesystem where you have lots of free disk space.
Tip
Note that fields are separated by tabs, not spaces. This makes it easier to specify file paths with spaces in them.
rsnapshot has no idea how often you want to take snapshots. In order to specify how much data to save, you need to tell rsnapshot which intervals to keep, and how many of each.
By default, a snapshot will occur every four hours, or six times a day (these are the hourly intervals). It will also keep a second set of snapshots, taken once a day and stored for a week (or seven days):
interval	hourly	6
interval	daily	7
Note that the hourly interval is specified first. This is very important, as the first interval line is assumed to be the smallest unit of time, with each additional line getting successively bigger. Thus, if you add a yearly interval, it should go at the bottom, and if you add a minutes interval, it should go before the hourly interval. It’s also worth noting that the snapshots are pulled up from the smallest interval to the largest. In this example, the daily snapshots are pulled from the oldest hourly snapshot, not directly from the main filesystem.
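As a sketch, an expanded interval list obeying that ordering might look like this (the names and counts are only examples, and the fields are tab-separated):

interval	minutes	4
interval	hourly	6
interval	daily	7
interval	weekly	4
interval	yearly	1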
The backup section tells rsnapshot which files you actually want to back up:
backup /etc/ localhost/etc/
In this example, backup is the backup point, /etc/ is the full path to the directory we want to take snapshots of, and localhost/etc/ is a subdirectory inside the snapshot root where the snapshots are stored. If you are taking snapshots of several machines on one dedicated backup server, it’s a good idea to use hostnames as directories to keep track of which files came from which server.
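For instance, a hypothetical set of backup points for one host might look like this (again, tab-separated):

backup	/etc/	localhost/etc/
backup	/home/	localhost/home/
backup	/usr/local/etc/	localhost/usr_local_etc/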
In addition to full paths on the local filesystem, you can also back up remote systems using rsync over ssh. If you have ssh enabled (via the cmd_ssh parameter), specify a path similar to this:
backup backup@example.com:/etc/ example.com/etc/
This behaves fundamentally the same way as specifying local pathnames, but you must take a few extra things into account:
The ssh daemon must be running on example.com.

You must have access to the specified account on the remote machine (in this case, the backup user on example.com). See [Hack #38] for instructions on setting this up.

You must have key-based logins enabled for the specified user at example.com, without passphrases.
This backup occurs over the network, so it may be slower. Since this uses rsync, the slowdown is most noticeable during the first backup. Depending on how much your data changes, subsequent backups should go much faster.
Tip
One thing you can do to mitigate the potential damage from a backup server breach is to create alternate users on the client machines with their UIDs and GIDs set to 0, but with a more restrictive shell, such as scponly [Hack #63] .
With the backup_script parameter, the second column is the full path to an executable backup script, and the third column is the local path in which you want to store it. For example:
Tip
You can find the backup_pgsql.sh example script in the utils/ directory of the source distribution. Alternatively, if you didn’t include the clean target when you installed the FreeBSD port, the file will be located in /usr/ports/sysutils/rsnapshot/work/rsnapshot-1.0.9/utils.
Your backup script only needs to dump its output into its current working directory. It can create as many files and directories as necessary, but it should not put its files in any predetermined path. This is because rsnapshot creates a temp directory, changes to that directory, runs the backup script, and then syncs the contents of the temp directory to the local path you specified in the third column. A typical backup script might look like this:
#!/bin/sh
/usr/bin/mysqldump -uroot mydatabase > mydatabase.sql
/bin/chmod 644 mydatabase.sql
There are a couple of example scripts in the utils/ directory of the rsnapshot source distribution to give you more ideas.
After making your changes, verify that the config file is syntactically valid and that all the supporting programs are where you think they are:
# rsnapshot configtest
If all is well, the output should say Syntax OK. If there’s a problem, it should tell you exactly what it is.
The final step to test your configuration is to run rsnapshot with the -t flag, for test mode. This will print out a verbose list of the things it will do, without actually doing them. For example, to simulate an hourly backup:
# rsnapshot -t hourly
All backups are stored within a configurable snapshot root directory. In the beginning it will be empty. rsnapshot creates subdirectories for the various defined intervals. After a week, the directory should look something like this:
# ls -l /.snapshots/
drwxr-xr-x 7 root root 4096 Dec 28 00:00 daily.0
drwxr-xr-x 7 root root 4096 Dec 27 00:00 daily.1
drwxr-xr-x 7 root root 4096 Dec 26 00:00 daily.2
drwxr-xr-x 7 root root 4096 Dec 25 00:00 daily.3
drwxr-xr-x 7 root root 4096 Dec 24 00:00 daily.4
drwxr-xr-x 7 root root 4096 Dec 23 00:00 daily.5
drwxr-xr-x 7 root root 4096 Dec 22 00:00 daily.6
drwxr-xr-x 7 root root 4096 Dec 29 00:00 hourly.0
drwxr-xr-x 7 root root 4096 Dec 28 20:00 hourly.1
drwxr-xr-x 7 root root 4096 Dec 28 16:00 hourly.2
drwxr-xr-x 7 root root 4096 Dec 28 12:00 hourly.3
drwxr-xr-x 7 root root 4096 Dec 28 08:00 hourly.4
drwxr-xr-x 7 root root 4096 Dec 28 04:00 hourly.5
Each of these directories contains a full backup of that point in time. The destination directory paths you specified as the backup and backup_script parameters are placed directly under these directories. In the example:
backup /etc/ localhost/etc/
the /etc/ directory will initially back up into /.snapshots/hourly.0/localhost/etc/.
Each subsequent time rsnapshot is run with the hourly command, it will rotate the hourly.X directories, “copying” the contents of the hourly.0 directory (using hard links) into hourly.1.
When rsnapshot daily runs, it will rotate all the daily.X directories, then copy the contents of hourly.5 into daily.0.
hourly.0 will always contain the most recent snapshot, and daily.6 will always contain a snapshot from a week ago. Unless the files change between snapshots, the full backups are really just multiple hard links to the same files. This is how rsnapshot uses space so efficiently. If a file changes at any point, the next backup will unlink the hard link in hourly.0, replacing it with a brand new file. This will now use twice the disk space it did before, but that is still considerably less than 13 full, unique copies would occupy.
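You can verify the hard linking yourself once a few snapshots exist: list the same unchanged file in two intervals with ls -li (the paths below assume the /etc/ example above), and you should see the same inode number and a link count greater than one:

% ls -li /.snapshots/hourly.0/localhost/etc/group /.snapshots/daily.0/localhost/etc/group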
Remember, if you are using different intervals than the ones in this example, the first interval listed is the one that gets updates directly from the main filesystem. All subsequently listed intervals pull from the previous snapshots.
When rsnapshot first runs, it will create the configured snapshot_root directory. It assigns this directory the permissions 0700, since the snapshots will probably contain files owned by all sorts of users on your system.
The simplest but least flexible solution is to disallow access to the snapshot root altogether. The root user will still have access, of course, and will be the only one who can pull backups. This may or may not be desirable, depending on your situation. For a small setup, this may be sufficient.
If users need to be able to pull their own backups, you will need to do a little extra work up front. The best option seems to be creating a container directory for the snapshot root with 0700 permissions, giving the snapshot root directory 0755 permissions, and mounting the snapshot root for the users as read-only using NFS or Samba.
Let’s explore how to do this using NFS on a single machine. First, set the snapshot_root variable in rsnapshot.conf:
snapshot_root /usr/.private/.snapshots/
Then, create the container directory, the real snapshot root, and a read-only mount point:
# mkdir /usr/.private/
# mkdir /usr/.private/.snapshots/
# mkdir /.snapshots/
Set the proper permissions on these new directories:
# chmod 0700 /usr/.private/
# chmod 0755 /usr/.private/.snapshots/
# chmod 0755 /.snapshots/
In /etc/exports, add /usr/.private/.snapshots/ as a read-only NFS export:
/usr/.private/.snapshots/ 127.0.0.1(ro)
Tip
If your version of NFS supports it, include the no_root_squash option. (Place it within the parentheses after ro, with a comma—not a space—as the separator.) This option allows the root user to see all the files within the read-only export.
In /etc/fstab, mount /usr/.private/.snapshots/ read-only under /.snapshots/:
localhost:/usr/.private/.snapshots/ /.snapshots/ nfs ro 0 0
Restart your NFS daemon and mount the read-only snapshot root:
# /etc/rc.d/nfsd restart
# mount /.snapshots/
To test this, try adding a file as the superuser:
# touch /.snapshots/testfile
This should fail with insufficient permissions. This is what you want. It means that your users won’t be able to mess with the snapshots either.
Users who wish to recover old files can go into the /.snapshots directory, select the interval they want, and browse through the filesystem until they find the files they are looking for. NFS will prevent them from making modifications, but they can copy anything that they had permission to read in the first place.
man rsnapshot
The original rsnapshot HOWTO (http://www.rsnapshot.org/rsnapshot-HOWTO.html)
Building your own backup utility doesn’t have to be scary.
PostgreSQL is a robust, open source database server. Like most database servers, it provides utilities for creating backups. PostgreSQL’s primary tools for creating backup files are pg_dump and pg_dumpall. However, if you want to automate your database backup processes, these tools have a few limitations:
pg_dump dumps only one database at a time.

pg_dumpall dumps all of the databases into a single file.

pg_dump and pg_dumpall know nothing about multiple backups.
These aren’t criticisms of the backup tools—just an observation that customization will require a little scripting. Our resulting script will back up multiple databases, each to its own backup file.
This script uses Python and its ability to execute other programs to implement the following backup algorithm:
Change the working directory to a specified database backup directory.
Rename all backup files ending in .gz so that they end in .gz.old. Existing files ending in .gz.old will be overwritten.
Clean up and analyze all PostgreSQL databases using its vacuumdb command.

Get a current list of databases from the PostgreSQL server.

Dump each database, piping the results through gzip, into its own compressed file.
Why Python? My choice is one of personal preference; this task is achievable in just about any scripting language. However, Python is cross-platform and easy to learn, and its scripts are easy to read.
#!/usr/local/bin/python
# /usr/local/bin/pg2gz.py
# This script lists all PostgreSQL
# databases and pipes them separately
# through gzip into .gz files.

# INSTRUCTIONS
# 1. Review and edit line 1 to reflect the location
#    of your python command file.
# 2. Redefine the save_dir variable (below) to
#    your backup directory.
# 3. To automate the backup process fully, consider
#    scheduling the regular execution of this script
#    using cron.

import os, string

# Redefine this variable to your backup directory.
# Be sure to include the slash at the end.
save_dir = '/mnt/backup/databases/'

# Rename all *.gz backup files to *.gz.old.
curr_files = os.listdir(save_dir)
for n in curr_files:
    if n[len(n)-2:] == 'gz':
        os.popen('mv ' + save_dir + n + " " + save_dir + n + '.old')
    else:
        pass

# Vacuum all databases.
os.popen('vacuumdb -a -f -z')

# 'psql -l' produces a list of PostgreSQL databases.
get_list = os.popen('psql -l').readlines()

# Exclude header and footer lines.
db_list = get_list[3:-2]

# Extract database names from the first element of each row.
for n in db_list:
    n_row = string.split(n)
    n_db = n_row[0]

    # Pipe each database dump through gzip into its own
    # .gz file, skipping the template* databases.
    if n_db == 'template0':
        pass
    elif n_db == 'template1':
        pass
    else:
        os.popen('pg_dump ' + n_db + ' | gzip -c > ' + save_dir + n_db + '.gz')
The script assumes that you have a working installation of PostgreSQL. You’ll also need to install Python, which is available through the ports collection or as a binary package. The Python modules used are installed by default.
Double-check the location of your Python executable using:
% which python
/usr/local/bin/python
and ensure the first line of the script reflects your location.
Don’t forget to make the script executable using chmod +x.
Near the top of the script, redefine the save_dir variable to reflect the location of your backup directory. As is, the script assumes a backup directory of /mnt/backup/databases/.
You’ll probably want to add the script to the pgsql user’s crontab for periodic execution.
To schedule the script for execution, log in as pgsql or, as the superuser, su to pgsql. Once you’re acting as pgsql, execute:
% crontab -e
to open the crontab file in the default editor.
Given the following crontab file, /usr/local/bin/pg2gz.py will execute at 4 AM every Sunday.
# more /var/cron/tabs/pgsql
SHELL=/bin/sh
PATH=/var/cron/tabs:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
#minute hour mday month wday command
0 4 * * 0 /usr/local/bin/pg2gz.py
The PostgreSQL web site (http://www.postgresql.org/)
The Python web site (http://www.python.org/)
Don’t let the campy name fool you. Bacula is a powerful, flexible, open source backup program.
Having problems finding a backup solution that fits all your needs? One that can back up both Unix and Windows systems? That is flexible enough to back up systems with irregular backup needs, such as laptops? That allows you to run scripts before or after the backup job? That provides browsing capabilities so you can decide upon a restore point? Bacula may be what you’re looking for.
Bacula is a client-server solution composed of several distinct parts:
- Director
The Director is the most complex part of the system. It keeps track of all clients and files to be backed up. This daemon talks to the clients and to the storage devices.
- Client/File Daemon
The Client (or File) Daemon runs on each computer that will be backed up by the Director. Some other backup solutions refer to this as the Agent.
- Storage Daemon
The Storage Daemon communicates with the backup device, which may be tape or disk.
- Console
The Console is the primary interface between you and the Director. I use the command-line Console, but there is also a GNOME GUI Console.
Each File Daemon will have an entry in the Director configuration file. Other important entries include FileSets and Jobs. A FileSet identifies a set of files to back up. A Job specifies a single FileSet, the type of backup (incremental, full, etc.), when to do the backup, and what Storage Device to use. Backup and restore jobs can be run automatically or manually.
Bacula stores details of each backup in a database. You can use either SQLite or MySQL, and starting with Bacula Version 1.33, PostgreSQL. Before you install Bacula, decide which database you want to use.
Warning
FreeBSD 4.x (prior to 4.10-RELEASE) and FreeBSD 5.x (Version 5.2.1 and earlier) have a pthreads bug that could cause you to lose data. Refer to platform/freebsd/pthreads-fix.txt in your Bacula source directory for full details.
The existing Bacula documentation provides detailed installation instructions if you’re installing from source. To install the SQLite version from the FreeBSD port instead:
# cd /usr/ports/sysutils/bacula
# make install
Or, if you prefer to install the MySQL version:
# cd /usr/ports/sysutils/bacula
# make -DWITH_MYSQL install
Bacula installs several configuration files that should work for your environment with few modifications.
The first configuration file, /usr/local/etc/bacula-fd.conf, is for the File Daemon. This file needs to reside on each machine you want to back up. For security reasons, only the Directors specified in this file will be able to communicate with this File Daemon. The name and password specified in the Director resource must be supplied by any connecting Director.
You can specify more than one Director { } resource. Make sure the password matches the one in the Client resource in the Director’s configuration file.
The FileDaemon { } resource identifies this system and specifies the port on which it will listen for Directors. You may have to create a directory manually to match the one specified by the Working Directory.
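As a minimal sketch, the relevant resources in bacula-fd.conf might look like this, reusing the names and client password that appear in the Director configuration shown later:

Director {
Name = laptop-dir
Password = "laptop-client-password"
}

FileDaemon {
Name = laptop-fd
FDport = 9102
WorkingDirectory = /var/db/bacula
Pid Directory = /var/run
}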
The next configuration file, /usr/local/etc/bacula-sd.conf, is for the Storage Daemon. The default values should work unless you need to specify additional storage devices.
As with the File Daemon, the Director { } resource specifies the Director(s) that may contact this Storage Daemon. The password must match that found in the Storage resource in the Director’s configuration file.
The Director’s configuration file is by necessity the largest of the three. Each Client, Job, FileSet, and Storage Device is defined in this file.
In the following example configuration, I’ve defined the Job Client1 to back up the files defined by the FileSet Full Set on a laptop. The backup will be performed to the File storage device, which is really a disk located at laptop.example.org.
Tip
This isn’t an optimal solution for a real backup, as I’m just backing up files from the laptop to somewhere else on the laptop. It is sufficient for demonstration and testing, though.
# more /usr/local/etc/bacula-dir.conf
Director {
Name = laptop-dir
DIRport = 9101
QueryFile = "/usr/local/etc/query.sql"
WorkingDirectory = "/var/db/bacula"
PidDirectory = "/var/run"
Maximum Concurrent Jobs = 1
Password = "lLftflC4QtgZnWEB6vAGcOuSL3T6n+P7jeH+HtQOCWwV"
Messages = Standard
}
Job {
Name = "Client1"
Type = Backup
Client = laptop-fd
FileSet = "Full Set"
Schedule = "WeeklyCycle"
Storage = File
Messages = Standard
Pool = Default
Write Bootstrap = "/var/db/bacula/Client1.bsr"
Priority = 10
}
FileSet {
Name = "Full Set"
Include = signature=MD5 {
/usr/ports/sysutils/bacula/work/bacula-1.32c
}
# If you backup the root directory, the following two excluded
# files can be useful
#
Exclude = { /proc /tmp /.journal /.fsck }
}
Client {
Name = laptop-fd
Address = laptop.example.org
FDPort = 9102
Catalog = MyCatalog
Password = "laptop-client-password"
File Retention = 30 days
Job Retention = 6 months
AutoPrune = yes
}
# Definition of file storage device
Storage {
Name = File
Address = laptop.example.org
SDPort = 9103
Password = "TlDGBjTWkjTS/0HNMPF8ROacI3KlgIUZllY6NS7+gyUp"
Device = FileStorage
Media Type = File
}
Note that the password given by any connecting Console must match the one here.
Now that you’ve modified the configuration files to suit your needs, use Bacula’s scripts to create and define the database tables that it will use.
To set up for MySQL:
# cd /usr/ports/sysutils/bacula/work/bacula-1.32c/src/cats
# ./grant_mysql_privileges
# ./create_mysql_database
# ./make_mysql_tables
If you have a password set for the MySQL root account, add -p to these commands and you will be prompted for the password. You now have a working database suitable for use by Bacula.
Some tape drives are not standard. They require their own proprietary software and can be temperamental when used with other software. Regardless of what software it uses, each drive model can have its own little quirks that need to be catered to. Fortunately, Bacula comes with btape, a handy little utility for testing your drive.
My tape drive is at /dev/sa1. Bacula prefers to use the non-rewind variant of the device, but it can handle the raw variant as well. If you use the rewinding device, then only one backup job per tape is possible. This command will test the non-rewind device /dev/nrsa1:
# /usr/local/sbin/btape -c /usr/local/etc/bacula-sd.conf /dev/nrsa1
It is a good idea to run daemons with the lowest possible privileges. The Storage Daemon and the Director Daemon do not need root permissions. However, the File Daemon does, because it needs to access all files on your system.
In order to run daemons with nonroot accounts, you need to create a user and a group. Here, I used vipw to create the user. I selected a user ID and group ID of 1002, as they were unused on my system.
bacula:*:1002:1002::0:0:Bacula Daemon:/var/db/bacula:/sbin/nologin
I also added this line to /etc/group:
bacula:*:1002:
The bacula user (as opposed to the Bacula daemon) will have a home directory of /var/db/bacula, which is the default location for the Bacula database.
Now that you have both a bacula user and a bacula group, you can secure the bacula home directory by issuing this command:
# chown -R bacula:bacula /var/db/bacula/
To start the Bacula daemons on a FreeBSD system, issue the following command:
# /usr/local/etc/rc.d/bacula.sh start
To confirm they are all running:
# ps auwx | grep bacula
root 63416 0.0 0.3 2040 1172 ?? Ss 4:09PM 0:00.01
/usr/local/sbin/bacula-sd -v -c /usr/local/etc/bacula-sd.conf
root 63418 0.0 0.3 1856 1036 ?? Ss 4:09PM 0:00.00
/usr/local/sbin/bacula-fd -v -c /usr/local/etc/bacula-fd.conf
root 63422 0.0 0.4 2360 1440 ?? Ss 4:09PM 0:00.00
/usr/local/sbin/bacula-dir -v -c /usr/local/etc/bacula-dir.conf
The console is the main interface through which you run jobs, query system status, and examine the Catalog contents, as well as label, mount, and unmount tapes. There are two consoles available: one runs from the command line, and the other is a GNOME GUI. I will concentrate on the command-line console.
To start the console, I use this command:
# /usr/local/sbin/console -c /usr/local/etc/console.conf
Connecting to Director laptop:9101
1000 OK: laptop-dir Version: 1.32c (30 Oct 2003)
*
You can obtain a list of the available commands with the help command. The status all command is a quick and easy way to verify that all components are up and running. To label a Volume, use the label command.
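A first session at the * prompt might simply run the commands just mentioned (output omitted here):

* help
* status all
* label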
Bacula comes with a preset backup job to get you started. It will back up the directory from which Bacula was installed. Once you get going and have created your own jobs, you can safely remove this job from the Director configuration file.
Not surprisingly, you use the run command to run a job. Once the job runs, the results will be sent to you via email, according to the Messages resource settings within your Director configuration file.
To restore a job, use the restore command. You should choose the restore location carefully and ensure there is sufficient disk space available.
It is easy to verify that the restored files match the original:
# diff -ruN \
/tmp/bacula-restores/usr/ports/sysutils/bacula/work/bacula-1.32c \
/usr/ports/sysutils/bacula/work/bacula-1.32c
#
For my testing, I wanted to back up files on my Windows XP machine every hour. I created this schedule:
Schedule {
Name = "HourlyCycle"
Run = Full 1st sun at 1:05
Run = Differential 2nd-5th sun at 1:05
Run = Incremental Hourly
}
Any Job that uses this schedule will be run at the following times:
A full backup will be done on the first Sunday of every month at 1:05 AM.
A differential backup will be run on the 2nd, 3rd, 4th, and 5th Sundays of every month at 1:05 AM.
Every hour, on the hour, an incremental backup will be done.
So far we have been testing Bacula on the server. With the FreeBSD port, installing a client-only version of Bacula is easy:
# cd /usr/ports/sysutils/bacula
# make -DWITH_CLIENT_ONLY install
You will also need to tell the Director about this client by adding a new Client resource to the Director configuration file, and you will want to create a corresponding Job and FileSet resource.
When you change the Bacula configuration files, remember to restart the daemons:
# /usr/local/etc/rc.d/bacula.sh restart
Stopping the Storage daemon
Stopping the File daemon
Stopping the Director daemon
Starting the Storage daemon
Starting the File daemon
Starting the Director daemon
#
The Bacula web site (http://www.bacula.org/)
http://www.onlamp.com/pub/a/onlamp/2004/01/09/bacula.html (the original Bacula article from ONLamp)