Chapter 1. Server Basics

Hacks #1-22

A running Linux system is a complex interaction of hardware and software where invisible daemons do the user's bidding, carrying out arcane tasks to the beat of the drum of the uncompromising taskmaster called the Linux kernel.

A Linux system can be configured to perform many different kinds of tasks. When running as a desktop machine, the visible portion of Linux spends much of its time controlling a graphical display, painting windows on the screen, and responding to the user’s every gesture and command. It must generally be a very flexible (and entertaining) system, where good responsiveness and interactivity are the critical goals.

On the other hand, a Linux server is generally designed to perform one or two tasks, nearly always involving the squeezing of information down a network connection as quickly as possible. While pretty screen savers and GUI features may be critical to a successful desktop system, the successful Linux server is a high-performance appliance that provides access to information as quickly and efficiently as possible. It pulls that information from some sort of storage (the filesystem, a database, or somewhere else on the network) and delivers it over the network to whomever requested it, be it a human being talking to a web server, a user sitting in a shell, or another server connecting over a port.

It is under these circumstances that a system administrator finds their responsibilities lying somewhere between deity and janitor. Ultimately, the sysadmin’s job is to provide access to system resources as quickly (and equitably) as possible. This job involves both the ability to design new systems (that may or may not be rooted in solutions that already exist) and the talent (and the stomach) for cleaning up after people who use that system without any concept of what “resource management” really means.

The most successful sysadmins remove themselves from the path of access to system resources and let the machines do all of the work. As a user, you know that your sysadmin is effective when you have the tools that you need to get the job done and you never need to ask your sysadmin for anything. To pull off (that is, to hack) this impossible-sounding task requires that the sysadmin anticipate what the users' needs will be and make efficient use of the resources that are available.

To begin with, I’ll present ways to optimize Linux to perform only the work that is required to get the job done and not waste cycles doing work that you’re not interested in doing. You’ll see some examples of how to get the system to do more of the work of maintaining itself and how to make use of some of the more obscure features of the system to make your job easier. Parts of this section (particularly Command Line and Resource Management) include techniques that you may find yourself using every day to help build a picture of how people are using your system and ways that you might improve it.

These hacks assume that you are already familiar with Linux. In particular, you should already have root on a running Linux system available with which to experiment and should be comfortable working on the system from the command line. You should also have a good working knowledge of networks and standard network services. While I hope that you will find these hacks informative, they are certainly not a good introduction to Linux system administration. For in-depth discussion of good administrative techniques, I highly recommend the Linux Network Administrator's Guide and Essential System Administration, both published by O'Reilly & Associates.

The hacks in this chapter are grouped together into the following five categories: Boot Time, Command Line, Automation, Resource Management, and Kernel Tuning.

Removing Unnecessary Services

Fine tune your server to provide only the services you really want to serve

When you build a server, you are creating a system that should perform its intended function as quickly and efficiently as possible. Just as a paint mixer has no real business being included as an espresso machine attachment, extraneous services can take up resources and, in some cases, cause a real mess that is completely unrelated to what you wanted the server to do in the first place. This is not to say that Linux is incapable of serving as both a top-notch paint mixer and making a good cup of coffee simultaneously — just be sure that this is exactly what you intend before turning your server loose on the world (or rather, turning the world loose on your server).

When building a server, you should continually ask yourself: what do I really need this machine to do? Do I really need FTP services on my web server? Should NFS be running on my DNS server, even if no shares are exported? Do I need the automounter to run if I mount all of my volumes statically?

To get an idea of what your server is up to, simply run ps ax. If nobody is logged in, this will generally tell you what your server is currently running. You should also see which programs inetd is accepting connections for, with either grep -v ^# /etc/inetd.conf or (more to the point) netstat -lp. The first command will show all uncommented lines in your inetd.conf, while the second (when run as root) will show all of the sockets that are in the LISTEN state, and the programs that are listening on each port. Ideally, you should be able to reduce the output of ps ax to a page of information or less (barring preforking servers like httpd, of course).
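Here's the sort of output you might get back from netstat -lp on a lightly loaded box (the ports, PIDs, and programs shown are purely illustrative; yours will differ):

# netstat -lp | grep LISTEN
tcp        0      0 *:ssh        *:*        LISTEN      211/sshd
tcp        0      0 *:www        *:*        LISTEN      245/httpd
tcp        0      0 *:sunrpc     *:*        LISTEN      127/portmap

In this example, sshd and httpd are expected, but portmap (listening on the sunrpc port) deserves a second look if the machine isn't serving or mounting NFS.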

Here are some notorious (and typically unnecessary) services that are enabled by default in many distributions:

portmap, rpc.mountd, rpc.nfsd

These are all part of the NFS subsystem. Are you running an NFS server? Do you need to mount remote NFS shares? Unless you answered yes to either of these questions, you don’t need these daemons running. Reclaim the resources that they’re taking up and eliminate the potential security risk.

smbd and nmbd

These are the Samba daemons. Do you need to export SMB shares to Windows boxes (or other machines)? If not, then these processes can be safely killed.

automount

The automounter can be handy to bring up network (or local) filesystems on demand, eliminating the need for root privileges when accessing them. This is especially handy on client desktop machines, where a user needs to use removable media (such as CDs or floppies) or to access network resources. But on a dedicated server, the automounter is probably unnecessary. Unless your machine is providing console access or remote network shares, you can kill the automounter (and set up all of your mounts statically, in /etc/fstab).

named

Are you running a name server? You don’t need named running if you only need to resolve network names; that’s what /etc/resolv.conf and the bind libraries are for. Unless you’re running name services for other machines, or are running a caching DNS server (see [Hack #78]), then named isn’t needed.

lpd

Do you ever print to this machine? Chances are, if it’s serving Internet resources, it shouldn’t be accepting print requests anyway. Remove the printer daemon if you aren’t planning on using it.

inetd

Do you really need to run any services from inetd? If you have ssh running in standalone mode, and are only running standalone daemons (such as Apache, BIND, MySQL, or ProFTPD), then inetd may be superfluous. At the very least, review which services it is accepting with grep -v ^# /etc/inetd.conf. If you find that every service can be safely commented out, then why run the daemon at all? Remove it from the boot process (either by removing it from the system rc's or with a simple chmod -x /usr/sbin/inetd).

telnet, rlogin, rexec, ftp

The remote login, execution, and file transfer functionality of these venerable daemons has largely been supplanted by ssh and scp, their cryptographically secure and tremendously flexible counterparts. Unless you have a really good reason to keep them around, it’s a good idea to eliminate support for these on your system. If you really need to support ftp connections, you might try the mod_sql plugin for proftpd (see [Hack #85]).

finger, comsat, chargen, echo, identd

The finger and comsat services made sense in the days of an open Internet, where users were curious but generally well-intentioned. In these days of stealth portscans and remote buffer overflow exploits, running extraneous services that give away information about your server is generally considered a bad idea. The chargen and echo ports were once good for testing network connectivity, but are now too inviting for a random miscreant to fiddle with (and perhaps connect to each other to drive up server load quickly and inexpensively).

Finally, the identd service was once a meaningful and important source of information, providing remote servers with an idea of which users were connecting to their machines. Unfortunately, in these days of local root exploits and desktop Linux machines, installing an identd that (perish the thought!) actually lies about who is connected has become so common that most sites ignore the author information anyway. Since identd is a notoriously shaky source of information, why leave it enabled at all?

To eliminate unnecessary services, first shut them down (by running their init script in /etc/rc.d/init.d/ with the stop argument, removing them from /etc/inetd.conf, or killing them manually). Then, to be sure that they don't start again the next time the machine reboots, remove their entries from /etc/rc.d/*. Once you have your system trimmed down to only the services you intend to serve, reboot the machine and check the process table again.
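On distributions that use SysV-style rc directories, a helper such as chkconfig (Red Hat and friends) or update-rc.d (Debian) can take care of the symlink housekeeping for you. As a sketch, assuming your system has chkconfig and an nfs init script:

# /etc/rc.d/init.d/nfs stop
# chkconfig nfs off
# chkconfig --list nfs

The last command should show the service turned off in every runlevel. If you don't have such a tool, removing the S* symlinks from the appropriate /etc/rc.d/rc*.d/ directories by hand accomplishes the same thing.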

If you absolutely need to run insecure services on your machine, then you should use tcp wrappers or local firewalling to limit access to only the machines that absolutely need it.

Forgoing the Console Login

All of the access, none of the passwords

It will happen to you one day. You'll need to work on a machine for a friend or client who has "misplaced" the root password, and on which you don't have an account.

If you have console access and don’t mind rebooting, traditional wisdom beckons you to boot up in single user mode. Naturally, after hitting Control-Alt-Delete, you simply wait for it to POST and then pass the parameter single to the booting kernel. For example, from the LILO prompt:

LILO: linux single

On many systems, this will happily present you with a root shell. But on some systems (notably RedHat), you’ll run into the dreaded emergency prompt:

Give root password for maintenance
(or type Control-D for normal startup)

If you knew the root password, you wouldn’t be here! If you’re lucky, the init script will actually let you hit ^C at this stage and will drop you to a root prompt. But most init processes are “smarter” than that, and trap ^C. What to do? Of course, you could always boot from a rescue disk and reset the password, but suppose you don’t have one handy (or that the machine doesn’t have a CD-ROM drive).

All is not lost! Rather than risk running into the above mess, let’s modify the system with extreme prejudice, right from the start. Again, from the LILO prompt:

LILO: linux init=/bin/bash

What does this do? Rather than start /sbin/init and proceed with the usual /etc/rc.d/* procedure, we’re telling the kernel to simply give us a shell. No passwords, no filesystem checks (and for that matter, not much of a starting environment!) but a very quick, shiny new root prompt.

Unfortunately, that's not quite enough to be able to repair your system. The root filesystem will be mounted read-only (since it never got a chance to be checked and remounted read/write). Also, networking will be down, and none of the usual system daemons will be running. You don't want to do anything more complicated than resetting a password (or tweaking a file or two) at a prompt like this. Above all: don't hit ^D or type exit! Your little shell (plus the kernel) constitutes the entire running Linux system at the moment. So, how can you manipulate the filesystem in this situation, if it is mounted read-only? Try this:

# mount -o remount,rw /

That will force the root filesystem to be remounted read-write. You can now type passwd to change the root password (and if the original admin lost the password, consider the ramifications of giving them access to the new one. If you were the original admin, consider writing it in invisible ink on a post-it note and sticking it to your screen, or stitching it into your underwear, or maybe even taking up another hobby).

Once the password is reset, DO NOT REBOOT. Since there is no init running, there is no process in place for safely taking the system down. The quickest way to shut down safely is to remount root again:

# mount -o remount,ro /

With the root partition read-only, you can confidently hit the Reset button, bring the machine up in single-user mode, and begin your actual work.
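Putting the whole recovery session together, it looks something like this (the sync is just an extra bit of paranoia before remounting read-only):

LILO: linux init=/bin/bash
# mount -o remount,rw /
# passwd
# sync
# mount -o remount,ro /

Then hit Reset and boot normally (or into single-user mode) to carry on with whatever actually needed fixing.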

Common Boot Parameters

Manipulate kernel parameters at boot time

As we saw in [Hack #2], it is possible to pass parameters to the kernel at the LILO prompt, allowing you to change the program that is first called when the system boots. Changing init (with the init=/bin/bash line) is just one of many useful options that can be set at boot time. Here are some of the more common boot parameters:

single

Boots up in single user mode.

root=

Changes the device that is mounted as /. For example:

root=/dev/sdc4

will boot from the fourth partition on the third scsi disk (instead of whatever your boot loader has defined as the default).

hdX=

Adjusts IDE drive geometry. This is useful if your BIOS reports incorrect information:

hda=3649,255,63 hdd=cdrom

This defines the master/primary IDE drive as a 30GB hard drive in LBA mode, and the slave/secondary IDE drive as a CD-ROM.

console=

Defines a serial port console on kernels with serial console support. For example:

console=ttyS0,19200n81

Here we’re directing the kernel to log boot messages to ttyS0 (the first serial port), at 19200 baud, no parity, 8 data bits, 1 stop bit. Note that to get an actual serial console (that you can log in on), you’ll need to add a line to /etc/inittab that looks something like this:

s1:12345:respawn:/sbin/agetty 19200 ttyS0 vt100

nosmp

Disables SMP on an SMP-enabled kernel. This can help if you suspect kernel trouble on a multiprocessor system.

mem=

Defines the total amount of available system memory. See [Hack #21].

ro

Mounts the / partition read-only (this is typically the default, and is remounted read-write after fsck runs).

rw

Mounts the / partition read-write. This is generally a bad idea, unless you’re also running the init hack. Pass your init line along with rw, like this:

init=/bin/bash rw

to eliminate the need for all of that silly mount -o remount,rw / stuff in [Hack #2]. Congratulations, now you’ve hacked a hack.

You can also pass parameters for SCSI controllers, IDE devices, sound cards, and just about any other device driver. Every driver is different, and typically allows for setting IRQs, base addresses, parity, speeds, options for auto-probing, and more. Consult your online documentation for the excruciating details.
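If you find yourself typing the same parameters at every boot, you can make them permanent by adding an append line to the relevant image section of /etc/lilo.conf and rerunning /sbin/lilo. A sketch (your image, label, root, and append values will differ):

image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
    read-only
    append="console=ttyS0,19200n8"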

See also:

  • man bootparam

  • /usr/src/linux/Documentation/*

Creating a Persistent Daemon with init

Make sure that your process stays up, no matter what

There are a number of scripts that will automatically restart a process if it exits unexpectedly. Perhaps the simplest is something like:

$ while : ; do echo "Run some code here..."; sleep 1; done

If you run a foreground process in place of that echo line, then the process is always guaranteed to be running (or, at least, it will try to run). The : simply makes the while always execute (and is more efficient than running /bin/true, as it doesn’t have to spawn an external command on each iteration). Definitely do not run a background process in place of the echo, unless you enjoy filling up your process table (as the while will then spawn your command as many times as it can, one every second). But as far as cool hacks go, the while approach is fairly lacking in functionality.

What happens if your command runs into an abnormal condition? If it exits immediately, then it will retry every second, without giving any indication that there is a problem (unless the process has its own logging system or uses syslog). It might make sense to have something watch the process, and stop trying to respawn it if it returns too quickly after a few tries.

There is a utility already present on every Linux system that will do this automatically for you: init. The same program that brings up the system and sets up your terminals is perfectly suited for making sure that programs are always running. In fact, that is its primary job.

You can add arbitrary lines to /etc/inittab specifying programs you’d like init to watch for you:

zz:12345:respawn:/usr/local/sbin/my_daemon

The inittab line consists of an arbitrary (but unique) two character identification string (in this case, zz), followed by the runlevels that this program should be run in, then the respawn keyword, and finally the full path to the command. In the above example, as long as my_daemon is configured to run in the foreground, init will respawn another copy whenever it exits. After making changes to inittab, be sure to send a HUP to init so it will reload its configuration. One quick way to do this is:

# kill -HUP 1
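If you'd rather not send signals to PID 1 by hand, telinit can ask init to re-examine /etc/inittab for you:

# telinit q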

If the command respawns too quickly, then init will postpone execution for a while, to keep it from tying up too many resources. For example:

zz:12345:respawn:/bin/touch /tmp/timestamp

This will cause the file /tmp/timestamp to be touched several times a second, until init decides that enough is enough. You should see this message in /var/log/messages almost immediately:

Sep 8 11:28:23 catlin init: Id "zz" respawning too fast: disabled for 5 minutes

In five minutes, init will try to run the command again, and if it is still respawning too quickly, it will disable it again.

Obviously, this method is fine for commands that need to run as root, but what if you want your auto-respawning process to run as some other user? That’s no problem: use sudo:

zz:12345:respawn:/usr/bin/sudo -u rob /bin/touch /tmp/timestamp

Now that touch will run as rob, not as root. If you’re trying these commands as you read this, be sure to remove the existing /tmp/timestamp before trying this sudo line. After sending a HUP to init, take a look at the timestamp file:

rob@catlin:~# ls -al /tmp/timestamp 
-rw-r--r-- 1 rob users 0 Sep 8 11:28 /tmp/timestamp

The two drawbacks to using init to run arbitrary daemons are that you need to comment out the line in inittab if you need to bring the daemon down (since it will just respawn if you kill it) and that only root can add entries to inittab. But for keeping a process running that simply must stay up no matter what, init does a great job.

n>&m: Swap Standard Output and Standard Error

Direct standard out and standard error to wherever you need them to go

By default, a command’s standard error goes to your terminal. The standard output goes to the terminal or is redirected somewhere (to a file, down a pipe, into backquotes).

Sometimes you want the opposite. For instance, you may need to send a command’s standard output to the screen and grab the error messages (standard error) with backquotes. Or, you might want to send a command’s standard output to a file and the standard error down a pipe to an error-processing command. Here’s how to do that in the Bourne shell. (The C shell can’t do this.)

File descriptors 0, 1, and 2 are the standard input, standard output, and standard error, respectively. Without redirection, they're all associated with the terminal file /dev/tty. It's easy to redirect any descriptor to any file — if you know the filename. For instance, to redirect file descriptor 2 to errfile, type:

$ command 2> errfile

You know that a pipe and backquotes also redirect the standard output:

$ command | ...
$ var=`command`

But there’s no filename associated with the pipe or backquotes, so you can’t use the 2> redirection. You need to rearrange the file descriptors without knowing the file (or whatever) that they’re associated with. Here’s how.

Let's start slowly. To send both standard output and standard error down the pipe or into backquotes, use the Bourne shell operator n>&m, which rearranges the files and file descriptors. It says "make file descriptor n point to the same file as file descriptor m." Let's use that operator on the previous example. We'll send standard error to the same place standard output is going:

$ command 2>&1 | ...
$ var=`command 2>&1`

In both those examples, 2>&1 means “send standard error (file descriptor 2) to the same place standard output (file descriptor 1) is going.” Simple, eh?

You can use more than one of those n>&m operators. The shell reads them left-to-right before it executes the command.

“Oh!” you might say, “To swap standard output and standard error — make stderr go down a pipe and stdout go to the screen — I could do this!”

$ command 2>&1 1>&2 | ...     (wrong...)

Sorry, Charlie. When the shell sees 2>&1 1>&2, the shell first does 2>&1. You've seen that before — it makes file descriptor 2 (stderr) go the same place as file descriptor 1 (stdout). Then, the shell does 1>&2. It makes stdout (1) go the same place as stderr (2), but stderr is already going the same place as stdout, down the pipe. The net result is that both descriptors go down the pipe and nothing goes to the screen; no swap has happened.

This is one place that the other file descriptors, 3 through 9, come in handy. They normally aren't used. You can use one of them as a "holding place" to remember where another file descriptor "pointed." For example, one way to read the operator 3>&2 is "make 3 point the same place as 2". After you use 3>&2 to grab the location of 2, you can make 2 point somewhere else. Then, make 1 point to where 2 used to point (where 3 points now).

The command line you want is one of these:

$ command 3>&2 2>&1 1>&3 | ...
$ var=`command 3>&2 2>&1 1>&3`

Open files are automatically closed when a process exits. But it’s safer to close the files yourself as soon as you’re done with them. That way, if you forget and use the same descriptor later for something else (for instance, use F.D. 3 to redirect some other command, or a subprocess uses F.D. 3), you won’t run into conflicts. Use m<&- to close input file descriptor m and m>&- to close output file descriptor m. If you need to close standard input, use <&- ; >&- will close standard output.
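Here's a small, contrived demonstration of the swap (the filenames are only placeholders, the exact error text depends on your version of ls, and the order of the two output lines may vary). The listing goes to the screen on the old standard error, while only the error message travels down the pipe to sed; the trailing 3>&- closes the temporary descriptor, as suggested above:

$ ls /etc/hosts /nonexistent 3>&2 2>&1 1>&3 3>&- | sed 's/^/from stderr: /'
/etc/hosts
from stderr: ls: /nonexistent: No such file or directory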

Building Complex Command Lines

Build simple commands into full-fledged paragraphs for complex (but meaningful) reports

Studying Linux (or indeed any Unix) is much like studying a foreign language. At some magical point in the course of one’s studies, halting monosyllabic mutterings begin to meld together into coherent, often used phrases. Eventually, one finds himself pouring out entire sentences and paragraphs of the Unix Mother Tongue, with one’s mind entirely on the problem at hand (and not on the syntax of any particular command). But just as high school foreign language students spend much of their time asking for directions to the toilet and figuring out just what the dative case really is, the path to Linux command-line fluency must begin with the first timidly spoken magic words.

Your shell is very forgiving, and will patiently (and repeatedly) listen to your every utterance, until you get it just right. Any command can serve as the input for any other, making for some very interesting Unix “sentences.” When armed with the handy (and probably over-used) up arrow, it is possible to chain together commands with slight tweaks over many tries to achieve some very complex behavior.

For example, suppose that you’re given the task of finding out why a web server is throwing a bunch of errors over time. If you type less error_log, you see that there are many “soft errors” relating to missing (or badly linked) graphics:

[Tue Aug 27 00:22:38 2002] [error] [client 17.136.12.171] File does not 
exist: /htdocs/images/spacer.gif
[Tue Aug 27 00:31:14 2002] [error] [client 95.168.19.34] File does not 
exist: /htdocs/image/trans.gif
[Tue Aug 27 00:36:57 2002] [error] [client 2.188.2.75] File does not 
exist: /htdocs/images/linux/arrows-linux-back.gif
[Tue Aug 27 00:40:37 2002] [error] [client 2.188.2.75] File does not 
exist: /htdocs/images/linux/arrows-linux-back.gif
[Tue Aug 27 00:41:43 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/linux/hub-linux.jpg
[Tue Aug 27 00:41:44 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/xml/hub-xml.jpg
[Tue Aug 27 00:42:13 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/linux/hub-linux.jpg
[Tue Aug 27 00:42:13 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/xml/hub-xml.jpg

and so on. Running a logging package (like analog) reports exactly how many errors you have seen in a day but few other details (which is how you were probably alerted to the problem in the first place). Looking at the logfile directly gives you every excruciating detail but is entirely too much information to process effectively.

Let’s start simple. Are there any errors other than missing files? First we’ll need to know how many errors we’ve had today:

$ wc -l error_log
1265 error_log

And how many were due to File does not exist errors?

$ grep "File does not exist:" error_log | wc -l
1265

That's a good start. At least we know that we're not seeing permission problems or errors in anything that generates dynamic content (like CGI scripts). If every error is due to missing files (or typos in our HTML that point to the wrong file), then it's probably not a big problem. Let's generate a list of the filenames of all bad requests. Hit the up arrow and delete that wc -l:

$ grep "File does not exist:" error_log | awk '{print $13}' | less

That's the sort of thing that we want (the 13th field, just the filename), but hang on a second. The same couple of files are repeated many, many times. Sure, we could email this to the web team (all 1,265 whopping lines of it), but I'm sure they wouldn't appreciate the extraneous spam. Printing each file exactly once is easy:

$ grep "File does not exist:" error_log | awk '{print $13}' | sort | uniq |
less

This is much more reasonable (substitute a wc -l for that less to see just how many unique files have been listed as missing). But that still doesn’t really solve the problem. Maybe one of those files was requested once, but another was requested several hundred times. Naturally, if there is a link somewhere with a typo in it, we would see many requests for the same “missing” file. But the previous line doesn’t give any indication of which files are requested most. This isn’t a problem for bash; let’s try out a command line for loop.

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort 
| uniq`; do \
echo -n "$x : "; grep $x error_log | wc -l; done

We need those backticks (`) to actually execute our entire command from the previous example and feed the output of it to a for loop. On each iteration through the loop, the $x variable is set to the next line of output of our original command (that is, the next unique filename reported as missing). We then grep for that filename in the error_log, and count how many times we see it. The echo at the end just prints it in a somewhat nice report format.

I call it a somewhat nice report because not only is it full of single hit errors (which we probably don’t care about), the output is very jagged, and it isn’t even sorted! Let’s sort it numerically, with the biggest hits at the top, numbers on the left, and only show the top 20 most requested “missing” files:

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort 
| uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort +2 -rn | 
head -20

That’s much better and not even much more typing than the last try. We need the tr to eliminate the trailing newline at the end of wc’s output (why it doesn’t have a switch to do this, I’ll never know). Your output should look something like this:

595 : /htdocs/images/pixel-onlamp.gif.gif
156 : /htdocs/image/trans.gif
139 : /htdocs/images/linux/arrows-linux-back.gif
68 : /htdocs/pub/a/onjava/javacook/images/spacer.gif
50 : /htdocs/javascript/2001/03/23/examples/target.gif

From this report, it’s very simple to see that almost half of our errors are due to a typo on a popular web page somewhere (note the repeated .gif.gif in the first line). The second is probably also a typo (should be images/, not image/). The rest are for the web team to figure out:

$ ( echo "Here's a report of the top 20 'missing' files in the error_log."; 
echo; \
for x in `grep "File does not exist" error_log | awk '{print $13}' | sort | 
uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort +2 -rn | 
head -20 )\
| mail -s "Missing file report" webmaster@oreillynet.com

and maybe one hardcopy for the weekly development meeting:

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort 
| uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort +2 -rn | 
head -20 \
| enscript

Hacking the Hack

Once you get used to chunking groups of commands together, you can chain their outputs together indefinitely, creating any sort of report you like out of a live data stream. Naturally, if you find yourself doing a particular task regularly, you might want to consider turning it into a shell script of its own (or even reimplementing it in Perl or Python for efficiency’s sake, as every | in a command means that you’ve spawned yet another program). On a modern (and unloaded) machine, you’ll hardly notice the difference, but it’s considered good form to clean up the solution once you’ve hacked it out. And on the command line, there’s plenty of room to hack.
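As one example of that sort of cleanup, the count-and-sort loop above can be collapsed into a single pipeline: uniq -c prefixes each unique line with the number of times it appeared, which can then be sorted numerically. This produces essentially the same report without spawning a grep and a wc for every missing file:

$ grep "File does not exist" error_log | awk '{print $13}' | sort | uniq -c \
| sort -rn | head -20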

Working with Tricky Files in xargs

Deal with many files containing spaces or other strange characters

When you have a number of files containing spaces, parentheses, and other “forbidden” characters, dealing with them can be daunting. This is a problem that seems to come up frequently, with the recent explosive popularity of digital music. Luckily, tab completion in bash makes it simple to handle one file at a time. For example:

rob@catlin:~/Music$ ls
Hallucinogen - The Lone Deranger
Misc - Pure Disco
rob@catlin:~/Music$ rm -rf Misc[TAB]
rob@catlin:~/Music$ rm -rf Misc\ -\ Pure\ Disco/

Hitting the Tab key for [TAB] above replaces the command line with the line below it, properly escaping any special characters contained in the file. That’s fine for one file at a time, but what if we want to do a massive transformation (say, renaming a bunch of mp3s to include an album name)? Take a look at this:

rob@catlin:~/Music$ cd Hall[TAB]
rob@catlin:~/Music$ cd Hallucinogen\ -\ The\ Lone\ Deranger/
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ ls
Hallucinogen - 01 - Demention.mp3
Hallucinogen - 02 - Snakey Shaker.mp3
Hallucinogen - 03 - Trancespotter.mp3
Hallucinogen - 04 - Horrorgram.mp3
Hallucinogen - 05 - Snarling (Remix).mp3
Hallucinogen - 06 - Gamma Goblins Pt. 2.mp3
Hallucinogen - 07 - Deranger.mp3
Hallucinogen - 08 - Jiggle of the Sphinx.mp3
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$

When attempting to manipulate many files at once, things get tricky. Many system utilities break on whitespace (yielding many more chunks than you intended) and will completely fall apart if you throw a ) or a { at them. What we need is a delimiter that is guaranteed never to show up in a filename, and break on that instead.

Fortunately, the xargs utility will break on NULL characters, if you ask it nicely. Take a look at this script:

Listing: albumize

#!/bin/sh
# albumize: rename "Artist - NN - Title.mp3" to "Artist - $ALBUM - NN - Title.mp3"

if [ -z "$ALBUM" ]; then
echo 'You must set the ALBUM name first (eg. export ALBUM="Greatest Hits")'
exit 1
fi

# Emit NULL-terminated pairs: the original filename, then the new filename
for x in *; do
echo -n $x; echo -ne '\000'
echo -n `echo $x|cut -f 1 -d '-'`
echo -n " - $ALBUM - "
echo -n `echo $x|cut -f 2- -d '-'`; echo -ne '\000'
done | xargs -0 -n2 mv
# xargs reads the pairs (-n2), splitting on NULLs (-0), and hands each pair to mv

We’re actually doing two tricky things here. First, we’re building a list consisting of the original filename followed by the name to which we’d like to mv it, separated by NULL characters, for all files in the current directory. We then feed that entire list to an xargs with two switches: -0 tells it to break on NULLs (instead of newlines or whitespace), and -n2 tells it to take two arguments at a time on each pass, and feed them to our command (mv).

Save the script as ~/bin/albumize. Before you run it, set the $ALBUM environment variable to the name that you’d like injected into the filename just after the first -. Here’s a trial run:

rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ export ALBUM="The Lone Deranger"
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ albumize
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ ls
Hallucinogen - The Lone Deranger - 01 - Demention.mp3
Hallucinogen - The Lone Deranger - 02 - Snakey Shaker.mp3
Hallucinogen - The Lone Deranger - 03 - Trancespotter.mp3
Hallucinogen - The Lone Deranger - 04 - Horrorgram.mp3
Hallucinogen - The Lone Deranger - 05 - Snarling (Remix).mp3
Hallucinogen - The Lone Deranger - 06 - Gamma Goblins Pt. 2.mp3
Hallucinogen - The Lone Deranger - 07 - Deranger.mp3
Hallucinogen - The Lone Deranger - 08 - Jiggle of the Sphinx.mp3
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$

What if you would like to remove the album name again? Try this one, and call it ~/bin/dealbumize:

#!/bin/sh

for x in *; do
echo -n $x; echo -ne '\000'
echo -n `echo $x|cut -f 1 -d '-'`; echo -n ' - '
echo -n `echo $x|cut -f 3- -d '-'`; echo -ne '\000'
done | xargs -0 -n2 mv

and simply run it (no $ALBUM required):

rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ dealbumize
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$ ls
Hallucinogen - 01 - Demention.mp3
Hallucinogen - 02 - Snakey Shaker.mp3
Hallucinogen - 03 - Trancespotter.mp3
Hallucinogen - 04 - Horrorgram.mp3
Hallucinogen - 05 - Snarling (Remix).mp3
Hallucinogen - 06 - Gamma Goblins Pt. 2.mp3
Hallucinogen - 07 - Deranger.mp3
Hallucinogen - 08 - Jiggle of the Sphinx.mp3
rob@catlin:~/Music/Hallucinogen - The Lone Deranger$

The -0 switch also teams up nicely with the -print0 option of find (which, naturally, prints matching filenames separated by NULLs instead of newlines). With find and xargs on a pipeline, you can do anything you like to any number of files, without ever running into the dreaded Argument list too long error:

rob@catlin:~/Pit of too many files$ ls *
bash: /bin/ls: Argument list too long

A find/xargs combo makes quick work of these files, no matter what they’re called:

rob@catlin:/Pit of too many files$ find -type f -print0 | xargs -0 ls

To delete them, just replace that trailing ls with an rm, and away you go.
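That is, something like:

rob@catlin:/Pit of too many files$ find -type f -print0 | xargs -0 rm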

Immutable Files in ext2/ext3

Create files that even root can’t manipulate

Here’s a puzzle for you. Suppose we’re cleaning up /tmp, and run into some trouble:

root@catlin:/tmp# rm -rf junk/
rm: cannot unlink `junk/stubborn.txt': Operation not permitted
rm: cannot remove directory `junk': Directory not empty
root@catlin:/tmp# cd junk/
root@catlin:/tmp/junk# ls -al
total 40
drwxr-xr-x 2 root root 4096 Sep 4 14:45 ./
drwxrwxrwt 13 root root 4096 Sep 4 14:45 ../
-rw-r--r-- 1 root root 29798 Sep 4 14:43 stubborn.txt
root@catlin:/tmp/junk# rm ./stubborn.txt 
rm: remove write-protected file `./stubborn.txt'? y
rm: cannot unlink `./stubborn.txt': Operation not permitted

What’s going on? Are we root or aren’t we? Let’s try emptying the file instead of deleting it:

root@catlin:/tmp/junk# cp /dev/null stubborn.txt 
cp: cannot create regular file `stubborn.txt': Permission denied
root@catlin:/tmp/junk# > stubborn.txt 
bash: stubborn.txt: Permission denied

Well, /tmp certainly isn’t mounted read-only. What is going on?

In the ext2 and ext3 filesystems, there are a number of additional file attributes that are available beyond the standard bits accessible through chmod. If you haven't seen it already, take a look at the manpages for chattr and its companion, lsattr.

One of the very useful new attributes is i, the immutable flag (set with chattr +i and cleared with chattr -i). With this bit set, attempts to unlink, rename, overwrite, or append to the file are forbidden. Even making a hard link is denied (so you can't make a hard link, then edit the link). And having root privileges makes no difference when immutable is in effect:

root@catlin:/tmp/junk# ln stubborn.txt another.txt
ln: creating hard link `another.txt' to `stubborn.txt': Operation not permitted

To view the supplementary ext flags that are in force on a file, use lsattr:

root@catlin:/tmp/junk# lsattr
---i--------- ./stubborn.txt

and to set flags a la chmod, use chattr:

root@catlin:/tmp/junk# chattr -i stubborn.txt 
root@catlin:/tmp/junk# rm stubborn.txt 
root@catlin:/tmp/junk#

This could be terribly useful for adding an extra security step on files you know you'll never want to change (say, /etc/rc.d/* or various configuration files). While little will help you on a box that has been r00ted, immutable files probably aren't vulnerable to simple overwrite attacks from other processes, even ones running as root.
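For example, to lock down a finished configuration file, or to let a log file grow but never be rewritten or truncated, you might do something like this (the filenames here are only examples; a is the append-only attribute):

# chattr +i /etc/resolv.conf
# chattr +a /var/log/important.log
# lsattr /etc/resolv.conf /var/log/important.log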

There are hooks for adding compression, secure deletion, undeletability, synchronous writes, and a couple of other useful attributes. As of this writing, many of the additional attributes aren't implemented yet, but keep watching for new developments on the ext filesystem.

Speeding Up Compiles

Make sure you’re keeping all processors busy with parallel builds

If you're running a multiprocessor (SMP) system with a moderate amount of RAM, you can usually see significant benefits by performing a parallel make when building code. Compared to the serial build that make performs by default (running one job at a time), a parallel build is a vast improvement.

To tell make to allow more than one child at a time while building, use the -j switch:

rob@mouse:~/linux$ make -j4; make -j4 modules

Some projects aren’t designed to handle parallel builds and can get confused if parts of the project are built before their parent dependencies have completed. If you run into build errors, it is safest to just start from scratch this time without the -j switch.

By way of comparison, here are some sample timings. They were performed on an otherwise unloaded dual PIII/600 with 1GB RAM. Each time, I built a bzImage for Linux 2.4.19 (redirecting STDOUT to /dev/null) and removed the source tree before starting the next test.

time make bzImage:

real 7m1.640s
user 6m44.710s
sys 0m25.260s

time make -j2 bzImage:

real 3m43.126s
user 6m48.080s
sys 0m26.420s

time make -j4 bzImage:

real 3m37.687s
user 6m44.980s
sys 0m26.350s

time make -j10 bzImage:

real 3m46.060s
user 6m53.970s
sys 0m27.240s

As you can see, there is a significant improvement just by adding the -j2 switch. We dropped from 7 minutes to 3 minutes and 43 seconds of actual time. Increasing to -j4 saved us about five more seconds, but jumping all the way to -j10 actually hurt performance by a few seconds. Notice how user and system seconds are virtually the same across all four runs. In the end, you need to shovel the same sized pile of bits, but -j on a multi-processor machine simply lets you spread it around to more people with shovels.
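A common rule of thumb is to set -j to the number of processors in the machine (or that number plus one). On Linux you can pull the count straight out of /proc/cpuinfo, so a generic invocation might look like this sketch:

rob@mouse:~/linux$ make -j$(grep -c ^processor /proc/cpuinfo) bzImage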

Of course, bits all eventually end up in the bit bucket anyway. But hey, if nothing else, performance timings are a great way to keep your cage warm.

At Home in Your Shell Environment

Make bash more comfortable through environment variables

Consulting a manpage for bash can be a daunting read, especially if you’re not precisely sure what you’re looking for. But when you have the time to devote to it, the manpage for bash is well worth the read. This is a shell just oozing with all sorts of arcane (but wonderfully useful) features, most of which are simply disabled by default.

Let’s start by looking at some useful environment variables, and some useful values to which to set them:

export PS1=`echo -ne "\033[0;34m\u@\h:\033[0;36m\w\033[0;34m\$\033[0;37m "`

As you probably know, the PS1 variable sets the default system prompt, and automatically interprets escape sequences such as \u (for username) and \w (for the current working directory). As you may not know, it is possible to encode ANSI escape sequences in your shell prompt, to give your prompt a colorized appearance. We wrap the whole string in backticks (`) in order to get echo to generate the magic ASCII escape character. This is executed once, and the result is stored in PS1. Strip away the ANSI codes and you should recognize the familiar \u@\h:\w\$ prompt that we've all grown to know and love. By changing the numbers just after each semicolon, you can set the colors of each part of the prompt to your heart's content.
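One refinement worth knowing about: bash can't tell how wide the prompt really is unless each non-printing escape sequence is wrapped in \[ and \], so long command lines may wrap strangely with the prompt above. Here's a sketch of the same prompt with the markers added (and, since PS1 interprets \033 itself, no echo or backticks are needed):

export PS1='\[\033[0;34m\]\u@\h:\[\033[0;36m\]\w\[\033[0;34m\]\$\[\033[0;37m\] '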

Along the same lines, here’s a handy command that is run just before bash gives you a prompt:

export PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME}: ${PWD}\007"'

(We don’t need backticks for this one, as bash is expecting it to contain an actual command, not a string.) This time, the escape sequence is the magic string that manipulates the titlebar on most terminal windows (such as xterm, rxvt, eterm, gnometerm, etc.). Anything after the semicolon and before the \007 gets printed to your titlebar every time you get a new prompt. In this case, we’re displaying your username, the host you’re logged into, and the current working directory. This is quite handy for being able to tell at a glance (or even while within vim) to which machine you’re logged in, and to what directory you’re about to save your file. See [Hack #59] if you’d like to update your titlebar in real time instead of at every new bash prompt.

Have you ever accidentally hit ^D too many times in a row, only to find yourself logged out? You can tell bash to ignore as many consecutive ^D hits as you like:

export IGNOREEOF=2

This makes bash follow the Snark rule (“What I tell you three times is true”) and only log you out if you hit ^D three times in a row. If that’s too few for you, feel free to set it to 101 and bash will obligingly keep count for you.

Having a directory just off of your home that lies in your path can be extremely useful (for keeping scripts, symlinks, and other random pieces of code.) A traditional place to keep this directory is in bin underneath your home directory. If you use the ~ expansion facility in bash, like this:

export PATH=$PATH:~/bin

then the path will always be set properly, even if your home directory ever gets moved (or if you decide you want to use this same line on multiple machines with potentially different home directories — as in movein.sh). See [Hack #72].

Did you know that just as commands are searched for in the PATH variable (and manpages are searched for in the MANPATH variable), directories are likewise searched for in the CDPATH variable every time you issue a cd? By default it is unset (which behaves as if it were just "."), but it can be set to anything you like:

export CDPATH=.:~

This will make cd search not only the current directory, but also your home directory for the directory you try to change to. For example:

rob@caligula:~$ ls
bin/ devel/ incoming/ mail/ test/ stuff.txt
rob@caligula:~$ cd /usr/local/bin
rob@caligula:/usr/local/bin$ cd mail
bash: cd: mail: No such file or directory
rob@caligula:/usr/local/bin$ export CDPATH=.:~
rob@caligula:/usr/local/bin$ cd mail
/home/rob/mail
rob@caligula:~/mail$

You can put as many paths as you like to search for in CDPATH, separating each with a : (just as with the PATH and MANPATH variables.)

We all know about the up arrow and the history command. But what happens if you accidentally type something sensitive on the command line? Suppose you slip while typing and accidentally type a password where you meant to type a command. This accident will faithfully get recorded to your ~/.bash_history file when you log out, where an unscrupulous user might happen to find it. Editing your .bash_history manually won't fix the problem, as the file gets rewritten each time you log out.

To clear out your history quickly, try this from the command line:

export HISTSIZE=0

This completely clears out the current bash history and will write an empty .bash_history on logout. When you log back in, your history will start over from scratch but will otherwise work just as before. From now on, try to be more careful!
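The history builtin can do much the same thing for the current session:

$ history -c

This clears the in-memory history list; assuming the usual behavior of overwriting ~/.bash_history at logout, the old entries won't be written back either.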

Do you have a problem with people logging into a machine, then disconnecting their laptop and going home without logging back out again? If you’ve ever run a w and seen a bunch of idle users who have been logged in for several days, try setting this in their environment:

export TMOUT=600

The TMOUT variable specifies the number of seconds that bash will wait for input on a command line before logging the user out automatically. This won’t help if your users are sitting in a vi window but will alleviate the problem of users just sitting at an idle shell. Ten minutes might be a little short for some users, but kindly remind them that if they don’t like the system default, they are free to reset the variable themselves.

This brings up an interesting point: exactly where do you go to make any of these environment changes permanent? There are several files that bash consults when starting up, depending on whether the shell was called at login or from within another shell.

From bash(1), on login shells:

...it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable . . . . When a login shell exits, bash reads and executes commands from the file ~/.bash_logout, if it exists.

For all other shells:

bash reads and executes commands from ~/.bashrc, if that file exists.

For the full definition of what constitutes a login shell (and for a whole bunch of information about the environment you work in every day), consult bash(1).

Finding and Eliminating setuid/setgid Binaries

Eliminate potential root exploits before they have a chance to happen

While running Linux as a server, one guiding principle that has served me well is to continually ask, “what am I trying to achieve?” Does it make sense for a web server to have the printing subsystem installed? Should a system with no console have gpm installed? Usually, extra software packages just take up unnecessary disk space, but in the case of setuid or setgid binaries, the situation could be far worse.

While distribution maintainers work very hard to ensure that all known exploits for setuid and setgid binaries have been removed, it seems that a few new unexpected exploits come out every month or two. Especially if your server has more shell users than yourself, you should regularly audit the setuid and setgid binaries on your system. Chances are you’ll be surprised at just how many you’ll find.

Here’s one command for finding all of the files with a setuid or setgid bit set:

root@catlin:~# find / -perm +6000 -type f -exec ls -ld {} \; > setuid.txt &

This will create a file called setuid.txt that contains the details of all of the matching files present on your system. It is a very good idea to look through this list, and remove the s bits of any tools that you don’t use.
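One note if you're using a recent version of GNU find: the +6000 form of -perm (meaning "any of these bits set") has been deprecated in favor of a leading slash, so the equivalent command becomes:

root@catlin:~# find / -perm /6000 -type f -exec ls -ld {} \; > setuid.txt &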

Let’s look through what we might find on a typical system:

-rws--x--x 1 root bin 35248 May 30 2001 /usr/bin/at
-rws--x--x 1 root bin 10592 May 30 2001 /usr/bin/crontab

Not much surprise here. at and crontab need root privileges in order to change to the user that requested the at job or cron job. If you’re paranoid, and you don’t use these facilities, then you could remove the setuid bits with:

# chmod a-s /usr/bin/{at,crontab}

Generally speaking, it’s a bad idea to disable cron (as so many systems depend on timed job execution). But when was the last time you used at? Do your users even know what it’s for? Personally, I find at a nice shortcut to setting up a full-blown cron job, and wouldn’t like to part with it. But if there is no call for it on your particular system, you should consider defanging it. With the setuid bit removed, the commands will no longer be available to regular users but will still work fine as root.

-rws--x--x 1 root bin 11244 Apr 15 2001 /usr/bin/disable-paste

This is part of the gpm package (a mouse driver for the Linux console). Do you have a mouse attached to the console of this machine? Do you use it in text mode? If not, then why leave a setuid root binary in place that will never even be called?

-r-s--s--x 1 root lp 14632 Jun 18 2001 /usr/bin/lpq
-r-s--s--x 1 root lp 15788 Jun 18 2001 /usr/bin/lpr
-r-s--s--x 1 root lp 15456 Jun 18 2001 /usr/bin/lprm
-r-xr-s--x 1 root lp 23772 Jun 18 2001 /usr/sbin/lpc

These are all part of the printing subsystem. Does this machine actually use lp to print?

-rws--x--x 1 root bin 33760 Jun 18 2000 /usr/bin/chage
-rws--x--x 1 root bin 29572 Jun 18 2000 /usr/bin/chfn
-rws--x--x 1 root bin 27188 Jun 18 2000 /usr/bin/chsh
-rws--x--x 1 root bin 35620 Jun 18 2000 /usr/bin/passwd

These are all necessary for users to be able to set their passwords, shell, and finger information. Does your site use finger information at all? If, like most sites, you have a roster somewhere else (probably on the web) that isn't kept in sync with users' GECOS fields, then this information is generally useless (except for the user's "real" name, which is still used in some email clients). Do you really need to allow users to change this information on their own, without admin intervention?

-r-xr-sr-x 1 root tty 9768 Jun 21 2001 /usr/bin/wall
-r-xr-sr-x 1 root tty 8504 Jun 21 2001 /usr/bin/write

Both wall and write need to be setgid tty to write to other users' terminals. This is generally a safe operation, but it can be abused by miscreants who like to write bad data (or lots of bad data) to other users' terminals. If you don't need to provide this functionality to other users, why not disable the setgid bit? If you do disable it, root will still be able to send walls and writes (for example, when a message is sent by shutdown when rebooting the system).

-rwsr-xr-x 1 root bin 14204 Jun 3 2001 /usr/bin/rcp
-rwsr-xr-x 1 root bin 10524 Jun 3 2001 /usr/bin/rlogin
-r-sr-xr-x 1 root bin 7956 Jun 3 2001 /usr/bin/rsh

The r* commands are left over from an age (perhaps not so long ago) before the days of ssh. Do you need to provide the r commands to your users? Is there anything that ssh and scp can't do that you absolutely need rsh and rcp for? More than likely, once you've worked with ssh for a while, you'll never miss the r commands, and can safely remove the potential ticking time bomb of a disused setuid rsh installation. If you're looking for interesting things to do with ssh, check out any of the ssh hacks elsewhere in this book.

-r-sr-xr-x 1 root bin 10200 Jun 3 2001 /usr/bin/traceroute
-r-sr-xr-x 1 root bin 15004 Jun 3 2001 /bin/ping

The traceroute and ping commands need the setuid root bit to be able to create ICMP packets. If you want your users to be able to run these network diagnostic tools, then they’ll need to be setuid root. Otherwise, remove setuid and then only root will be able to run ping and traceroute.

-r-sr-sr-- 1 uucp uucp 83344 Feb 10 2001 /usr/bin/uucp
-r-sr-sr-- 1 uucp uucp 36172 Feb 10 2001 /usr/bin/uuname
-r-sr-sr-- 1 uucp uucp 93532 Feb 10 2001 /usr/bin/uustat
-r-sr-sr-- 1 uucp uucp 85348 Feb 10 2001 /usr/bin/uux
-r-sr-sr-- 1 uucp uucp 65492 Feb 10 2001 /usr/lib/uucp/uuchk
-r-sr-sr-- 1 uucp uucp 213832 Feb 10 2001 /usr/lib/uucp/uucico
-r-sr-sr-- 1 uucp uucp 70748 Feb 10 2001 /usr/lib/uucp/uuconv
-r-sr-sr-- 1 uucp uucp 315 Nov 22 1995 /usr/lib/uucp/uusched
-r-sr-sr-- 1 uucp uucp 95420 Feb 10 2001 /usr/lib/uucp/uuxqt

When was the last time you connected to another machine with UUCP? Have you ever set up the UUCP system? I have been a network admin for ten years, and in that time, I have never come across a live UUCP installation. That’s not to say that UUCP isn’t useful, just that in these days of permanently connected TCP/IP networks, UUCP is becoming extremely uncommon. If you’re not using UUCP, then leaving setuid and setgid binaries online to support it doesn’t make much sense.

Do any of the binaries in the examples above have potential root (or other privilege elevation) exploits? I have no idea. But I do know that by removing unnecessary privileges, I minimize my exposure to the possibility that an exploit might be run on this system if one is discovered.

I have tried to present this hack as a process, not as a solution. This list is certainly by no means definitive. As you build a server, always keep in mind what the intended use is, and build the system accordingly. Whenever possible, remove privileges (or even entire packages) that provide functionality that you simply don’t need. Consult the manpage if you ever wonder about what a particular binary is supposed to be doing, and why it is installed setuid (and when the manpage fails you, remember to use the source).

Make sudo Work Harder

Use sudo to let other users do your evil bidding, without giving away the machine

The sudo utility can help you delegate some system responsibilities to other people, without giving away full root access. It is a setuid root binary that executes commands on an authorized user’s behalf, after they have entered their current password.

As root, run /usr/sbin/visudo to edit the list of users who can call sudo. The default sudo list looks something like this:

root ALL=(ALL) ALL

Unfortunately, many system admins tend to use this entry as a template and grant unrestricted root access to all other admins unilaterally:

root ALL=(ALL) ALL
rob ALL=(ALL) ALL
jim ALL=(ALL) ALL
david ALL=(ALL) ALL

While this may allow you to give out root access without giving away the root password, it is really only a useful method when all of the sudo users can be completely trusted. When properly configured, the sudo utility allows for tremendous flexibility for granting access to any number of commands, run as any arbitrary uid.

The syntax of the sudo line is:

user machine=(effective user) command

The first column specifies the sudo user. The next column defines the hosts on which this sudo entry is valid. This allows you to easily use a single sudo configuration across multiple machines.

For example, suppose you have a developer who needs root access on a development machine, but not on any other server:

peter beta.oreillynet.com=(ALL) ALL

The next column (in parentheses) specifies the effective user that may run the commands. This is very handy for allowing users to execute code as users other than root:

peter lists.oreillynet.com=(mailman) ALL

Finally, the last column specifies all of the commands that this user may run:

david ns.oreillynet.com=(bind) /usr/sbin/rndc,/usr/sbin/named

If you find yourself specifying large lists of commands (or, for that matter, users or machines), then take advantage of sudo’s Alias syntax. An Alias can be used in place of its respective entry on any line of the sudo configuration:

User_Alias ADMINS=rob,jim,david
User_Alias WEBMASTERS=peter,nancy

Runas_Alias DAEMONS=bind,www,smmsp,ircd

Host_Alias WEBSERVERS=www.oreillynet.com,www.oreilly.com,www.perl.com

Cmnd_Alias PROCS=/bin/kill,/bin/killall,/usr/bin/skill,/usr/bin/top
Cmnd_Alias APACHE=/usr/local/apache/bin/apachectl

WEBMASTERS WEBSERVERS=(www) APACHE
ADMINS ALL=(DAEMONS) ALL

It is also possible to specify system groups in place of the user specification to allow any user who belongs to that group to execute commands. Just preface the group with a %, like this:

%wwwadmin WEBSERVERS=(www) APACHE

Now any user who is part of the wwwadmin group can execute apachectl as the www user on any of the web server machines.

One very useful feature is the NOPASSWD: flag. When present, the user won’t have to enter his password before executing the command:

rob ALL=(ALL) NOPASSWD: PROCS

This will allow the user rob to execute kill, killall, skill, and top on any machine, as any user, without entering a password.
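After editing the configuration, it's worth having the user confirm what they actually ended up with; sudo -l lists the commands the invoking user may (and may not) run on the current host:

rob@catlin:~$ sudo -l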

Finally, sudo can be a handy alternative to su for running commands at startup out of the system rc files:

(cd /usr/local/mysql; sudo -u mysql ./bin/safe_mysqld &)

sudo -u www /usr/local/apache/bin/apachectl start

For that to work at boot time, you’ll need the default line root ALL=(ALL) ALL to be present.

Use sudo with the usual caveats that apply to setuid binaries. Particularly if you allow sudo to execute interactive commands (like editors) or any sort of compiler or interpreter, you should assume that it is possible that the sudo user will be able to execute arbitrary commands as the effective user. Still, under most circumstances this isn’t a problem and is certainly preferable to giving away undue access to root privileges.
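
Once an entry is in place, it’s worth checking it from the user’s side. The -l flag asks sudo to list what the invoking user may run; the exact output format varies between sudo versions, but it will look something like this:

rob@www:~$ sudo -l
User rob may run the following commands on this host:
    (ALL) NOPASSWD: /bin/kill, /bin/killall, /usr/bin/skill, /usr/bin/top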

Using a Makefile to Automate Admin Tasks

Makefiles make everything (not just gcc) faster and easier

You probably know the make command from building projects (probably involving gcc) from source. But few people realize that since make keeps track of file modification times, it can also be a handy tool for triggering all sorts of updates whenever arbitrary files change.

Here’s a Makefile that is used to maintain sendmail configuration files:

Listing: Makefile.mail

M4= m4
CFDIR= /usr/src/sendmail-8.12.5/cf
CHMOD= chmod
ROMODE= 444
RM= rm -f

.SUFFIXES: .mc .cf

all: virtusers.db aliases.db access.db sendmail.cf

access.db: access.txt
	makemap -v hash access < access.txt

aliases.db: aliases
	newaliases

virtusers.db: virtusers.txt
	makemap -v hash virtusers < virtusers.txt

.mc.cf:
	$(RM) $@
	$(M4) ${CFDIR}/m4/cf.m4 $*.mc > $@ || ( $(RM) $@ && exit 1 )
	$(CHMOD) $(ROMODE) $@

With this installed as /etc/mail/Makefile, you’ll never have to remember to run newaliases when editing your sendmail aliases file, or the syntax of that makemap command when you update virtual domain or access control settings. And best of all, when you update your master mc configuration file (you are using mc and not editing the sendmail.cf by hand, right?) then it will build your new .cf file for you — all by simply typing make. Since make keeps track of files that have been recently updated, it takes care of rebuilding only what needs to be rebuilt.
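
With that in place, a typical update session (assuming your maps live in /etc/mail alongside the Makefile) boils down to editing the file in question and typing make:

# cd /etc/mail
# vi access.txt
# make
makemap -v hash access < access.txt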

Here’s another example, used to push Apache configuration files to another server (say, in a round-robin Apache setup, which you can learn more about in [Hack #99]). Just put this in your /usr/local/apache/conf directory:

Listing: Makefile.push

#
# Makefile to push *.conf to the slave, as needed.
#
SLAVE= www2.oreillynet.com
APACHE= /usr/local/apache
RM= /bin/rm
TOUCH= /bin/touch
SSH= /usr/local/bin/ssh
SCP= /usr/local/bin/scp

.SUFFIXES: .conf .ts

all: test restart sites.ts globals.ts httpd.ts

configtest: test

test:
	@echo -n "Testing Apache configuration: "
	@$(APACHE)/bin/apachectl configtest

restart:
	$(APACHE)/bin/apachectl restart

.conf.ts:
	@$(RM) -f $@
	@$(SCP) $*.conf $(SLAVE):$(APACHE)/conf
	@$(SSH) $(SLAVE) $(APACHE)/bin/apachectl restart
	@$(TOUCH) $@

This example is a little trickier because we’re not actually building any new files, so it’s difficult for make to tell if any of the configuration files have actually changed. We fake out make by creating empty .ts files (short for TimeStamp) that only serve to hold the time and date of the last update. If the real files (httpd.conf, sites.conf, or globals.conf) have changed, then we first run an apachectl configtest to verify that we have a good configuration. If all goes well, it will then restart the local Apache, copy the newly changed files over to the slave server, then restart Apache on the slave server. Finally, we touch the relevant .ts files so we won’t process them again until the .conf files change.

This saves a lot of typing and is significantly quicker (and safer) than doing it all by hand on each update.

Brute Forcing Your New Domain Name

Find exactly the domain you’d like to register, whatever it turns out to be

There are many tools available online that will assist in performing whois queries for you to determine if your favorite domain name is still available, and if not, who has registered it. These tools are usually web based and allow you to submit a few queries at a time (and frequently suggest several inane alternatives if your first choice is taken).

If you’re not so much interested in a particular name as in finding one that matches a pattern, why not let the command line do the work for you? Suppose you wanted to find a list of all words that end in the letters “st”:

cat /usr/share/dict/words | grep 'st$' | sed 's/st$/.st/' | \
while read i; do \
(whois $i | grep -q '^No entries found') && echo $i; sleep 60; \
done | tee list_of_st_domains.txt

This will obligingly supply you with a visual running tab of all available words that haven’t yet been registered to el Republica Democratica de São Tomé e Príncipe (the domain registrar for the st TLD). This example searches the system dictionary and tries to find the whois record for each matching word, one at a time, every 60 seconds. It saves any nonexistent records to a file called list_of_st_domains.txt, and shows you its progress as it runs. Replace that st with any two letter TLD (like us or to) to brute force the namespace of any TLD you like.
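
Bear in mind that each registry words its “not found” response differently, so test the grep pattern against a name you know is unregistered before leaving one of these running overnight. As a hedged illustration, a variant for the to TLD might look like this (the 'no match' string is an assumption; verify it with a manual whois first):

cat /usr/share/dict/words | grep 'to$' | sed 's/to$/.to/' | \
while read i; do \
(whois $i | grep -qi 'no match') && echo $i; sleep 60; \
done | tee list_of_to_domains.txt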

Some feel that the domain name land grab is almost turning the Internet into a corporate ghetto, but I don’t subscribe to that idea. I actually find the whole situation quite humorous.

Playing Hunt the Disk Hog

Browse your filesystem for heavy usage quickly with a handy alias

It always seems to happen late on a Saturday night. You’re getting paged because a partition on one of the servers (probably the mail server) is dangerously close to full.

Obviously, running a df will show what’s left:

rob@magic:~$ df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sda1 7040696 1813680 4863600 27% /
/dev/sda2 17496684 13197760 3410132 79% /home
/dev/sdb1 8388608 8360723 27885 100% /var/spool/mail

But you already knew that the mail spool was full (hence, the page that took you away from an otherwise pleasant, non-mailserver related evening). How can you quickly find out who’s hogging all of the space?

Here’s a one-liner that’s handy to have in your .profile:

alias ducks='du -cks * |sort -rn |head -11'

Once this alias is in place, running ducks in any directory will show you the total in use, followed by the top 10 disk hogs, in descending order. It recurses subdirectories, which is very handy (but can take a long time to run on a heavily loaded server, or in a directory with many subdirectories and files in it). Let’s get to the bottom of this:

rob@magic:~$ cd /var/spool/mail
rob@magic:/var/spool/mail$ ducks
8388608 total
1537216 rob
55120 phil
48800 raw
43175 hagbard
36804 mal
30439 eris
30212 ferris
26042 nick
22464 rachael
22412 valis

Oops! It looks like my mail spool runneth over. Boy, I have orders of magnitude more mail than any other user. I’d better do something about that, such as appropriating some new hardware and upgrading the /var/spool/mail partition. ;)

As this command recurses subdirectories, it’s also good for running a periodic report on home directory usage:

root@magic:/home# ducks
[ several seconds later ]
13197880 total
2266480 ferris
1877064 valis
1692660 hagbard
1338992 raw
1137024 nick
1001576 rob
925620 phil
870552 shared
607740 mal
564628 eris

For running simple spot checks while looking for disk hogs, ducks can save many keystrokes (although if we called it something like ds, it would save even more, but wouldn’t be nearly as funny.)
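
One caveat: the * glob never matches dot files or dot directories, so space hiding under something like ~/.netscape won’t show up in the ducks output. If you suspect that’s where the bloat is, a variant along these lines (using GNU du’s --max-depth option; the ducksall name is just a suggestion) will count hidden directories as well:

alias ducksall='du -cka --max-depth=1 . | sort -rn | head -11'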

Fun with /proc

Directly view the kernel’s running process table and system variables

The /proc filesystem contains a representation of the kernel’s live process table. By manipulating files and directories in /proc, you can learn about (and fiddle with) all sorts of parameters in the running system. Be warned that poking around under /proc as root can be extraordinarily dangerous, as root has the power to overwrite virtually anything in the process table. One slip of a redirector, and Linux will very obligingly blow away your entire kernel memory, without so much as a “so long and thanks for all the kcore.”

Here are some examples of interesting things to do with /proc. In these examples, we’ll assume that you’re using a recent kernel (about 2.4.18 or so) and that you are logged in as root. Unless you’re root, you will only be able to view and modify processes owned by your uid.

First, let’s take a look at a lightly loaded machine:

root@catlin:/proc# ls
1/ 204/ 227/ 37/ bus/ hermes/ loadavg scsi/ version
1039/ 212/ 228/ 4/ cmdline ide/ locks self@
1064/ 217/ 229/ 5/ cpuinfo interrupts meminfo slabinfo
1078/ 220/ 230/ 6/ devices iomem misc stat
194/ 222/ 231/ 698/ dma ioports modules swaps
197/ 223/ 232/ 7/ driver/ irq/ mounts sys/
2/ 224/ 233/ 826/ execdomains kcore net/ sysvipc/
200/ 225/ 254/ 827/ filesystems kmsg partitions tty/
202/ 226/ 3/ apm fs/ ksyms pci uptime

The directories consisting of numbers contain information about every process running on the system. The number corresponds to the PID. The rest of the files and directories correspond to drivers, counters, and many other internals of the running kernel. The interface operates just as any other file or device in the system, by reading from and writing to each entry as if it were a file. Suppose you want to find out which kernel is currently booted:

root@catlin:/proc# cat version
Linux version 2.4.18 (root@catlin) (gcc version 2.95.3 20010315 (release)) 
#2 Sat Jun 22 19:01:17 PDT 2002

Naturally, you could find much of that out by simply running uname -a, but this way cuts out the middle man (and actually does what uname does internally).

Interested in how much RAM we have installed? Take a look at kcore:

root@catlin:/proc# ls -l kcore
-r-------- 1 root root 201330688 Aug 28 21:39 kcore

Looks like we have 192MB installed in this machine (201330688/1024/1024 == 192, more or less). Notice the restricted file permissions? That is the system’s defense against anyone attempting to read the memory directly. Of course, you’re root and can do whatever you like. There’s nothing preventing you from running grep or strings on kcore and looking for interesting tidbits. This is the system memory, and things get cached in there until overwritten (making it possible to hunt down accidentally lost data or track down naughty people doing things they oughtn’t). Grovelling over kcore is actually not much fun and is typically only a method used by the very desperate (or very bored).

Some other notable status files:

root@catlin:/proc# cat interrupts 
CPU0 
0: 34302408 XT-PIC timer
1: 2 XT-PIC keyboard
2: 0 XT-PIC cascade
3: 289891 XT-PIC orinoco_cs
8: 1 XT-PIC rtc
9: 13933886 XT-PIC eth0
10: 25581 XT-PIC BusLogic BT-958
14: 301982 XT-PIC ide0
NMI: 0 
ERR: 0

These are the counters for every interrupt received since boot, along with the interrupt number and the driver registered to handle it.

root@catlin:/proc# cat partitions 
major minor #blocks name

8 0 8971292 sda
8 1 8707198 sda1
8 2 257040 sda2
3 64 29316672 hdb
3 65 29310561 hdb1

This is a list of all of the hard disk partitions (and devices) that were discovered at boot, along with their respective sizes. Here we have a 9GB SCSI disk, and a 30GB IDE disk. All available partitions are represented here, regardless of whether they are currently mounted (making it a handy reference to see if there are any unmounted disks on the system).

Let’s leave the system parameters and take a look at the structure of an individual process.

root@catlin:/proc# cd 1
root@catlin:/proc/1# ls -l
total 0
-r--r--r-- 1 root root 0 Aug 28 22:05 cmdline
lrwxrwxrwx 1 root root 0 Aug 28 22:05 cwd -> //
-r-------- 1 root root 0 Aug 28 22:05 environ
lrwxrwxrwx 1 root root 0 Aug 28 22:05 exe -> /sbin/init*
dr-x------ 2 root root 0 Aug 28 22:05 fd/
-r--r--r-- 1 root root 0 Aug 28 22:05 maps
-rw------- 1 root root 0 Aug 28 22:05 mem
lrwxrwxrwx 1 root root 0 Aug 28 22:05 root -> //
-r--r--r-- 1 root root 0 Aug 28 22:05 stat
-r--r--r-- 1 root root 0 Aug 28 22:05 statm
-r--r--r-- 1 root root 0 Aug 28 22:05 status

There are three interesting symlinks in this directory. cwd points to the current working directory of this process (you can, for example, cd /proc/3852/cwd to land in the directory from which process ID 3852 was run). The exe link points to the full path of the binary that was called, and root points to this process’s notion of the root directory. The root link will almost always be /, unless the process has executed a chroot.

The cmdline and environ files contain the command line as it was originally called and the process’s complete environment. Both use NUL characters as separators, so to see them in a more human-readable form, try this:

root@catlin:/proc/1# cat environ |tr '\0' '\n'
HOME=/
TERM=linux
BOOT_IMAGE=catlin

(or for a better example, try this on /proc/self/environ, the environment of the currently running process):

root@catlin:/proc/1# cat /proc/self/environ |tr '\0' '\n'
PWD=/proc/1
HOSTNAME=catlin.nocat.net
MOZILLA_HOME=/usr/lib/netscape
ignoreeof=10
LS_OPTIONS= --color=auto -F -b -T 0
MANPATH=/usr/local/man:/usr/man:/usr/X11R6/man
LESSOPEN=|lesspipe.sh %s
PS1=\u@\h:\w\$ 
PS2=> 
...

and so on. This can be tremendously handy for use in shell scripts (or other programs) where you need specific information about running processes. Just use it with care, and remember that unprivileged users can usually only access information about their own processes.
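
For instance, here’s a minimal sketch (an illustration of the idea, not part of the original hack) that walks the numeric entries and prints each PID alongside its full command line, pulling the data straight from the kernel rather than trusting ps:

#!/bin/sh
# Print "PID: command line" for every process, reading directly from /proc.
for pid in /proc/[0-9]*; do
    printf '%s: ' "${pid#/proc/}"
    tr '\0' ' ' < $pid/cmdline    # arguments are NUL-separated
    echo
done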

In closing, here’s a practical example of one use for /proc. By checking the output of ps against the running process table, you can see if the two agree:

# ls -d /proc/* |grep [0-9]|wc -l; ps ax |wc -l

This will give you a quick spot check of the number of running processes versus the number that ps actually reports. Many rootkits install a hacked ps that allows a miscreant to hide processes (by simply not displaying them when ps runs). You may hide from ps, but it’s much more difficult to hide from /proc. If the second number is considerably larger than the first (particularly if you run it several times in a row) then you might want to consider taking your box offline for further inspection. Quickly.

Manipulating Processes Symbolically with procps

Signal and renice processes by name, terminal, or username (instead of PID)

If you often find yourself running a ps awux |grep something just to find the PID of a job you’d like to kill, then you should take a look at some of the more modern process manipulation packages.

Probably the best known process tools package is procps, the same package that includes the Linux version of the ubiquitous top command. The top tool is so tremendously useful and flexible that it deserves its own discussion. You can learn more about the top tool in [Hack #58].

Among the other nifty utilities included in procps:

skill lets you send signals to processes by name, terminal, username, or PID. snice does the same but renices processes instead of sending them signals.

For example, to freeze the user on terminal pts/2:

# skill -STOP pts/2

To release them from the grip of sleeping death, try this:

# skill -CONT pts/2

Or to renice all of luser’s processes to 5:

# snice +5 luser

pkill is similar to skill, but with more formal parameters. Rather than attempting to guess whether you are referring to a username, process name, or terminal, you specify them explicitly with switches. For example, these two commands do the same thing:

# skill -KILL rob bash
# pkill -KILL -u rob bash

pkill may take slightly more typing, but it is guaranteed to be unambiguous (which matters if you happen to have a user and a process with the same name, for example).

pgrep works just like pkill, but instead of sending a signal to each process, it simply prints the matching PID(s) on STDOUT:

$ pgrep httpd
3211
3212
3213
3214
3215
3216
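
Because pgrep emits nothing but PIDs, it drops neatly into command substitution. For example, something like this (assuming a stock Apache setup with a root-owned parent httpd) asks Apache to reread its configuration:

# kill -HUP $(pgrep -u root httpd)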

Finally, vmstat gives you a nice, easily parsed pinpoint measurement of virtual memory and cpu statistics:

$ vmstat 
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 5676 6716 35804 58940 0 0 9 9 7 9 0 0 29

If you’d like to watch how usage changes over time, give it a number on the command line. The number represents the number of seconds to pause before displaying the results of another measurement.
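
For example, this prints a fresh line of statistics every five seconds until you interrupt it:

$ vmstat 5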

Learning how to use the lesser known procps utilities can save you lots of typing, not to mention wincing. Use skill or pkill to avoid accidentally mistyping a PID argument to kill, and suddenly bringing sshd (and the wrath of many angry users) down upon your unfortunate sysadmin head.

Managing System Resources per Process

Prevent user processes from running away with all system resources

Whether intentionally or accidentally, it is entirely possible for a single user to use up all system resources, leading to poor performance or outright system failure. One frequently overlooked way to deal with resource hogs is to use the ulimit functionality of bash.

To prevent a process (or any of its children) from creating enormous files, try specifying a ulimit -f (with the maximum file size specified in kilobytes).

rob@catlin:/tmp$ ulimit -f 100
rob@catlin:/tmp$ yes 'Spam spam spam spam SPAM!' > spam.txt
File size limit exceeded
rob@catlin:/tmp$ ls -l spam.txt
-rw-r--r-- 1 rob users 102400 Sep 4 17:05 spam.txt
rob@catlin:/tmp$

Users can decrease their own limits, but not increase them (as with nice and renice). This means that ulimits set in /etc/profile cannot be increased later by users other than root:

rob@catlin:/tmp$ ulimit -f unlimited
bash: ulimit: cannot modify limit: Operation not permitted

Note that nothing is preventing a user from creating many files, each as big as their ulimit allows. Users with this particular temperament should be escorted to a back room and introduced to your favorite LART. Alternatively, you could look into introducing disk quotas (although quotas are usually less than fun to implement, particularly when a simple stern talking-to would fix the problem).

Likewise, ulimit can limit the maximum number of children that a single user can spawn:

rob@catlin:~$ cat > lots-o-procs
#!/bin/bash
export RUN=$((RUN + 1))
echo $RUN...
$0
^D
rob@catlin:~$ chmod +x lots-o-procs
rob@catlin:~$ ulimit -u 10
rob@catlin:~$ ./lots-o-procs
1...
2...
3...
4...
5...
6...
7...
8...
9...
./lots-o-procs: fork: Resource temporarily unavailable
rob@catlin:~$

This limits the number of processes for a single user across all terminals (including backgrounded processes). It has to be this way, because once a process is forked, it disassociates itself from the controlling terminal. (And how would you count it against a given subshell then?)

One other very useful ulimit option is -v, maximum virtual memory size. Once this ceiling is reached, processes will exit with Segmentation fault (which isn’t ideal, but will keep the system from crashing as it runs out of RAM and swap). If you have a particularly badly behaving process that shows significant bloat (like Apache + mod_perl and poorly written CGI code, for example) you could set a ulimit to act as an “emergency brake” while debugging the real source of the trouble. Again, specify the limit in kilobytes.
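
For example, to cap each process at roughly 256 MB of virtual memory (the figure here is arbitrary; pick something comfortably above your application’s legitimate working size):

rob@catlin:~$ ulimit -v 262144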

To see all available ulimit settings, use -a:

rob@catlin:~$ ulimit -a 
core file size (blocks) 0
data seg size (kbytes) unlimited
file size (blocks) unlimited
max locked memory (kbytes) unlimited
max memory size (kbytes) unlimited
open files 1024
pipe size (512 bytes) 8
stack size (kbytes) 8192
cpu time (seconds) unlimited
max user processes 1536
virtual memory (kbytes) unlimited

You can see that before setting system-wide hard limits, user processes can grow to be quite large. In tcsh, the analogous command you’re after is limit:

rob@catlin:~> limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8192 kbytes
coredumpsize 0 kbytes
memoryuse unlimited
descriptors 1024 
memorylocked unlimited
maxproc 1536 
openfiles 1024

Setting system resource limits may sound draconian but is a much better alternative to the downward spiral of a user process gone amok.

Cleaning Up after Ex-Users

Make sure you close the door all the way when a user takes their leave

It happens. Plans change, companies shift focus, and people move on. At some point, every admin has had to clean up shell access after someone has left the company, happily or otherwise.

I am personally of the opinion that if one doesn’t trust one’s users from the beginning, then they will one day prove themselves untrustworthy. (Of course, I’ve never had to admin an open shell server at an ISP, either.) At any rate, building trust with your users from the beginning will go a long way toward being able to sleep at night later on.

When you do have to lock old accounts up, it’s best to proceed strategically. Don’t assume that just because you ran a passwd -l, the user in question can’t regain access to your machine. Let’s assume that we’re locking up after an account called luser. Here are some obvious (and some not so obvious) things to check on in the course of cleaning up:

passwd -l luser

Obviously, locking the user’s password is a good first step.

chsh -s /bin/true luser

This is another popular step, changing the user’s login shell to something that exits immediately. This generally prevents a user from gaining shell access to the server. But be warned, if sshd is running on this box, and you allow remote RSA or DSA key authentication, then luser can still forward ports to any machine that your server can reach! With a command like this:

luser@evil:~$ ssh -f -N -L8000:private.intranet.server.com:80 old.server.com

luser has just forwarded his local port 8000 to your internal intranet server’s http port. This is allowed since luser isn’t using a password (he is using an RSA key) and isn’t attempting to execute a program on old.server.com (since he specified the -N switch).

Obviously, you should remove ~luser/.ssh/authorized_keys* and prevent luser from using his ssh key in the first place. Likewise, look for either of these files:

~luser/.shosts
~luser/.rhosts

This usually isn’t a problem unless you’re running rsh, or have enabled this functionality in ssh. But you never know if a future admin will decide to enable .shosts or .rhosts functionality, so it’s better to remove them now, if they exist.

Did luser have any sudo privileges? Run visudo and check the sudoers file to be sure.

How about cron jobs or at jobs?

crontab -u luser -e 
atq

For that matter, is luser running any jobs right now?

ps awux |grep -i ^luser

or as in [Hack #17], you can use:

 skill -KILL luser

Could luser execute cgi programs from his home directory (or somewhere else)?

find ~luser/public_html/ -perm +111

What about PHP or other embedded scripting languages?

find ~luser/public_html/ -name '*.php*'

Does luser have any email forwarding set up? Forwarders can frequently be made to execute arbitrary programs.

less ~luser/.forward
grep luser /etc/mail/aliases

Finally, does luser own any files in strange places?

find / -user luser > ~root/luser-files.report

One safe (and quick) way of ensuring that all of luser’s personal configuration files are invalidated is to mv /home/luser /home/luser.removed. This will keep the contents of luser’s home directory intact, without worrying about having missed other possible points of entry. Note that this will break a legitimate .forward file (and also ~luser/public_html and any other publicly accessible data that luser might have kept in his home directory), so if you go this route be sure to take that into account (say, by adding an appropriate entry to the system aliases file, and moving a cleaned version of public_html/ back to /home/luser/public_html/).
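
If you find yourself doing this often, the obvious steps are easy to collect into a script. Here’s a minimal sketch (the paths assume a fairly standard layout, and it is a starting point rather than a substitute for the rest of this checklist):

#!/bin/sh
# lockdown.sh -- first pass at closing out a departed user's account
ACCT=$1

passwd -l $ACCT                           # lock the password
chsh -s /bin/true $ACCT                   # no more interactive shells
crontab -u $ACCT -r 2>/dev/null           # remove any cron jobs (check atq by hand)
skill -KILL $ACCT                         # kill anything currently running
rm -f /home/$ACCT/.ssh/authorized_keys*   # no more key-based ssh access
rm -f /home/$ACCT/.rhosts /home/$ACCT/.shosts
mv /home/$ACCT /home/$ACCT.removed        # invalidate remaining dotfiles
find / -user $ACCT > /root/$ACCT-files.report &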

Look at configuration files for any system software to which luser had access, particularly services that run as privileged users. Even something as innocuous as a user-supplied Apache configuration file could be used to provide shell access later.

This list is by no means exhaustive but is meant to demonstrate that there is a lot more to revoking access than simply locking a user’s password. If this user ever had root access, all bets are off. Access could later be granted by anything from a Trojan system binary to an invisible kernel module to having simply changed the root password.

Get to know your shell users long before the day you have to say goodbye. This job is difficult enough without having to worry about whom you can trust.

Eliminating Unnecessary Drivers from the Kernel

Keep your kernel optimized for quick booting and long-term stability

Linux will run on an enormous variety of computer hardware. There is support for all manner of hardware, from critical components (such as hard disk drives and RAM) to more exotic devices (such as USB scanners and video capture boards). The kernel that ships with most Linux distributions aims to be complete (and safe) at the expense of possibly being less efficient than it could be, by including support for as many devices as possible.

As your machine boots, take a look at the messages the kernel produces. You may find it probing for all sorts of hardware (particularly SCSI controllers and Ethernet cards) that you don’t actually have installed. If your distribution hides the kernel boot messages, try the dmesg command (probably piped through less) to see what kernel messages were generated at boot.

To make your kernel boot quickly and at the same time eliminate the possibility that an unnecessary device driver might be causing problems with your installed hardware, you should trim down the drivers that the kernel attempts to load to fit your hardware.

There are two schools of thought on managing kernel drivers. Some people prefer to build a kernel with all of the functionality they need built into it, without using loadable kernel modules. Others prefer to build a more lightweight kernel and load the drivers they need when the system boots. Of course, both methods have their advantages and disadvantages.

For example, the monolithic kernel (without loadable modules) is guaranteed to boot, even if something happens to the drivers under /lib/modules. Some admins even prefer to build a kernel with no loadable module support at all, to discourage the possibility of Trojan horse device drivers being loaded by a random miscreant down the road. On the other hand, if a new piece of hardware is added to the system, then you will need to rebuild your kernel to accommodate it.

If you use loadable modules, then you have enormous flexibility in how you load device drivers. You can alter the order that drivers load and even pass parameters to various modules as you load the driver, all without rebooting. The downside is that good copies of the modules must exist under /lib/modules, or else the system can’t load its drivers. When you build a new kernel, you’ll need to remember to run a make modules_install, and to keep your kernel module utilities (like modprobe and lsmod) up to date. Building a kernel with loadable modules can also help if your monolithic kernel is too large to boot (by loading extra device drivers after the kernel boots and the filesystems are mounted).

Regardless of the method that you choose for your system, you’ll need to build a kernel with the minimal functionality that is required to boot. This includes drivers for the IDE, SCSI, or other bus (maybe floppy disk or even network card?) that your machine boots from. You will also need support for the filesystem that your root partition is installed on (likely ext2, ext3, or reiserfs). Make sure that you build a kernel with support for the amount of RAM that your machine has installed (see [Hack #21]). Select a processor that matches your hardware to be sure that all appropriate optimizations are turned on. Be sure to build an SMP kernel if you have more than one processor.

To build a customized kernel, unpack the kernel sources somewhere that has enough room (say, in /usr/local/src/linux). Run a make menuconfig (or, if you’re running X, make xconfig). Select your drivers carefully (hitting Y for built-in drivers, and M for loadable modules.) Remember that you’re building a kernel for a server and don’t include extraneous drivers. Does your server really need sound support, even if it is built into your motherboard? What about USB? Unless you have a specific use for a particular piece of hardware that is directly related to its job as a server, don’t bother installing a driver for it. You should also consider disabling unused hardware in the BIOS, wherever possible, to conserve system resources and reduce the possibility of hardware resource conflicts.

After you’re finished selecting which bits of the kernel to build, it will save your configuration to a file called .config at the top of the kernel source tree. Save this file for later, because it will make upgrading your kernel much easier down the road (by copying it to the top of the new kernel tree, and running make oldconfig). Now build the new kernel (and modules, if applicable), install it, and give it a try. Be sure to try out all of your devices after installing a new kernel, and watch your boot messages carefully. Track down any unexpected warnings or errors now, before they cause trouble at some future date.
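
For a 2.4-series kernel, the build and install steps typically look something like this (the version string is just an example, and the bootloader details vary; with lilo, remember to rerun /sbin/lilo after copying the new kernel into place):

# cd /usr/local/src/linux
# make menuconfig
# make dep && make bzImage && make modules
# make modules_install
# cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.18-custom
# cp System.map /boot/System.map-2.4.18-custom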

Finally, if you’re doing kernel work remotely, don’t forget to build in support for your network devices! There’s nothing quite like the pain of realizing that the network drivers aren’t built just after you issue a shutdown -r now. Hopefully that console is accessible by someone on call (who will discreetly boot your old kernel for you, without telling too many of your friends).

Building a kernel well-tailored to your hardware can be challenging, but it is necessary to make your server run as efficiently as it can. Don’t be discouraged if it takes a couple of tries to build the perfect Linux kernel; the effort is well worth it.

See also:

  • Running Linux, Fourth Edition (O’Reilly)

  • [Hack #21]

Using Large Amounts of RAM

Be sure that Linux is using all of your available system RAM

Linux is capable of addressing up to 64 GB of physical RAM on x86 systems. But if you want to accommodate more than 960 MB RAM, you’ll have to let the system know about it.

First of all, your Linux kernel must be configured to support the additional RAM. Typically, the default kernel configuration will address up to 960 MB RAM. If you install more than that in a machine, the excess will simply be ignored. (The common complaint is that you’ve just installed 1 GB, and yet free reports only 960 MB, even though the BIOS counts all 1024 MB at POST time.)

The way that the kernel addresses its available system memory is dictated by the High Memory Support setting (a.k.a. the CONFIG_NOHIGHMEM define.) Depending on the amount of RAM you intend to use, set it accordingly:

up to 960MB: off
up to 4GB: 4GB
more than 4GB: 64GB

Be warned that selecting 64 GB requires a processor capable of using Intel Physical Address Extension (PAE) mode. According to the kernel notes, all Intel processors since the Pentium Pro support PAE, but this setting won’t work on older processors (and the kernel will refuse to boot, which is one reason that it isn’t on by default). Make your selection and rebuild your kernel, as in [Hack #20].

Once the kernel is built and installed, you may have to tell your boot loader how much RAM is installed, so it can inform the kernel at boot time (as not every BIOS is accurate in reporting the total system RAM at boot.) To do this, add the mem= kernel parameter in your bootloader configuration. For example, suppose we have a machine with 2GB RAM installed.

If you’re using Lilo, add this line to /etc/lilo.conf:

append="mem=2048M"

If you’re using Grub, try this in your /etc/grub.conf:

kernel /boot/vmlinuz-2.4.19 mem=2048M

If you’re running loadlin, just pass it on the loadlin line:

c:\loadlin c:\kernel\vmlinuz root=/dev/hda3 ro mem=2048M

Although, if you’re running loadlin, why are you reading a book on Linux Server Hacks? ;)

hdparm: Fine Tune IDE Drive Parameters

Get the best possible performance from your IDE hardware

If you’re running a Linux system with at least one (E)IDE hard drive, and you’ve never heard of hdparm, read on.

By default, most Linux distributions use the default kernel parameters when accessing your IDE controller and drives. These settings are very conservative and are designed to protect your data at all costs. But as many have come to discover, safe almost never equals fast. And with large volume data processing applications, there is no such thing as “fast enough.”

If you want to get the most performance out of your IDE hardware, take a look at the hdparm(8) command. It will not only tell you how your drives are currently performing, but will let you tweak them to your heart’s content.

It is worth pointing out that under some circumstances, these commands CAN CAUSE UNEXPECTED DATA CORRUPTION! Use them at your own risk! At the very least, back up your box and bring it down to single-user mode before proceeding.

Let’s begin. Now that we’re in single user mode (which we discussed in [Hack #2]), let’s find out how well the primary drive is currently performing:

hdparm -Tt /dev/hda

You should see something like:

/dev/hda:
Timing buffer-cache reads: 128 MB in 1.34 seconds =95.52 MB/sec
Timing buffered disk reads: 64 MB in 17.86 seconds = 3.58 MB/sec

What does this tell us? The -T means to test the cache system (i.e., the memory, CPU, and buffer cache). The -t means to report stats on the disk in question, reading data not in the cache. The two together, run a couple of times in a row in single-user mode, will give you an idea of the performance of your disk I/O system. (These are actual numbers from a PII/350/128M Ram/EIDE HD; your numbers will vary.)

But even with varying numbers, 3.58 MB/sec is pathetic for the above hardware. I thought the ad for the HD said something about 66 MB per second!!?!? What gives?

Let’s find out more about how Linux is addressing this drive:

# hdparm /dev/hda

/dev/hda:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 1870/255/63, sectors = 30043440, start = 0

These are the defaults. Nice, safe, but not necessarily optimal. What’s all this about 16-bit mode? I thought that went out with the 386!

These settings are virtually guaranteed to work on any hardware you might throw at it. But since we know we’re throwing something more than a dusty, 8-year-old, 16-bit multi-IO card at it, let’s talk about the interesting options:

multcount

Short for multiple sector count. This controls how many sectors are fetched from the disk in a single I/O interrupt. Almost all modern IDE drives support this. The manpage claims:

When this feature is enabled, it typically reduces operating system overhead for disk I/O by 30-50%. On many systems, it also provides increased data throughput of anywhere from 5% to 50%.

I/O support

This is a big one. This flag controls how data is passed from the PCI bus to the controller. Almost all modern controller chipsets support mode 3, or 32-bit mode w/sync. Some even support 32-bit async. Turning this on will almost certainly double your throughput (see below).

unmaskirq

Turning this on will allow Linux to unmask other interrupts while processing a disk interrupt. What does that mean? It lets Linux attend to other interrupt-related tasks (i.e., network traffic) while waiting for your disk to return with the data it asked for. It should improve overall system response time, but be warned: not all hardware configurations will be able to handle it. See the manpage.

using_dma

DMA can be a tricky business. If you can get your controller and drive using a DMA mode, do it. However, I have seen more than one machine hang while playing with this option. Again, see the manpage.

Let’s try out some turbo settings:

# hdparm -c3 -m16 /dev/hda

/dev/hda:
setting 32-bit I/O support flag to 3
setting multcount to 16
multcount = 16 (on)
I/O support = 3 (32-bit w/sync)

Great! 32-bit sounds nice. And some multi-reads might work. Let’s re-run the benchmark:

# hdparm -tT /dev/hda

/dev/hda:
Timing buffer-cache reads: 128 MB in 1.41 seconds =90.78 MB/sec
Timing buffered disk reads: 64 MB in 9.84 seconds = 6.50 MB/sec

Hmm, almost double the disk throughput without really trying! Incredible.

But wait, there’s more: we’re still not unmasking interrupts, using DMA, or even using a decent PIO mode! Of course, enabling these gets riskier. The manpage mentions trying Multiword DMA mode2, so let’s try this:

# hdparm -X34 -d1 -u1 /dev/hda

Unfortunately, this seems to be unsupported on this particular box (it hung like an NT box running a Java application). So, after rebooting it (again in single-user mode), I went with this:

# hdparm -X66 -d1 -u1 -m16 -c3 /dev/hda

/dev/hda:
setting 32-bit I/O support flag to 3
setting multcount to 16
setting unmaskirq to 1 (on)
setting using_dma to 1 (on)
setting xfermode to 66 (UltraDMA mode2)
multcount = 16 (on)
I/O support = 3 (32-bit w/sync)
unmaskirq = 1 (on)
using_dma = 1 (on)

And then checked:

# hdparm -tT /dev/hda

/dev/hda:
Timing buffer-cache reads: 128 MB in 1.43 seconds =89.51 MB/sec
Timing buffered disk reads: 64 MB in 3.18 seconds =20.13 MB/sec

20.13 MB/sec. A far cry from the minuscule 3.58 MB/sec with which we started.

Did you notice how we specified the -m16 and -c3 switches again? That’s because the drive doesn’t remember your hdparm settings between reboots. Be sure to add the above line to your /etc/rc.d/* scripts once you’re sure the system is stable (and preferably after your fsck runs; running an extensive filesystem check with your controller in a flaky mode may be a good way to generate vast quantities of entropy, but it’s no way to administer a system. At least not with a straight face).
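
Something along these lines in your startup scripts will reapply the settings at each boot (the exact file varies by distribution; rc.local is a common choice):

# in /etc/rc.d/rc.local (or your distribution's equivalent):
/sbin/hdparm -X66 -d1 -u1 -m16 -c3 /dev/hda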

If you can’t find hdparm on your system (usually in /sbin or /usr/sbin), get it from the source at http://metalab.unc.edu/pub/Linux/system/hardware/.
