Unix Power Tools, Third Edition

By Shelley Powers, Jerry Peek, Tim O'Reilly, Mike Loukides
Table of Contents

Chapter 1: Introduction
What's Special About Unix?
If we were writing about any other operating system, "power tools" might mean "nifty add-on utilities to extend the power of your operating system." That sounds suspiciously like a definition of Unix: an operating system loaded with decades' worth of nifty add-on utilities.
Unix is unique in that it wasn't designed as a commercial operating system meant to run application programs, but as a hacker's toolset, by and for programmers. In fact, an early release of the operating system went by the name PWB (Programmer's Work Bench).
When Ken Thompson and Dennis Ritchie first wrote Unix at AT&T Bell Labs, it was for their own use and for their friends and coworkers. Utility programs were added by various people as they had problems to solve. Because Bell Labs wasn't in the computer business, source code was given out to universities for a nominal fee. Brilliant researchers wrote their own software and added it to Unix in a spree of creative anarchy that has been equaled only by Linux, by the introduction of the X Window System (Section 1.22), and especially by the blend of Mac and Unix in Darwin, which is included in Mac OS X.
Unlike most other operating systems, where free software remains an unsupported add-on, Unix has taken as its own the work of thousands of independent programmers. During the commercialization of Unix within the past several years, this incorporation of outside software has slowed down for larger Unix installations, such as Sun's Solaris and HP's HP-UX, but it has not stopped entirely. It remains especially active in the newer, lighter versions of Unix, such as the various flavors of Linux and Darwin.
Therefore, a book on Unix inevitably has to focus not just on add-on utilities (though we do include many of those), but on how to use clever features of the many utilities that have been made part of Unix over the years.
Unix is also important to power users because it's one of the last popular operating systems that doesn't force you to work behind an interface of menus, windows, and mouse with a "one-size(-doesn't)-fit-all" programming interface. Yes, you can use Unix interfaces with windows and menus — and they can be great time savers in a lot of cases. But Unix also gives you building blocks that, with some training and practice, will give you many more choices than any software designer can cram onto a set of menus. If you learn to use Unix and its utilities from the command line, you don't have to be a programmer to do very powerful things with a few keystrokes.
Additional content appearing in this section has been removed.
Power Grows on You
It has been said that Unix is not an operating system as much as it is a way of thinking. In The UNIX Programming Environment, Kernighan and Pike write that at the heart of the Unix philosophy "is the idea that the power of a system comes more from the relationships among programs than from the programs themselves."
Most of the nongraphical utility programs that have run under Unix since the beginning, some 30 years ago, share the same user interface. It's a minimal interface, to be sure — but one that allows programs to be strung together in pipelines to do jobs that no single program could do alone.
Most operating systems — including modern Unix and Linux systems — have graphical interfaces that are powerful and a pleasure to use. But none of them are so powerful or exciting to use as classic Unix pipes and filters, and the programming power of the shell.
A new user starts by stringing together simple pipelines and, when they get long enough, saving them for later execution in a file (Section 1.8), alias (Section 29.2), or function (Section 29.11). Gradually, if the user has the right temperament, he gets the idea that the computer can do more of the boring part of many jobs. Perhaps he starts out with a for loop (Section 28.9) to apply the same editing script to a series of files. Conditions and cases soon follow and before long, he finds himself programming.
On most systems, you need to learn consciously how to program. You must take up the study of one or more programming languages and expend a fair amount of concentrated effort before you can do anything productive. Unix, on the other hand, teaches programming imperceptibly — it is a slow but steady extension of the work you do simply by interacting with the computer.
Before long, you can step outside the bounds of the tools that have already been provided by the designers of the system and solve problems that don't quite fit the mold. This is sometimes called hacking; in other contexts, it is called "engineering." In essence, it is the ability to build a tool when the right one is not already on hand.
The Core of Unix
In recent times, more attention has been paid to the newer and more lightweight varieties of Unix: FreeBSD, Linux, and now Darwin, the version of BSD Unix that Apple used as the platform for the new Mac OS X. If you've worked with the larger Unix versions, you might be curious to see how Unix differs within these new environments.
For the most part, basic Unix functionality differs very little between implementations. For instance, I've not worked with a Unix box that doesn't have vi (Section 21.7) installed. Nor have I found any Unix system that lacks basic functionality, such as traversing directories with cd (Section 1.16) or getting additional help with man (Section 2.1).
However, what can differ between flavors of Unix is the behavior of some of the utilities and built-in commands, as well as the options. Even within a specific Unix flavor, such as FreeBSD, installations can differ because one installation uses the built-in version of a utility such as make (Section 40.3) and another installation has a GNU version of the same application.
An attempt was made to create some form of standardization with the POSIX effort. POSIX, which stands for Portable Operating System Interface, is an IEEE standard intended to promote application interoperability. With it, C programs written on one flavor of Unix should work, with minimal modification, on another flavor of Unix.
Unfortunately, though the POSIX effort has had some impact on interoperability, there still are significant differences between Unix versions. In particular, something such as System V Unix can differ considerably from something such as Darwin.
However, there is stability in this seeming chaos: for the most part, the basic Unix utilities and commands behave the same in all Unix flavors, and aside from some differences in options, a command works the same way in one environment as in another. And if there are differences, using the facilities described in Chapter 2 should help you resolve these quickly.
Communication with Unix
Probably the single most important concept for would-be power users to grasp is that you don't "talk" directly to the Unix operating system. Instead, you talk to a program — and that program either talks to Unix itself or it talks to another program that talks to Unix. (When we say "talk" here, we mean communication using a keyboard and a mouse.)
There are three general kinds of programs you'll probably "talk" to:
  • The program called the shell (Section 27.1). A shell is a command interpreter. Its main job is to interpret the commands you type and to run the programs you specify in your command lines. By default, the shell reads commands from your tty and arranges for other programs to write their results there. The shell protects Unix from the user (and the user from Unix). It's the main focus of this book (and the rest of this article).
  • An interactive command, running "inside" a tty, that reads what you type directly. These take input directly from the user, without intervention from the shell. The shell's only job is to start them up. A text editor, a mail program, or almost any application program (such as word processing) includes its own command interpreter with its own rules. This book covers a few interactive commands — such as the vi editor — but its main focus is the shell and "noninteractive" utilities that the shell coordinates to do what needs doing.
  • A Graphical User Interface (GUI) with a desktop, windows, and so on. On Unix, a GUI is implemented with a set of running programs (all of which talk to Unix for you).
    Unix was around long before GUIs were common, and there's no need to use a GUI to use Unix. In fact, Unix started in the days of teletypes, those clattering printing devices used to send telegrams. Unix terminals are still referred to as teletypes or ttys (Section 2.7).
The core of the Unix operating system is referred to as
Programs Are Designed to Work Together
As pointed out by Kernighan and Pike in The UNIX Programming Environment, there are a number of principles that distinguish the Unix environment. One key concept is that programs are tools. Like all good tools, they should be specific in function, but usable for many different purposes.
In order for programs to become general-purpose tools, they must be data independent. This means three things:
  1. Within limits, the output of any program should be usable as the input to another.
  2. All of the information needed by a program should be either contained in the data stream passed to it or specified on the command line. A program should not prompt for input or do unnecessary formatting of output. In most cases, this means that Unix programs work with plain text files that don't contain "nonprintable" or "control" characters.
  3. If no arguments are given, a program should read the standard input (usually the terminal keyboard) and write the standard output (usually the terminal screen).
Programs that can be used in this way are often called filters.
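To make the three guidelines concrete, here is a minimal sketch of a filter written as a shell function. The name upcase and the sample input are made up for illustration; they do not come from the book.

```shell
# upcase: a hypothetical filter that converts its input to uppercase.
# With file arguments it reads those files; with none, cat falls back
# to standard input, so the function can sit anywhere in a pipeline.
upcase() {
    cat -- "$@" | tr '[:lower:]' '[:upper:]'
}

# No arguments: read standard input, write standard output.
printf 'hello, world\n' | upcase
```

The function never prompts and adds no formatting of its own, so its output can be fed straight into another program.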
One of the most important consequences of these guidelines is that programs can be strung together in "pipelines" in which the output of one program is used as the input of another. A vertical bar (|) represents a pipe and means "take the output of the program on the left and feed it into the program on the right."
For example, you can pipe the output of a search program to another program that sorts the output, and then pipe the result to the printer program or redirect it to a file (Section 43.1).
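A small sketch of such a pipeline follows. The file names fruit.txt and matches.txt and their contents are invented for the example, and mktemp -d is assumed to be available so that the commands run in a scratch directory.

```shell
# Work in a scratch directory so the example touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical input file.
printf 'pear\napple\nplum\nfig\n' > fruit.txt

# Search for lines containing "p", sort them, and redirect the
# result to a file instead of the terminal.
grep p fruit.txt | sort > matches.txt

cat matches.txt
```

Here grep and sort know nothing about each other; the shell connects them. Replacing the redirection with another pipe to a printer program would send the same sorted lines to the printer.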
Not all Unix programs work together in this way. An interactive program like the Emacs editor (Section 19.1) generally doesn't read from or write to pipes you'd create on the command line. Instead, once the shell has started Emacs, the editor works independently of the shell (Section 1.4), reading its input from the terminal and writing its output to the terminal directly. And there are even exceptions to this exception. A program like less
There Are Many Shells
With most operating systems, the command interpreter is built in; it is an integral part of the operating system. With Unix, your command interpreter is just another program. Traditionally, a command interpreter is called a "shell," perhaps because it protects you from the underlying kernel, or because it protects the kernel from you!
In the early 1980s, the most common shells were the Bourne shell (sh) and the C shell (csh). The Bourne shell (Section 3.3) (named after its creator, Steve Bourne) came first. It was excellent for shell programming (Section 1.8). But many Unix users (who were also writing programs in the C language) wanted a more familiar programming syntax — as well as more features for interactive use. So the C shell came from Berkeley as part of their Unix implementation. Soon (on systems that gave you the choice, at least) csh was much more popular for interactive use than sh. The C shell had a lot of nice features that weren't available in the original Bourne shell, including job control (Section 23.1) and history (Section 30.2). However, it wasn't hard for a shell programmer or an advanced user to push the C shell to its limits.
The Korn shell (also named after its creator, David Korn) arrived in the mid-1980s. The ksh is compatible with the Bourne shell, but has most of the C shell's features plus features like history editing (Section 30.14), often called command-line editing. The Korn shell was available only with a proprietary version of Unix, System V — but now a public-domain version named pdksh is widely available.
These days, most original C shell users have probably switched to tcsh (pronounced "T-shell"). It has all the features of csh and more — as well as fewer mis-features and outright bugs.
The "Bourne-again" shell, bash, is from the Free Software Foundation. It's fairly similar to the Korn shell. It has most of the C shell's features, plus command-line editing and a built-in help command. The programming syntax, though, is much more like the original Bourne shell — and many systems (including Linux) use
Which Shell Am I Running?
You can usually tell which family your shell belongs to by a character in the prompt it displays. Bourne-type shells, such as bash, usually have $ in the prompt. The C shell uses % (but tcsh users often use >).
If your shell has superuser (Section 1.18) privileges, though, the prompt typically ends with a hash, #.
To check the shell that runs automatically when you log in to Unix, type one of these commands (the second is for systems that use NIS, Sun's Network Information Service, to manage network-wide files):
% grep yourloginname /etc/passwd
% ypmatch yourloginname passwd
You should get back the contents of your entry in the system password file. For example:
shelleyp:*:1006:1006:Shelley Powers:/usr/home/shelleyp:/usr/local/bin/bash
The fields are separated by colons, and the default shell is usually specified in the last field.
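If you only want the shell field, one possible approach is to let awk split the entry on colons and print the last field. The sketch below works on the sample entry shown above rather than on a real password file; the variable names are arbitrary.

```shell
# The sample passwd entry from the text, held in a variable.
entry='shelleyp:*:1006:1006:Shelley Powers:/usr/home/shelleyp:/usr/local/bin/bash'

# -F: makes the colon the field separator; $NF is the last field.
login_shell=$(printf '%s\n' "$entry" | awk -F: '{ print $NF }')

echo "$login_shell"    # prints /usr/local/bin/bash
```

On a real system you could pipe the output of the grep or ypmatch command into the same awk command.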
Note that in Mac OS X, passwords are managed and stored in NetInfo by default. To store the passwords in /etc/passwd, you'll need to configure this using NetInfo.
—TOR and SP
Anyone Can Program the Shell
One of the really wonderful things about the shell is that it doesn't just read and execute the commands you type at a prompt. The shell is a complete programming language.
The ease of shell programming is one of the real highlights of Unix for novices. A shell program need be no more than a single complex command line saved in a file — or a series of commands.
For example, let's say that you occasionally need to convert a Macintosh Microsoft Word file for use on your Unix system. Word lets you save the file in ASCII format. But there's a catch: the Mac uses a carriage return (ASCII character 015) to mark the end of each line, while Unix uses a linefeed (ASCII 012). As a result, with Unix, the file looks like one long paragraph, with no end in sight.
That's easy to fix: the Unix tr (Section 21.11) command can convert every occurrence of one character in a file to another:
bash-2.04$ tr '\015' '\012' < file.mac > file.unix
But you're a novice, and you don't want to remember this particular piece of magic. Fine. Save the first part of this command line in a file called mac2unix in your personal bin directory (Section 7.4):
tr '\015' '\012'
Make the file executable with chmod (Section 50.5):
bash-2.04$ chmod +x mac2unix
            
Now you can say:
bash-2.04$ mac2unix < file.mac > file.unix
But why settle for that? What if you want to convert a bunch of files at once? Easy. The shell includes a general way of referring to arguments passed to a script and a number of looping constructs. The script below uses a for loop (Section 35.21) and the shell variable $x (Section 35.9):
for x
do
    echo "Converting $x"
    tr '\015' '\012' < "$x" > "tmp.$x"
    mv "tmp.$x" "$x"
done
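As an illustration only, the following sketch saves that loop in a file and runs it on two files. The test files a.mac and b.mac and their contents are hypothetical, and mktemp -d is assumed to be available for the scratch directory.

```shell
# Work in a scratch directory so the example touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Save the loop from the text as a script named mac2unix.
cat > mac2unix <<'EOF'
for x
do
    echo "Converting $x"
    tr '\015' '\012' < "$x" > "tmp.$x"
    mv "tmp.$x" "$x"
done
EOF
chmod +x mac2unix

# Two hypothetical Mac-style files: lines end in carriage returns.
printf 'one\rtwo\r' > a.mac
printf 'three\r' > b.mac

# Each filename on the command line becomes one pass through the loop.
sh ./mac2unix a.mac b.mac

# After conversion, wc can count the lines because they end in linefeeds.
wc -l a.mac b.mac
```

A for loop written without an "in" list steps through the script's own arguments, which is why the script needs no explicit list of filenames.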
Internal and External Commands
Some commands that you type are internal, which means they are built into the shell, and it's the shell that performs the action. For example, the cd command is built-in. The ls command, on the other hand, is an external program stored in the file /bin/ls.
The shell doesn't start a separate process to run internal commands. External commands require the shell to fork and exec (Section 27.2) a new subprocess (Section 24.3); this takes some time, especially on a busy system.
When you type the name of a command, the shell first checks to see if it is a built-in command and, if so, executes it. If the command name is an absolute pathname (Section 1.16) beginning with /, like /bin/ls, there is no problem: the command is likewise executed. If the command is neither built-in nor specified with an absolute pathname, most shells (except the original Bourne shell) will check for aliases (Section 29.2) or shell functions (Section 29.11), which may have been defined by the user, often in a shell setup file (Section 3.3) that was read when the shell started. Most shells also "remember" the location of external commands (Section 27.6); this saves a long hunt down the search path. Finally, all shells look in the search path for an executable program or script with the given name.
The search path is exactly what its name implies: a list of directories that the shell should look through for a command whose name matches what is typed.
The search path isn't built into the shell; it's something you specify in your shell setup files.
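As a rough illustration, the commands below print the search path one directory per line and ask the shell how it would resolve two command names. The exact wording of the type output varies from shell to shell, and ls may be reported as an alias on systems that define one, so no output is shown here.

```shell
# Show each directory in the search path on its own line.
printf '%s\n' "$PATH" | tr ':' '\n'

# Ask the shell how it would run each name: cd is built in,
# while ls is normally found by searching the directories above.
type cd
type ls
```

The order of the directories matters: the shell stops at the first one that contains a matching executable.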
By tradition, Unix system programs are kept in directories called /bin and /usr/bin, with additional programs usually used only by system administrators in either /etc and /usr/etc or /sbin and /usr/sbin. Many versions of Unix also have programs stored in /usr/ucb (named after the University of California at Berkeley, where many Unix programs were written). There may be other directories containing programs. For example, the programs that make up the
The Kernel and Daemons
If you have arrived at Unix via Windows 2000 or some other personal computer operating system, you will notice some big differences. Unix was, is, and always will be a multiuser operating system. It is a multiuser operating system even when you're the only person using it; it is a multiuser operating system even when it is running on a PC with a single keyboard; and this fact has important ramifications for everything that you do.
Why does this make a difference? Well, for one thing, you're never the only one using the system, even when you think you are. Don't bother to look under your desk to see if there's an extra terminal hidden down there. There isn't. But Unix is always doing things "behind your back," running programs of its own, whether you are aware of it or not. The most important of these programs, the kernel, is the heart of the Unix operating system itself. The kernel assigns memory to each of the programs that are running, partitions time fairly so that each program can get its job done, handles all I/O (input/output) operations, and so on. Another important group of programs, called daemons, are the system's "helpers." They run continuously — or from time to time — performing small but important tasks like handling mail, running network communications, feeding data to your printer, keeping track of the time, and so on.
Not only are you sharing the computer with the kernel and some mysterious daemons, you're also sharing it with yourself. You can issue the ps x (Section 24.5) command to get a list of all of your own processes, including those that aren't attached to a terminal. For example:
  PID TTY    STAT  TIME COMMAND
18034 tty2   S     0:00 -zsh
18059 ?      S     0:01 ssh-agent
18088 tty2   S     0:00 sh /usr/X11R6/bin/startx
18096 tty2   S     0:00 xinit /etc/X11/xinit/xinitrc -- :0 -auth /home/jpeek/
18101 tty2   S     0:00 /usr/bin/gnome-session
18123 tty2   S     0:33 enlightenment -clientId default2
18127 tty2   S     0:01 magicdev --sm-client-id=default12
18141 tty2   S     0:03 panel --sm-client-id default8
18145 tty2   S     0:01 gmc --sm-client-id default10
18166 ?      S     1:20 gnomepager_applet --activate-goad-server gnomepager_a
18172 tty2   S     0:01 gnome-terminal
18174 tty2   S     0:00 gnome-pty-helper
18175 pts/0  S     0:00 zsh
18202 tty2   S     0:49 gnome-terminal
18203 tty2   S     0:00 gnome-pty-helper
18204 pts/1  S     0:01 zsh
18427 pts/1  T     0:00 man zshjp
18428 pts/1  T     0:00 sh -c /bin/gunzip -c /home/jpeek/.man/cat1/zshjp.1.gz
18430 pts/1  T     0:03 /usr/bin/less -is
18914 pts/1  T     0:02 vi upt3_changes.html
 1263 pts/1  T     0:00 vi urls.html
 1511 pts/1  T     0:00 less coding
 3363 pts/1  S     0:00 vi 1007.sgm
 4844 tty2   S     0:24 /usr/lib/netscape/netscape-communicator -irix-session
 4860 tty2   S     0:00 (dns helper)
 5055 pts/1  R     0:00 ps x
Filenames
As in all operating systems, files in Unix have names. (Unix directories, devices, and so on also have filenames and are treated like files (Section 1.19).) The names are words (sequences of characters) that let you identify a file. Older versions of Unix had some restrictions on the length of a filename (14 characters), but modern versions have removed these restrictions for all practical purposes. Sooner or later you will run into a limit, but if so, you are probably being unnecessarily verbose.
Technically, a filename can be made from almost any group of characters (including nonprinting characters and numbers) except a slash (/). However, you should avoid filenames containing most punctuation marks and all nonprinting characters. To be safe, limit your filenames to the following characters:
Upper- and lowercase characters
Unix filenames are always case sensitive. That is, upper- and lowercase letters are always different (unlike Microsoft Windows and others that consider upper- and lowercase letters the same). Therefore, myfile and Myfile are different files. It is usually a bad idea to have files whose names differ only in their capitalization, but that's your decision.
Underscores (_)
Underscores are handy for separating "words" in a filename to make them more readable. For example, my_long_filename is easier to read than mylongfilename.
Periods (.)
Periods are used by some programs (such as the C compiler) to separate filenames from filename extensions (Section 1.12). Extensions are used by these programs to recognize the type of file to be processed, but they are not treated specially by the shell, the kernel, or other Unix programs.
Filenames that begin with a period are treated specially by the shell: wildcards won't match (Section 1.13) them unless you include the period (like .*). The ls command, which lists your files, ignores files whose names begin with a period unless you give it a special option (ls -a (Section 8.9)). Special configuration files are often "hidden" in directories by beginning their names with a period.
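The short sketch below shows this behavior in a scratch directory. The filenames notes and .hidden are invented for the example, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so the example touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One ordinary file and one whose name begins with a period.
touch notes .hidden

echo *       # prints: notes    (the wildcard skips .hidden)
echo .h*     # prints: .hidden  (the pattern includes the period)

ls           # lists only notes
ls -a        # also lists .hidden, along with the entries . and ..
```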
Filename Extensions
In Microsoft Windows and some other operating systems, filenames often have the form name.extension. For example, plain text files have extensions such as .txt. The operating system treats the extension as separate from the filename and has rules about how long it must be, and so forth.
Unix doesn't have any special rules about extensions. The dot has no special meaning as a separator, and extensions can be any length. However, a number of programs (especially compilers) make use of extensions to recognize the different types of files they work with. In addition, there are a number of conventions that users have adopted to make clear the contents of their files. For example, you might name a text file containing some design notes notes.txt.
Table 1-1 lists some of the filename extensions you might see and a brief description of the programs that recognize them.
Table 1-1: Filename extensions that programs expect
Extension   Description
.a          Archive file (library)
.c          C program source file
.f          FORTRAN program source file
.F          FORTRAN program source file to preprocess
.gz         gzipped file (Section 15.6)
Wildcards
The shells provide a number of wildcards that you can use to abbreviate filenames or refer to groups of files. For example, let's say you want to delete all filenames ending in .txt in the current directory (Section 1.16). You could delete these files one by one, but that would be boring if there were only 5 and very boring if there were 100. Instead, you can use a wildcarded name to say, "I want all files whose names end with .txt, regardless of what the first part is." The wildcard is the "regardless" part. Like a wildcard in a poker game, a wildcard in a filename can have any value.
The wildcard you see most often is * (an asterisk), but we'll start with something simpler: ? (a question mark). When it appears in a filename, the ? matches any single character. For example, letter? refers to any filename that begins with letter and has exactly one character after that. This would include letterA, letter1, as well as filenames with a nonprinting character as their last letter, such as letter^C.
The * wildcard matches any character or group of zero or more characters. For example, *.txt matches all files whose names end with .txt; c* matches all files whose names start with c; c*b* matches names starting with c and containing at least one b; and so on.
The * and ? wildcards are sufficient for 90 percent of the situations that you will find. However, there are some situations that they can't handle. For example, you may want to list files whose names end with .txt, mail, or let. There's no way to do this with a single *; it won't let you exclude the files you don't want. In this situation, use a separate * with each filename ending:
*.txt *mail *let
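To see these patterns at work, the sketch below creates a few empty files in a scratch directory and lets echo show what each pattern expands to. All of the filenames are made up for the example, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so the example touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A handful of hypothetical files.
touch letter1 letterA letters2 a.txt b.txt junkmail outlet

# ? matches exactly one character, so letters2 is left out.
echo letter?             # prints: letter1 letterA

# * matches zero or more characters.
echo *.txt               # prints: a.txt b.txt

# One pattern per filename ending; each is expanded separately.
echo *.txt *mail *let    # prints: a.txt b.txt junkmail outlet
```

Using echo is a harmless way to check what a pattern matches before handing the same pattern to a command such as rm.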
Sometimes you need to match a particular group of characters. For example, you may want to list all filenames that begin with digits or all filenames that begin with uppercase letters. Let's assume that you want to work with the files program.n, where n is a single-digit number. Use the filename:
program.[0123456789]
In other words, the wildcard [ character-list
The Tree Structure of the Filesystem
A multiuser system needs a way to let different users have different files with the same name. It also needs a way to keep files in logical groups. With thousands of system files and hundreds of files per user, it would be disastrous to have all of the files in one big heap. Even single-user operating systems have found it necessary to go beyond "flat" filesystem structures.
Almost every operating system solved this problem by implementing a tree-structured, or hierarchical, filesystem. Unix is no exception. A hierarchical filesystem is not much different from a set of filing cabinets at the office. Your set of cabinets consists of many individual cabinets. Each individual cabinet has several drawers; each drawer may have several partitions in it; each partition may have several hanging (Pendaflex) folders; and each hanging folder may have several files. You can specify an individual file by naming the filing cabinet, the drawer, the partition, the group of folders, and the individual folder. For example, you might say to someone: "Get me the 'meeting of July 9' file from the Kaiser folder in the Medical Insurance Plans partition in the Benefits drawer of the Personnel file cabinet." This is backwards from the way you'd specify a filename, because it starts with the most specific part, but the idea is essentially the same.
You could give a complete path like this to any file in any of your cabinets, as shown in Figure 1-2. The concept of a "path" lets you distinguish your July 9 meeting with Kaiser from your July 9 interview with a job applicant or your July 9 policy-planning meeting. It also lets you keep related topics together: it's easy to browse through the "Medical Insurance" section of one drawer or to scan all your literature and notes about the Kaiser plan. The Unix filesystem works in exactly the same way (as do most other hierarchical filesystems). Rather than having a heap of assorted files, files are organized into directories. A directory is really nothing more than a special kind of file that lists a bunch of other files (see Section 10.2). A directory can contain any number of files (although for performance reasons, it's a good idea to keep the number of files in one directory relatively small — under 100, when you can). A directory can also contain other directories. Because a directory is nothing more than a special kind of file, directories also have names. At the top (the filesystem "tree" is really upside down) is a directory called the "root," which has the special name
Your Home Directory
Microsoft Windows and the Mac OS have hierarchical filesystems (Section 1.14), much like those in Unix and other large systems. But there is an important difference. On many Windows and Mac systems, you start right at the "root" of the filesystem tree. In effect, you start with a blank slate and create subdirectories to organize your files.
A Unix system comes with an enormous filesystem tree already developed. When you log in, you start somewhere down in that tree, in a directory created for you by the system administrator (who may even be yourself, if you are administering your own system).
This directory — the one place in the filesystem that is your very own, to store your files (especially the shell setup files (Section 3.3) and rc files (Section 3.20) that you use to customize the rest of your environment) — is called your home directory.
Home directories were originally stored in a directory called /usr (and still are on some systems), but are now often stored in other directories, such as /home. Within the Linux Filesystem Hierarchy Standard (FHS), home directories are placed under /home, just as configuration files are placed in /etc, and so on.
To change your current directory (Section 1.16) to your home, type cd with no pathname; the shell will assume you mean your home directory.
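Here is a minimal sketch of cd with no pathname. One assumption to note: the sketch points HOME at a scratch directory so that it is harmless to run anywhere. On a real system HOME is set for you at login and you would not change it.

```shell
# Assumption for illustration only: use a scratch directory as HOME
HOME=$(mktemp -d)
export HOME

cd /                    # start somewhere else in the tree
echo "before: $(pwd)"

cd                      # no pathname: the shell uses $HOME
echo "after:  $(pwd)"
```

The second echo shows the scratch directory, because the shell takes the destination from the HOME variable.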
Within the Mac OS X environment, home is in the /Users/username directory by default.
— TOR
Making Pathnames
Pathnames locate a file (or directory, or any other object) in the Unix filesystem. As you read this article, refer to Figure 1-4. It's a diagram of a (very) small part of a Unix filesystem.
Figure 1-4: Part of a Unix filesystem tree
Whenever you are using Unix, you have a current directory. By default, Unix looks for any mentioned files or directories within the current directory. That is, if you don't give an absolute pathname (Section 1.14) (starting from the root, / ), Unix tries to look up files relative to the current directory. When you first log in, your current directory is your home directory (Section 1.15), which the system administrator will assign to you. It typically has a name like /u/mike or /home/mike. You can change your current directory by giving the cd command, followed by the name of a new directory (for example, cd /usr/bin). You can find out your current directory by giving the pwd ("print working directory") command.
If your current directory is /home/mike and you give the command cat textfile, you are asking Unix to locate the file textfile within the directory /home/mike. This is equivalent to the absolute path /home/mike/textfile. If you give the command cat notes/textfile, you are asking Unix to locate the file textfile within the directory notes, within the current directory /home/mike.
A number of abbreviations help you to form relative pathnames more conveniently. You can use the abbreviation . (dot) to refer to the current working directory. You can use .. (dot dot) to refer to the parent of the current working directory. For example, if your current directory is /home/mike, ./textfile is the same as textfile, which is the same as /home/mike/textfile. The relative path ../gina/textfile is the same as /home/gina/textfile; .. moves up one level from /home/mike (to /home) and then searches for the directory gina and the file textfile.
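The relative pathnames described above can be checked with a small sketch. The /home/mike and /home/gina layout is recreated under a scratch directory (an assumption, so the sketch doesn't touch real home directories), and the file contents are invented for illustration.

```shell
# Rebuild a tiny /home tree under a scratch root (hypothetical layout)
root=$(mktemp -d)
mkdir -p "$root/home/mike/notes" "$root/home/gina"
echo "mike's file"  > "$root/home/mike/textfile"
echo "mike's notes" > "$root/home/mike/notes/textfile"
echo "gina's file"  > "$root/home/gina/textfile"

cd "$root/home/mike" || exit 1
pwd                      # the current directory

cat textfile             # relative to the current directory
cat ./textfile           # . is the current directory: same file
cat notes/textfile       # down one level, into notes
cat ../gina/textfile     # .. is the parent, then down into gina
```

The first two cat commands print the same file, and the last one reaches gina's file without an absolute pathname.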
You can use either the abbreviation ~ (tilde) or the environment variables $HOME or $LOGDIR, to refer to your home directory. In most shells,
File Access Permissions
Under Unix, access to files is based on the concept of users and groups.
Every "user" on a system has a unique account with a unique login name and a unique UID (Section 24.3) (user ID number). It is possible, and sometimes convenient, to create accounts that are shared by groups of people. For example, in a transaction-processing application, all of the order-entry personnel might be assigned a common login name (as far as Unix is concerned, they only count as one user). In a research and development environment, certain administrative operations might be easier if members of a team shared the same account, in addition to having their own accounts. However, in most situations each person using the system has one and only one user ID, and vice versa.
Every user may be a member of one or more "groups." The user's entry in the master password file (/etc/passwd (Section 22.3)) defines his "primary group membership." The /etc/group (Section 49.6) file defines the groups that are available and can also assign other users to these groups as needed. For example, I am a member of three groups: staff, editors, and research. My primary group is staff; the group file says that I am also a member of the editors and research groups. We call editors and research my "secondary groups." The system administrator is responsible for maintaining the group and passwd files. You don't need to worry about them unless you're administering your own system.
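To see your own user ID and group memberships, the id command is usually enough. This is a sketch rather than a full tour of id; the variable names are just for illustration, and the numbers printed will differ from system to system.

```shell
id                   # UID, primary group, and all groups on one line

uid=$(id -u)         # numeric user ID
primary=$(id -g)     # numeric ID of the primary group
all_groups=$(id -G)  # primary group plus any secondary groups

echo "UID:           $uid"
echo "primary group: $primary"
echo "all groups:    $all_groups"
```

If the last line shows more than one number, the extra ones are your secondary groups.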
Every file belongs to one user and one group. When a file is first created, its owner is the user who created it; its group is the user's primary group or the group of the directory in which it's created. For example, all files I create are owned by the user mikel and the group staff. As the file's owner, I am allowed to use the chgrp command to change the file's group. On filesystems that don't have quotas (Section 15.11), I can also use the chown command to change the file's owner. (To change ownership on systems with quotas, see Section 50.15.) For example, to change the file
The Superuser (Root)
In general, a process (Section 24.1) is a program that's running: a shell, the ls command, the vi editor, and so on. In order to kill a process (Section 24.12), change its priority (Section 26.5), or manipulate it in any other way, you have to be the process' owner (i.e., the user who started it). In order to delete a job from a print queue (Section 45.1), you must be the user who started it.
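As a small sketch of the ownership rule, you can start a process of your own and then kill it. The sleep command here is only a stand-in for a runaway program; because you started it, you own it, and kill is allowed to act on it.

```shell
sleep 300 &          # a background process that you own
pid=$!               # process ID of the most recent background job
echo "started process $pid"

kill "$pid"          # allowed, because you are the owner

# Collect the exit status; nonzero means sleep did not finish normally
status=0
wait "$pid" 2>/dev/null || status=$?
echo "process ended with status $status"
```

Trying the same kill on a process that belongs to another user would normally be refused unless you are the superuser.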
As you might guess, there needs to be a way to circumvent all of this security. Someone has to be able to kill runaway programs, modify the system's files, and so on. Under Unix, a special user known as root (and commonly called the "superuser") is allowed to do anything.
To become the superuser, you can either log in as root or use the su (Section 49.9) command. In this book, though, we'll assume that you don't have the superuser password. Almost all of what we describe can be done without becoming superuser.
— ML
When Is a File Not a File?
Unix differs from most operating systems in that it is file oriented. The designers of Unix decided that they could make the operating system much simpler if they treated everything as if it were a file. As far as Unix is concerned, disk drives, terminals, modems, network connections, etc. are all just files. Recent versions of Unix (such as Linux) have gone further: files can be pipes (FIFOs) (Section 43.11) and processes are files (Section 24.9). Like waves and particles in quantum physics, the boundary between files and the rest of the world can be extremely fine: whether you consider a disk a piece of hardware or a special kind of file depends primarily on your perspective and what you want to do with it.
Therefore, to understand Unix, you have to understand what files are. A file is nothing more than a stream of bytes — that is, an arbitrarily long string of bytes with no special structure. There are no special file structures and only a few special file types (for keeping track of disks and a few other purposes). The structure of any file is defined by the programs that use it, not by the Unix operating system. You may hear users talk about file headers and so on, but these are defined by the applications that use the files, not by the Unix filesystem itself.
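A couple of the special file types can be inspected from the shell. The sketch below assumes a system with /dev/null and the mkfifo command, which is true of nearly every Unix; the pipe name and message are invented for illustration.

```shell
# /dev/null is a device, but you write to it like any other file
[ -c /dev/null ] && echo "/dev/null is a character device"
echo "this text is discarded" > /dev/null

# A named pipe (FIFO) also has a name in the filesystem
fifo_dir=$(mktemp -d)
mkfifo "$fifo_dir/pipe"
[ -p "$fifo_dir/pipe" ] && echo "pipe is a FIFO"

# The writer runs in the background and waits until a reader opens the pipe
echo "through the pipe" > "$fifo_dir/pipe" &
msg=$(cat "$fifo_dir/pipe")
echo "$msg"

# The first character of each ls -l line shows the file type (c and p)
ls -l /dev/null "$fifo_dir/pipe"
```

In both cases the same redirection and cat that work on ordinary files work unchanged, which is the point of treating everything as a file.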
Unix programs do abide by one convention, however. Text files use a single newline character (linefeed) between lines of text, rather than the carriage return-linefeed combination used in Microsoft Windows or the carriage returns used in the Macintosh. This difference may cause problems when you bring files from other operating systems over to Unix. Windows files will often be littered with carriage returns (Ctrl-M), which are necessary for that operating system but superfluous for Unix. These carriage returns will look ugly if you try to edit or print the file and may confuse some Unix programs. Mac text files will appear to be one long line with no breaks. Of course, you can use Unix utilities to convert Mac and Windows files for Unix.
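One common way to do the conversion is with tr. This is a minimal sketch, not the only approach: the sample files are generated with printf so that the line endings are known, and the file names are hypothetical.

```shell
work=$(mktemp -d)

# Sample files: Windows uses CR-LF, the older Mac OS uses CR alone
printf 'one\r\ntwo\r\n' > "$work/windows.txt"
printf 'one\rtwo\r'     > "$work/mac.txt"

# Windows to Unix: delete every carriage return
tr -d '\r' < "$work/windows.txt" > "$work/windows.unix"

# Mac to Unix: turn every carriage return into a newline
tr '\r' '\n' < "$work/mac.txt" > "$work/mac.unix"

# Both converted files should now be identical
cmp "$work/windows.unix" "$work/mac.unix" && echo "files match"
```

Note that the two cases need different commands: deleting the carriage returns from the Mac file would join every line into one.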
Scripting
Scripting languages and scripting applications differ from compiled languages and applications in that the application is interpreted as it runs rather than compiled into a machine-understandable format. You can use shell scripting for many of your scripting needs, but there are times when you'll want to use something more sophisticated. Though these languages are not strictly part of Unix itself, most Unix installations come with the tools you need for this more complex scripting: Perl (Chapter 41), Python (Chapter 42), and Tcl.
These three scripting languages seem so prevalent within the Unix world that I think of them as the Unix scripting language triumvirate.
Perl is probably the granddaddy of scripting. Created by Larry Wall, this language is probably used more than any other for creating complex scripts to perform sophisticated functionality with Unix and other operating systems. The language is particularly noted for its ability to handle regular expressions, as well as working with files and other forms of I/O.
Python isn't as widespread as Perl, but its popularity is growing. One reason it's gaining popularity is that as a language, Python is more structured and a little more verbose than Perl, and therefore a little easier to read. In addition, according to its fans, Python has more object-oriented and data-manipulation features than the file-manipulation and regular-expression manipulation of Perl.
Tcl is particularly prevalent within Linux systems, though its use is widespread throughout all Unix systems. It's popular because it's simpler to learn than Perl and lets scripters get up to speed more quickly than they can with Perl or Python. The language also has access to a very popular graphical user interface library called the Tk toolkit. You'll rarely hear about Tcl without the associated Tk.
—TOR and SP
Unix Networking and Communications
Generally speaking, a network lets two or more computers communicate and work together. Partly because of the open design of Unix, a lot of networking development has been done in this operating system. Just as there are different versions of Unix, there are different ways and programs to use networks from Unix.
There's an entire chapter devoted to Connectivity (Chapter 46), but for now, here's a quick review of the major networking components.
The Internet
The Internet is a worldwide network of computers. Internet users can transfer files, log into other computers, and use a wide range of programs and services.
WWW
The World Wide Web is a set of information servers on the Internet. The servers are linked into a hypertext web of documents, graphics, sound, and more. Point-and-click browser programs turn that hypertext into an easy-to-use Internet interface. (For many people, the Web is the Internet. But Unix lets you do much more.)
mail
A Unix facility that's been around for years, long before networking was common, is electronic mail. Users can send electronic memos, usually called email messages, between themselves. When you send email, your message waits for the other user to start his own mail program. System programs can send you mail to tell you about problems or give you information. You can send mail to programs, asking them for information. Worldwide mailing lists connect users into discussion groups.
ftp
The ftp program is one way to transfer files between your computer and another computer with TCP/IP, often over the Internet, using the File Transfer Protocol (FTP).
UUCP
Unix-to-Unix Copy is a family of programs (uucp, uux, uulog, and others) for transferring files and email between computers. UUCP is usually used with modems over telephone lines and has been mostly superseded by Internet-type connections.
Usenet
Usenet isn't exactly a network. It's a collection of hundreds of thousands (millions?) of computers worldwide that exchange files called
The X Window System
In 1988, an organization called the MIT (Massachusetts Institute of Technology) X Consortium was formed to promote and develop a vendor-neutral windowing system called the X Window System. (It was called "X" because it was a follow-on to a window system called "W" that was developed at Stanford University.) The organization eventually moved away from MIT and became known as the X Consortium. The XFree86 Project, Inc. is another major group developing X; they produce a freely redistributable version that's used on Linux and other Unix-like systems such as Darwin.
A window system is a way of dividing up the large screen of a workstation into multiple virtual terminals, or windows. Each window can interact with a separate application program — or a single application can have many windows. While the "big win" is to have applications with point-and-click mouse-driven user interfaces, one of the most common applications is still a simple terminal emulator (xterm (Section 5.9)). X thus allows a workstation to display multiple simultaneous terminal sessions — which makes many of the standard Unix multitasking features such as job control less important because programs can all be running in the foreground in separate windows. X also runs on many kinds of hardware, and it lets you run a program on a remote computer (across a network) while the program's windows are displayed on your local system. Because Unix systems also run on many kinds of hardware, this makes X a good match for Unix.
Unix boxes are, by default, character-based systems. GUI systems are added to facilitate ease of use, as well as to provide access to a great number of sophisticated applications. Mac OS X, though, is already a GUI, built on the BSD-based Unix environment, Darwin.
Though Darwin doesn't come with the X Window System, you can download and install it, as well as X-based GUIs, such as XDarwin (accessible at http://www.xdarwin.org) and OroborOSX (available at the Apple web site at http://www.apple.com).
—TOR and JP
Chapter 2: Getting Help
The Unix operating system was one of the first to include online documentation. It's not the best in the world — most users who haven't internalized the manual set curse it once a week — but it has proven surprisingly resilien