Chapter 1. Background
This chapter provides a brief history of the development of the Unix system. Understanding where and how Unix developed and the intent behind its design will help you use the tools better. The chapter also introduces the guiding principles of the Software Tools philosophy, which are then demonstrated throughout the rest of the book.
It is likely that you know something about the development of Unix, and many resources are available that provide the full story. Our intent here is to show how the environment that gave birth to Unix influenced the design of the various tools.
Unix was originally developed in the Computing Sciences Research Center at Bell Telephone Laboratories. The first version was developed in 1970, shortly after Bell Labs withdrew from the Multics project. Many of the ideas that Unix popularized were initially pioneered within the Multics operating system, most notably the concepts of devices as files and of having a command interpreter (or shell) that was intentionally not integrated into the operating system. A well-written history may be found at http://www.bell-labs.com/history/unix.
Because Unix was developed within a research-oriented environment, there was no commercial pressure to produce or ship a finished product. This had several advantages:
The system was developed by its users. They used it to solve real day-to-day computing problems.
The researchers were free to experiment and to change programs as needed. Because the user base was small, if a program needed to be rewritten from scratch, that generally wasn’t a problem. And because the users were the developers, they were free to fix problems as they were discovered and add enhancements as the need for them arose.
Unix itself went through multiple research versions, informally referred to with the letter “V” and a number: V6, V7, and so on. (The formal name followed the edition number of the published manual: First Edition, Second Edition, and so on. The correspondence between the names is direct: V6 = Sixth Edition, and V7 = Seventh Edition. Like most experienced Unix programmers, we use both nomenclatures.) The most influential Unix system was the Seventh Edition, released in 1979, although earlier ones had been available to educational institutions for several years. In particular, the Seventh Edition system introduced both awk and the Bourne shell, on which the POSIX shell is based. It was also at this time that the first published books about Unix started to appear.
The researchers at Bell Labs were all highly educated computer scientists. They designed the system for their personal use and the use of their colleagues, who also were computer scientists. This led to a “no nonsense” design approach; programs did what you told them to do, without being chatty and asking lots of “are you sure?” questions.
Besides just extending the state of the art, there existed a quest for elegance in design and problem solving. A lovely definition for elegance is “power cloaked in simplicity.” The freedom of the Bell Labs environment led to an elegant system, not just a functional one.
Of course, the same freedom had a few disadvantages that became clear as Unix spread beyond its development environment:
There were many inconsistencies among the utilities. For example, programs would use the same option letter to mean different things, or use different letters for the same task. Also, the regular-expression syntaxes used by different programs were similar, but not identical, leading to confusion that might otherwise have been avoided. (Had their ultimate importance been recognized, regular-expression-matching facilities could have been encoded in a standard library.)
Many utilities had limitations, such as on the length of input lines, or on the number of open files, etc. (Modern systems generally have corrected these deficiencies.)
Sometimes programs weren’t as thoroughly tested as they should have been, making it possible to accidentally kill them. This led to surprising and confusing “core dumps.” Thankfully, modern Unix systems rarely suffer from this.
The system’s documentation, while generally complete, was often terse and minimalistic. This made the system more difficult to learn than was really desirable.
Most of what we present in this book centers on processing and manipulation of textual, not binary, data. This stems from the strong interest in text processing that existed during Unix’s early growth, but is valuable for other reasons as well (which we discuss shortly). In fact, the first production use of a Unix system was doing text processing and formatting in the Bell Labs Patent Department.
The original Unix machines (Digital Equipment Corporation PDP-11s) weren’t capable of running large programs. To accomplish a complex task, you had to break it down into smaller tasks and have a separate program for each smaller task. Certain tasks (extracting fields from lines, making substitutions in text, etc.) were common to many larger projects, so they became standard tools. This was eventually recognized as being a good thing in its own right: the lack of a large address space led to smaller, simpler, more focused programs.
Many people were working semi-independently on Unix, reimplementing each other’s programs. Because of version differences and the lack of any need to standardize, a lot of the common tools diverged. For example, grep on one system used -i to mean “ignore case when searching,” and it used -y on another variant to mean the same thing! This sort of thing happened with multiple utilities, not just a few. The common small utilities were named the same, but shell programs written for the utilities in one version of Unix probably wouldn’t run unchanged on another.
Eventually the need for a common set of standardized tools and options became clear. The POSIX standards were the result. The current standard, IEEE Std. 1003.1-2004, encompasses both the C library level, and the shell language and system utilities and their options.
The good news is that the standardization effort paid off. Modern commercial Unix systems, as well as freely available workalikes such as GNU/Linux and BSD-derived systems, are all POSIX-compliant. This makes learning Unix easier, and makes it possible to write portable shell scripts. (However, do take note of Chapter 14.)
Interestingly enough, POSIX wasn’t the only Unix standardization effort. In particular, an initially European group of computer manufacturers, named X/Open, produced its own set of standards. The most popular was XPG4 (X/Open Portability Guide, Fourth Edition), which first appeared in 1988. There was also an XPG5, more widely known as the UNIX 98 standard, or as the “Single UNIX Specification.” XPG5 largely included POSIX as a subset, and was also quite influential.
The XPG standards were perhaps less rigorous in their language, but covered a broader base, formally documenting a wider range of existing practice among Unix systems. (The goal for POSIX was to make a standard formal enough to be used as a guide to implementation from scratch, even on non-Unix platforms. As a result, many features common on Unix systems were initially excluded from the POSIX standards.) The 2001 POSIX standard does double duty as XPG6 by including the X/Open System Interface Extension (or XSI, for short). This is a formal extension to the base POSIX standard, which documents attributes that make a system not only POSIX-compliant, but also XSI-compliant. Thus, there is now only one formal standards document that implementors and application writers need to refer to. (Not surprisingly, this is called the Single UNIX Specification, Version 3.)
Throughout this book, we focus on the shell language and Unix utilities as defined by the POSIX standard. Where it’s important, we’ll include features that are XSI-specific as well, since it is likely that you’ll be able to use them too.
Software Tools Principles
Over the course of time, a set of core principles developed for designing and writing software tools. You will see these exemplified in the programs used for problem solving throughout this book. Good software tools should do the following things:
- Do one thing well
In many ways, this is the single most important principle to apply. Programs that do only one thing are easier to design, easier to write, easier to debug, and easier to maintain and document. For example, a program like grep that searches files for lines matching a pattern should not also be expected to perform arithmetic.
A natural consequence of this principle is a proliferation of smaller, specialized programs, much as a professional carpenter has a large number of specialized tools in his toolbox.
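As a small illustration of the principle, here is a sketch in which grep does only the searching and awk does only the arithmetic. The sales records and the `sales_sample` function are made up for the example; neither tool needs to know anything about the other's job.

```sh
#!/bin/sh
# sales_sample: hypothetical made-up records, "item quantity" per line.
sales_sample() {
    printf '%s\n' 'widget 4' 'gadget 7' 'widget 10' 'gizmo 2'
}

# grep only selects the widget lines; awk only adds up the second field.
# "total + 0" makes awk print 0 rather than an empty line when nothing matches.
sales_sample | grep '^widget ' | awk '{ total += $2 } END { print total + 0 }'
# prints 14
```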
- Process lines of text, not binary
Lines of text are the universal format in Unix. Datafiles containing text lines are easy to process when writing your own tools, they are easy to edit with any available text editor, and they are portable across networks and multiple machine architectures. Using text files facilitates combining any custom tools with existing Unix programs.
- Use regular expressions
Regular expressions are a powerful mechanism for working with text. Although they varied across tools and Unix versions over the years, the POSIX standard provides only two kinds of regular expressions, with standardized library routines for regular-expression matching. This makes it possible for you to write your own tools that work with regular expressions identical to those of grep (called Basic Regular Expressions or BREs by POSIX), or identical to those of egrep (called Extended Regular Expressions or EREs by POSIX).
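A minimal sketch of the BRE/ERE difference, using plain `grep` (BREs) and `grep -E` (the POSIX spelling of egrep, which uses EREs). The `sample` function and its three words are invented input. The same “optional u” is written `\{0,1\}` as a BRE and `?` as an ERE, and in a BRE an unescaped `?` is just an ordinary character.

```sh
#!/bin/sh
# sample: hypothetical input, one candidate spelling per line.
sample() {
    printf '%s\n' color colour colr
}

# BRE (plain grep): interval expressions need backslashes, \{m,n\}
sample | grep 'colou\{0,1\}r'          # color, colour

# ERE (grep -E): ? means "zero or one of the preceding item"
sample | grep -E 'colou?r'             # color, colour

# In a BRE, ? is literal, so this looks for the text "colou?r"
sample | grep 'colou?r' || echo '(no match)'
```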
- Default to standard I/O
When not given any explicit filenames upon which to operate, a program should default to reading data from its standard input and writing data to its standard output. Error messages should always go to standard error. (These are discussed in Chapter 2.) Writing programs this way makes it easy to use them as data filters—i.e., as components in larger, more complicated pipelines or scripts.
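A sketch of a filter written this way follows. `upcase` is a hypothetical tool, not a standard utility: with no arguments it reads standard input, otherwise it reads each named file, and it reports unreadable files on standard error so that diagnostics never mix with the data.

```sh
#!/bin/sh
# upcase: hypothetical filter that converts text to uppercase.
# No file arguments: read standard input. Otherwise: read each named file.
# Data goes to standard output; diagnostics go only to standard error.
upcase() {
    if [ $# -eq 0 ]; then
        tr '[:lower:]' '[:upper:]'
        return
    fi
    status=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            tr '[:lower:]' '[:upper:]' < "$f"
        else
            echo "upcase: cannot read $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: as a stage in a pipeline
echo 'hello, world' | upcase    # prints HELLO, WORLD
```

Because `upcase` writes nothing but data to standard output, it works equally well at the start, middle, or end of a pipeline.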
- Don’t be chatty
Software tools should not be “chatty.” No “almost done” or “finished processing” kinds of messages should be mixed in with the regular output of a program (or at least, not by default).
When you consider that tools can be strung together in a pipeline, this makes sense:
tool_1 < datafile | tool_2 | tool_3 | tool_4 > resultfile
If each tool produces “yes I’m working” kinds of messages and sends them down the pipe, the data being manipulated would be hopelessly corrupted. Furthermore, even if each tool sends its messages to standard error, the screen would be full of useless progress messages. When it comes to tools, no news is good news.
This principle has a further implication. In general, Unix tools follow a “you asked for it, you got it” design philosophy. They don’t ask “are you sure?” kinds of questions. When a user types rm somefile, the Unix designers figured that he knows what he’s doing, and rm removes the file, no questions asked.
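The generic tool_1 through tool_4 pipeline shown earlier can be made concrete with standard utilities. In this sketch, `word_freq` is a hypothetical name for a word-frequency counter built only from quiet filters. If any stage printed a progress message on standard output, that message would be split into words and counted along with the real data.

```sh
#!/bin/sh
# word_freq: hypothetical word-frequency counter; every stage reads
# standard input, writes standard output, and prints nothing else.
word_freq() {
    tr -cs '[:alpha:]' '[\n*]' |    # one word per line; runs of non-letters become one newline
    tr '[:upper:]' '[:lower:]' |    # fold case so "The" and "the" count together
    sort |                          # bring identical words together
    uniq -c |                       # count each run of identical lines
    sort -k1,1nr -k2,2              # highest count first, ties alphabetical
}

printf 'The cat and the hat\n' | word_freq
```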
- Generate the same output format accepted as input
Specialized tools that expect input to obey a certain format, such as header lines followed by data lines, or lines with certain field separators, and so on, should produce output following the same rules as the input. This makes it easy to process the results of one program run through a different program run, perhaps with different options.
For example, the netpbm suite of programs manipulates image files stored in a Portable BitMap format. These files contain bitmapped images, described using a well-defined format. Each tool reads PBM files, manipulates the contained image in some fashion, and then writes a PBM format file back out. This makes it easy to construct a simple pipeline to perform complicated image processing, such as scaling an image, then rotating it, and then decreasing the color depth.
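The netpbm tools themselves are not shown here, but the same idea applies to plain text records. In this sketch, `by_uid` and `keep_shell` are hypothetical filters that read and write colon-separated, passwd-style lines; because each emits exactly the format it accepts, they can be chained in either order. The sample records are invented.

```sh
#!/bin/sh
# passwd_sample: made-up passwd-style records
# (name:password:uid:gid:gecos:home:shell).
passwd_sample() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/sh' \
        'bob:x:1001:1001:Bob:/home/bob:/bin/sh' \
        'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
        'alice:x:1000:1000:Alice:/home/alice:/bin/sh'
}

# by_uid: reorder records by numeric UID (field 3); lines pass through unchanged.
by_uid() {
    sort -t: -k3,3n
}

# keep_shell: keep only records whose login shell (field 7) equals $1.
keep_shell() {
    awk -F: -v want="$1" '$7 == want'
}

# Output format == input format, so the stages compose in either order.
passwd_sample | keep_shell /bin/sh | by_uid
passwd_sample | by_uid | keep_shell /bin/sh
```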
- Let someone else do the hard part
Often, while there may not be a Unix program that does exactly what you need, it is possible to use existing tools to do 90 percent of the job. You can then, if necessary, write a small, specialized program to finish the task. Doing things this way can save a large amount of work compared to solving each problem from scratch every time.
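A sketch of the 90 percent idea, assuming the task is “how many accounts use each login shell?” on passwd-style records: `cut`, `sort`, and `uniq -c` do nearly all of the work, and a one-line awk program supplies only the final formatting. `passwd_sample` and `shell_report` are hypothetical names, and the records are made up.

```sh
#!/bin/sh
# passwd_sample: made-up passwd-style records (name:pw:uid:gid:gecos:home:shell).
passwd_sample() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/sh' \
        'bob:x:1001:1001:Bob:/home/bob:/bin/sh' \
        'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
        'alice:x:1000:1000:Alice:/home/alice:/bin/sh'
}

# shell_report: hypothetical report of accounts per login shell.
shell_report() {
    cut -d: -f7 |                            # existing tool: extract the shell field
    sort |                                   # existing tool: group identical shells
    uniq -c |                                # existing tool: count each group
    awk '{ printf "%-20s %d\n", $2, $1 }'    # the small custom part: reformat
}

passwd_sample | shell_report
```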
- Detour to build specialized tools
As just described, when there just isn’t an existing program that does what you need, take the time to build a tool to suit your purposes. However, before diving in to code up a quick program that does exactly your specific task, stop and think for a minute. Is the task one that other people are going to need done? Is it possible that your specialized task is a specific case of a more general problem that doesn’t have a tool to solve it? If so, think about the general problem, and write a program aimed at solving that. Of course, when you do so, design and write your program so it follows the previous rules! By doing this, you graduate from being a tool user to being a toolsmith, someone who creates tools for others!
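As a sketch of generalizing a specialized task: instead of a one-off script that lists the accounts with the highest UIDs, the hypothetical `topn` below prints the N lines with the largest numeric value in any field, with an optional field separator. Its interface (`topn N FIELD [SEP]`) is invented for this example. It follows the earlier rules: it reads standard input, writes standard output, and sends its usage message to standard error.

```sh
#!/bin/sh
# topn: hypothetical general-purpose tool.
# usage: topn N FIELD [SEP]
#   Print the N lines whose FIELD is numerically largest.
#   SEP is the field separator; if omitted, fields are blank-separated.
topn() {
    if [ $# -lt 2 ]; then
        echo "usage: topn N FIELD [SEP]" >&2
        return 2
    fi
    n=$1 field=$2 sep=${3:-}
    if [ -n "$sep" ]; then
        sort -t "$sep" -k "$field,$field"nr
    else
        sort -k "$field,$field"nr
    fi | head -n "$n"
}

# The original special case: the two highest UIDs in passwd-style records.
printf '%s\n' 'root:x:0:0' 'bob:x:1001:1001' 'alice:x:1000:1000' | topn 2 3 :
# The same tool on blank-separated data it was never written for.
printf '%s\n' 'a 5' 'b 9' 'c 1' | topn 1 2
```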
Unix was originally developed at Bell Labs by and for computer scientists. The lack of commercial pressure, combined with the small capacity of the PDP-11 minicomputer, led to a quest for small, elegant programs. The same lack of commercial pressure, though, led to a system that wasn’t always consistent or easy to learn.
As Unix spread and variant versions developed (notably the System V and BSD variants), portability at the shell script level became difficult. Fortunately, the POSIX standardization effort has borne fruit, and just about all commercial Unix systems and free Unix workalikes are POSIX-compliant.
The Software Tools principles as we’ve outlined them provide the guidelines for the development and use of the Unix toolset. Thinking with the Software Tools mindset will help you write clear shell programs that make correct use of the Unix tools.
 The name has changed at least once since then. We use the informal name “Bell Labs” from now on.
 I first heard this definition from Dan Forsyth sometime in the 1980s.
 The manual had two components: the reference manual and the user’s manual. The latter consisted of tutorial papers on major parts of the system. While it was possible to learn Unix by reading all the documentation, and many people (including the authors) did exactly that, today’s systems no longer come with printed documentation of this nature.
 For those who are really worried, the -i option to rm forces rm to prompt for confirmation, and in any case rm prompts for confirmation when asked to remove suspicious files, such as those whose permissions disallow writing. As always, there’s a balance to be struck between the extremes of never prompting and always prompting.
 The programs are not a standard part of the Unix toolset, but are commonly installed on GNU/Linux and BSD systems. The WWW starting point is http://netpbm.sourceforge.net/. From there, follow the links to the Sourceforge project page, which in turn has links for downloading the source code.
 There are three different formats; see the pnm(5) manpage if netpbm is installed on your system.