Many people think of computers primarily as “number crunchers,” and think of word processors as generating form letters and boilerplate proposals. That computers can be used productively by writers, not just research scientists, accountants, and secretaries, is not so widely recognized. Today, writers not only work with words, they work with computers and the software programs, printers, and terminals that are part of a computer system.

The computer has not simply replaced a typewriter; it has become a system for integrating many other technologies. As these technologies are made available at a reasonable cost, writers may begin to find themselves in new roles as computer programmers, systems integrators, data base managers, graphic designers, typesetters, printers, and archivists.

The writer functioning in these new roles is faced with additional responsibilities. Obviously, it is one thing to have a tool available and another thing to use it skillfully. Like a craftsman, the writer must develop a number of specialized skills, gaining control over the method of production as well as the product. The writer must look for ways to improve the process by integrating new technologies and designing new tools in software.

In this book, we want to show how computers can be used effectively in the preparation of written documents, especially in the process of producing book-length documents. Surely it is important to learn the tools of the trade, and we will demonstrate the tools available in the UNIX environment. However, it is also valuable to examine text processing in terms of problems and solutions: the problems faced by a writer undertaking a large writing project and the solutions offered by using the resources and power of a computer system.

In Chapter 1, we begin by outlining the general capabilities of word-processing systems. We describe in brief the kinds of things that a computer must be able to do for a writer, regardless of whether that writer is working on a UNIX system or on an IBM PC with a word-processing package such as WordStar or MultiMate. Then, having defined basic word-processing capabilities, we look at how a text-processing system includes and extends these capabilities and benefits. Last, we introduce the set of text-processing tools in the UNIX environment. These tools, used individually or in combination, provide the basic framework for a text-processing system, one that can be custom-tailored to supply additional capabilities.

Chapter 2 gives a brief review of UNIX fundamentals. We assume you are already somewhat acquainted with UNIX, but we included this information to make sure that you are familiar with basic concepts that we will be relying on later in the book.

Chapter 3 introduces the vi editor, a basic tool for entering and editing text. Although many other editors and word-processing programs are available with UNIX, vi has the advantage that it works, without modification, on almost every UNIX system and with almost every type of terminal. If you learn vi, you can be confident that your text editing skills will be completely transferable when you sit down at someone else’s terminal or use someone else’s system.

Chapter 4 introduces the nroff and troff formatting programs. Because vi is a text editor, not a word-processing program, it does only rudimentary formatting of the text you enter. You can enter special formatting codes to specify how you want the document to look, then format the text using either nroff or troff. (The nroff formatter is used for formatting documents to the screen or to typewriter-like printers; troff uses much the same formatting language, but has additional constructs that allow it to produce more elaborate effects on typesetters and laser printers.)

In this chapter, we also describe the different types of output devices for printing your finished documents. With the wider availability of laser printers, you need to become familiar with many typesetting terms and concepts to get the most out of troff’s capabilities.

The formatting markup language required by nroff and troff is quite complex, because it allows detailed control over the placement of every character on the page, as well as a large number of programming constructs that you can use to define custom formatting requests or macros. A number of macro packages have been developed to make the markup language easier to use. These macro packages define commonly used formatting requests for different types of documents, set up default values for page layout, and so on.

Although someone working with the macro packages does not need to know about the underlying requests in the formatting language used by nroff and troff, we believe that the reader wants to go beyond the basics. As a result, Chapter 4 introduces additional basic requests that the casual user might not need. However, your understanding of what is going on should be considerably enhanced.

There are two principal macro packages in use today, ms and mm (named for the command-line options to nroff and troff used to invoke them). Both macro packages were once available with most UNIX systems; now, however, ms is chiefly available on UNIX systems derived from Berkeley 4.x BSD, and mm is chiefly available on UNIX systems derived from AT&T System V. If you are lucky enough to have both macro packages on your system, you can choose which one you want to learn. Otherwise, you should read either Chapter 5, The ms Macros, or Chapter 6, The mm Macros, depending on which version you have available.

Chapter 7 returns to vi to consider its more advanced features. In addition, it takes a look at how some of these features can support easy entry of formatting codes used by nroff and troff.

Tables and mathematical equations provide special formatting problems. The low-level nroff and troff commands for typesetting a complex table or equation are extraordinarily complex. However, no one needs to learn or type these commands, because two preprocessors, tbl and eqn, take a high-level specification of the table or equation and do the dirty work for you. They produce a “script” of nroff or troff commands that can be piped to the formatter to lay out the table or equations. The tbl and eqn preprocessors are described in Chapters 8 and 9, respectively.

More recent versions of UNIX (those that include AT&T’s separate Documenter’s Workbench software) also support a preprocessor called pic that makes it easier to create simple line drawings with troff and include them in your text. We talk about pic in Chapter 10.

Chapter 11 introduces a range of other UNIX text-processing tools—programs for sorting, comparing, and in various ways examining the contents of text files. This chapter includes a discussion of the standard UNIX spell program and the Writer’s Workbench programs style and diction.

This concludes the first part of the book, which covers the tools that the writer finds at hand in the UNIX environment. This material is not elementary. In places, it grows quite complex. However, we believe there is a fundamental difference between learning how to use an existing tool and developing skills that extend a tool’s capabilities to achieve your own goals.

That is the real beauty of the UNIX environment. Nearly all the tools it provides are extensible, either because they have built-in constructs for self-extension, like nroff and troff’s macro capability, or because of the wonderful programming powers of the UNIX command interpreter, the shell.

The second part of the book begins with Chapter 12, on editing scripts. There are several editors in UNIX that allow you to write and save what essentially amount to programs for manipulating text. The ex editor can be used from within vi to make global changes or complex edits. The next step is to use ex on its own; and after you do that, it is a small step to the even more powerful global editor sed. After you have mastered these tools, you can build a library of special-purpose editing scripts that vastly extend your power over the recalcitrant words you have put down on paper and now wish to change.

Chapter 13 discusses another program, awk, that extends the concept of a text editor even further than the programs discussed in Chapter 12. The awk program is really a database programming language that is appropriate for performing certain kinds of text-processing tasks. In particular, we use it in this book to process output from troff for indexing.

The next five chapters turn to the details of writing troff macros, and show how to customize the formatting language to simplify formatting tasks. We start in Chapter 14 by looking at the basic requests used to build macros, then go on in Chapter 15 to the requests for achieving various types of special effects. In Chapters 16 and 17, we’ll take a look at the basic structure of a macro package and focus on how to define the appearance of large documents such as manuals. We’ll show you how to define different styles of section headings, page headers, footers, and so on. We’ll also talk about how to generate an automatic table of contents and index, two tasks that take you beyond troff into the world of shell programming and various UNIX text-processing utilities.

To complete these tasks, we need to return to the UNIX shell in Chapter 18 and examine in more detail the ways that it allows you to incorporate the many tools provided by UNIX into an integrated text-processing environment.

Numerous appendices summarize information that is spread throughout the text, or that couldn’t be crammed into it.


Before we turn to the subject at hand, a few acknowledgements are in order. Though only two names appear on the cover of this book, it is in fact the work of many hands. In particular, Grace Todino wrote the chapters on tbl and eqn in their entirety, and the chapters on vi and ex are based on the O’Reilly & Associates’ Nutshell Handbook, Learning the Vi Editor, written by Linda Lamb. Other members of the O’Reilly & Associates staff—Linda Mui, Valerie Quercia, and Donna Woonteiler—helped tirelessly with copyediting, proofreading, illustrations, typesetting, and indexing.

Donna was new to our staff when she took on responsibility for the job of copyfitting, that final stage in page layout made especially arduous by the many figures and examples in this book. She and Linda especially spent many long hours getting this book ready for the printer. Linda had the special job of doing the final consistency check on examples, making sure that copyediting changes or typesetting errors had not compromised the accuracy of the examples.

Special thanks go to Steve Talbott of Masscomp, who first introduced us to the power of troff and who wrote the first version of the extended ms macros, format shell script, and indexing mechanism described in the second half of this book. Steve’s help and patience were invaluable during the long road to mastery of the UNIX text-processing environment.

We’d also like to thank Teri Zak, the acquisitions editor at Hayden Books, for her vision of the Hayden UNIX series, and this book’s place in it.

In the course of this book’s development, Hayden was acquired by Howard Sams, where Teri’s role was taken over by Jim Hill. Thanks also to the excellent production editors at Sams, Wendy Ford, Lou Keglovitz, and especially Susan Pink Bussiere, whose copyediting was outstanding.

Through it all, we have had the help of Steve Kochan and Pat Wood of Pipeline Associates, Inc., consulting editors to the Hayden UNIX Series. We are grateful for their thoughtful and thorough review of this book for technical accuracy. (We must, of course, make the usual disclaimer: any errors that remain are our own.)

Steve and Pat also provided the macros to typeset the book. Our working drafts were printed on an HP LaserJet printer, using ditroff and TextWare International’s tplus postprocessor. Final typeset output was prepared with Pipeline Associates’ devps, which was used to convert ditroff output to PostScript, which was used in turn to drive a Linotronic L100 typesetter.
