Availability of sed and awk

Sed and awk were part of Version 7 UNIX (also known as “V7” and “Seventh Edition”) and have been part of the standard distribution ever since. Sed has been unchanged since it was introduced.

The Free Software Foundation GNU project’s version of sed is freely available, although not technically in the public domain. Source code for GNU sed is available via anonymous FTP[1] to the host ftp.gnu.ai.mit.edu. It is in the file ftp://ftp.gnu.ai.mit.edu/pub/gnu/sed-2.05.tar.gz. This is a tar file compressed with the gzip program, whose source code is available in the same directory. There are many sites world-wide that “mirror” the files from the main GNU distribution site; if you know of one close to you, you should get the files from there. Be sure to use “binary” or “image” mode to transfer the file(s).
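Once the file has been transferred, unpacking it might look like the following sketch. It assumes that gzip and tar are already installed and on your search path. The function name unpack_tgz is ours, chosen for illustration, and is not part of any distribution.

```shell
# Unpack a gzip-compressed tar archive into the current directory.
# gzip -dc writes the uncompressed data to standard output, and
# "tar xf -" reads the archive from standard input.
unpack_tgz() {
    gzip -dc "$1" | tar xf -
}

# Example use, once the archive has been fetched in binary mode:
#   unpack_tgz sed-2.05.tar.gz
```

Piping through gzip rather than relying on a tar option keeps the command usable with versions of tar that do not handle compression themselves.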

In 1985, the authors of awk extended the language, adding many useful features. Unfortunately, this new version remained inside AT&T for several years. It became part of UNIX System V as of Release 3.1. It can be found under the name of nawk, for new awk; the older version still exists under its original name. This is still the case on System V Release 4 systems.

On commercial UNIX systems, such as those from Hewlett-Packard, Sun, IBM, Digital, and others, the naming situation is more complicated. All of these systems have some version of both old and new awk, but what each vendor names each program varies. Some have oawk and awk, others have awk and nawk. The best advice we can give is to check your local documentation.[2] Throughout this book, we use the term awk to describe POSIX awk. Specific implementations will be referred to by name, such as “gawk,” or “the Bell Labs awk.”
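If the documentation is not at hand, a quick probe from the shell can help sort out which names are present. The sketch below is only a rough check, not a definitive test: it assumes a POSIX-style shell, the function name probe_awk is hypothetical, and it uses sub(), one of the functions added in the 1985 version of the language, as a stand-in for "new awk" features in general.

```shell
# For each command name given, report whether it is on the search path
# and whether it accepts a program that calls sub(). The original awk
# rejects such a program, so an empty result suggests old awk.
probe_awk() {
    for name in "$@"; do
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "$name: not found"
            continue
        fi
        out=$("$name" 'BEGIN { s = "old awk"; sub(/old/, "new", s); print s }' 2>/dev/null) || out=""
        if [ "$out" = "new awk" ]; then
            echo "$name: new awk features available"
        else
            echo "$name: looks like old awk"
        fi
    done
}

# Probe the names most often used for awk implementations.
probe_awk awk nawk oawk gawk mawk
```

The output depends entirely on the system, so treat it as a starting point and confirm against the vendor's manual pages.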

Chapter 11 discusses three freely available awks (including where to get them), as well as several commercial ones.

Note

Since the first edition of this book, the awk language was standardized as part of the POSIX Command Language and Utilities Standard (P1003.2). All modern awk implementations aim to be upwardly compatible with the POSIX standard.

The standard incorporates features that originated in both new awk and gawk. In this book, you can assume that what is true for one implementation of POSIX awk is true for another, unless a particular version is designated.

DOS Versions

Gawk, mawk, and GNU sed have been ported to DOS. There are files on the main GNU distribution site with pointers to DOS versions of these programs. In addition, gawk has been ported to OS/2, VMS, and Atari and Amiga microcomputers, with ports to other systems (Macintosh, Windows) in progress.

egrep, sed, and awk are available for MS-DOS-based machines as part of the MKS Toolkit (Mortice Kern Systems, Inc., Ontario, Canada). Their implementation of awk supports the features of POSIX awk.

The MKS Toolkit also includes the Korn shell, which means that many shell scripts written for the Bourne shell on UNIX systems can be run on a PC. While most users of the MKS Toolkit have probably already discovered these tools in UNIX, we hope that the benefits of these programs will be obvious to PC users who have not ventured into UNIX.

Thompson Automation Software[3] has an awk compiler for UNIX, DOS, and Microsoft Windows. This version is interesting because it has a number of extensions to the language, and it includes an awk debugger, written in awk!

We have used a PC on occasion because Ventura Publisher is a terrific formatting package. One of the reasons we like it is that we can continue to use vi to create and edit the text files and use sed for writing editing scripts. We have used sed to write conversion programs that translate troff macros into Ventura stylesheet tags. We have also used it to insert tags in batch mode. This can save having to manually tag repeated elements in a file.

Sed and awk are also useful for writing conversion programs that handle different file formats.
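To give a feel for the kind of conversion described above, here is a minimal sketch of a sed script that turns heading macros into tags. Everything specific in it is an assumption made for illustration: the macro names .Ah and .Bh, the tag names HEAD and SUBHEAD, the "@NAME = text" tag form, and the function name troff_to_tags. A real conversion would follow the macro package and stylesheet actually in use.

```shell
# Translate two hypothetical troff heading macros into hypothetical
# stylesheet tags. Lines that match neither pattern pass through unchanged.
troff_to_tags() {
    sed -e 's/^\.Ah "\(.*\)"$/@HEAD = \1/' \
        -e 's/^\.Bh "\(.*\)"$/@SUBHEAD = \1/'
}

# Small demonstration on three lines of sample input.
printf '%s\n' '.Ah "Getting Started"' '.Bh "Installing"' 'Some body text.' |
    troff_to_tags
# Output:
#   @HEAD = Getting Started
#   @SUBHEAD = Installing
#   Some body text.
```

Because each substitution is anchored to the start of the line and to the closing quote, ordinary text that merely mentions a macro name is left alone.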

Other Sources of Information About sed and awk

For a long time, the main source of information on these utilities was two articles contained in Volume 2 of the UNIX Programmer’s Guide. The article awk—A Pattern Scanning and Processing Language (September 1, 1978) was written by the language’s three authors. In 10 pages, it offers a brief tutorial and discusses several design and implementation issues. The article SED—A Non-Interactive Text Editor (August 15, 1978) was written by Lee E. McMahon. It is a reference that gives a full description of each function and includes some useful examples (using Coleridge’s Xanadu as sample input).

In trade books, the most significant treatment of sed and awk appears in The UNIX Programming Environment by Brian W. Kernighan and Rob Pike (Prentice-Hall, 1984). The chapter entitled “Filters” not only explains how these programs work but shows how they can work together to build useful applications.

The authors of awk collaborated on a book describing the enhanced version: The AWK Programming Language (Addison-Wesley, 1988). It contains many full examples and demonstrates the broad range of areas where awk can be applied. It follows the style of The UNIX Programming Environment, which at times makes it too dense for some new users. The source code for the example programs in the book can be found in the directory ftp://netlib.bell-labs.com/netlib/research/awkbookcode on netlib.bell-labs.com.

The IEEE Standard for Information and Technology Portable Operating System Interface (POSIX) Part 2: Shell and Utilities (Standard 1003.2-1992)[4] describes both sed and awk.[5] It is the “official” word on the features available for portable shell programs that use sed and awk. Since awk is a programming language in its own right, it is also the official word on portable awk programs.

In 1996, the Free Software Foundation published The GNU Awk User’s Guide, by Arnold Robbins. This is the documentation for gawk, written in a more tutorial style than the Aho, Kernighan, and Weinberger book. It has two full chapters of examples, and covers POSIX awk. This book is also published by SSC under the title Effective AWK Programming, and the Texinfo source for the book comes with the gawk distribution.

It is one of the current deficiencies of GNU sed that it has no documentation of its own, not even a manpage.

Most general introductions to UNIX introduce sed and awk in a long parade of utilities. Of these books, Henry McGilton and Rachel Morgan’s Introducing the UNIX System offers the best treatment of basic editing skills, including use of all UNIX text editors.

UNIX Text Processing (Hayden Books, 1987), by the original author of this handbook and Tim O’Reilly, covers sed and awk in full, although we did not include the new version of awk. Readers of that book will find some parts duplicated in this book, but in general a different approach has been taken here. Whereas in the textbook we treat sed and awk separately, expecting only advanced users to tackle awk, here we try to present both programs in relation to one another. They are different tools that can be used individually or together to provide interesting opportunities for text processing.

Finally, in 1995 the Usenet newsgroup comp.lang.awk came into being. If you can’t find what you need to know in one of the above books, you can post a question in the newsgroup, with a good chance that someone will be able to help you.

The newsgroup also has a “frequently asked questions” (FAQ) article that is posted regularly. Besides answering questions about awk, the FAQ lists many sites where you can obtain binaries of different versions of awk for different systems. You can retrieve the FAQ via FTP in the file called ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq from the host rtfm.mit.edu.

Sample Programs

The sample programs in this book were originally written and tested on a Mac IIci running A/UX 2.0 (UNIX System V Release 2) and a SparcStation 1 running SunOS 4.0. Programs requiring POSIX awk were re-tested using gawk 3.0.0 as well as the August 1994 version of the Bell Labs awk from the Bell Labs FTP site (see Chapter 11 for the FTP details). Sed programs were retested with the SunOS 4.1.3 sed and GNU sed 2.05.



[1] If you don’t have Internet access and wish to get a copy of GNU sed, contact the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 U.S.A. The telephone number is 1-617-542-5942, and the fax number is 1-617-542-2652.

[2] Purists refer to the new awk simply as awk; the new one was intended to replace the original one. Alas, almost 10 years after it was released, this still has not really happened.

[3] 5616 SW Jefferson, Portland, OR 97221 U.S.A., 1-800-944-0139 within the U.S., 1-503-224-1639 elsewhere.

[4] Whew! Say that three times fast!

[5] The standard is not available online. It can be ordered from the IEEE by calling 1-800-678-IEEE(4333) in the U.S. and Canada, 1-908-981-0060 elsewhere. Or, see http://www.ieee.org/ from a Web browser. The cost is U.S. $228, which includes Standard 1003.2d-1994—Amendment 1 for Batch Environments. Members of IEEE and/or IEEE societies receive a discount.
