sed and awk Pocket Reference
sed and awk Pocket Reference, Second Edition By Arnold Robbins
June 2002
Pages: 52

Chapter 1: sed & awk Pocket Reference
Introduction
This pocket reference is a companion volume to O'Reilly's sed & awk, Second Edition, by Dale Dougherty and Arnold Robbins, and to Effective awk Programming, Third Edition, by Arnold Robbins. It presents a concise summary of regular expressions and pattern matching, and summaries of sed, awk, and gawk (GNU awk).
Conventions Used in This Book
This pocket reference follows certain typographic conventions, outlined here:
Constant Width
Used for code examples, commands, directory names, and options.
Constant Width Italic
Used in syntax and command summaries to show replaceable text; this text should be replaced with user-supplied values.
Constant Width Bold
Used in code examples to show commands or other text that should be typed literally by the user.
Italic
Used to show generic arguments and options; these should be replaced with user-supplied values. Italic is also used to highlight comments in examples, to introduce new terms, and to indicate filenames.
$
Used in some examples as the Bourne shell or Korn shell prompt.
[ ]
Surround optional elements in a description of syntax. (The brackets themselves should never be typed.)
Matching Text
A number of Unix text-processing utilities let you search for, and in some cases change, text patterns rather than fixed strings. These utilities include the editing programs ed, ex, vi, and sed, the awk programming language, and the commands grep and egrep. Text patterns (formally called regular expressions) contain normal characters mixed with special characters (called metacharacters).
Metacharacters used in pattern matching are different from metacharacters used for filename expansion. When you issue a command on the command line, special characters are seen first by the shell, then by the program; therefore, unquoted metacharacters are interpreted by the shell for filename expansion. For example, the command:
$ grep [A-Z]* chap[12]
            
could be transformed by the shell into:
$ grep Array.c Bug.c Comp.c chap1 chap2
            
and would then try to find the pattern Array.c in files Bug.c, Comp.c, chap1, and chap2. To bypass the shell and pass the special characters to grep, use quotes as follows:
$ grep "[A-Z]*" chap[12]
            
Double quotes suffice in most cases, but single quotes are the safest bet.
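The effect of quoting can be tried in a scratch directory. The sketch below is only an illustration: the directory, the file contents, and the pattern `[A-Z][a-z]*` (a variation on the pattern above) are assumptions, not taken from the original text.

```shell
# Use the C locale so that [A-Z] means uppercase ASCII letters only.
LC_ALL=C; export LC_ALL

# Hypothetical scratch directory and files, for illustration only.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch Array.c Bug.c Comp.c
printf 'Chapter One\n' > chap1
printf 'chapter two\n' > chap2

# Unquoted: the shell expands [A-Z]* into filenames before grep runs.
# echo shows the argument list that grep would actually receive.
unquoted=$(echo grep [A-Z]* chap[12])

# Quoted: grep receives the pattern itself and searches chap1 and chap2.
# -l lists only the names of the files that contain a match.
quoted=$(grep -l '[A-Z][a-z]*' chap[12])

echo "$unquoted"
echo "$quoted"
```

Here only chap1 contains an uppercase letter, so the quoted form reports that file alone.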
Note also that in pattern matching, ? matches zero or one instance of a regular expression; in filename expansion, ? matches a single character.
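A short sketch of the two meanings of ? follows. The file names and sample words are assumptions chosen for illustration, and grep -E is used because ? is a metacharacter in the extended (egrep-style) pattern syntax.

```shell
# Hypothetical scratch directory and files, for illustration only.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch chap1 chap2 chap10

# Filename expansion: ? stands for exactly one character,
# so chap10 is not matched.
glob_matches=$(echo chap?)

# Pattern matching: ? makes the preceding u optional (zero or one).
regex_matches=$(printf 'color\ncolour\ncolouur\n' | grep -E '^colou?r$')

echo "$glob_matches"
echo "$regex_matches"
```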
Different metacharacters have different meanings, depending upon where they are used. In particular, regular expressions used for searching through text (matching) have one set of metacharacters, while the metacharacters used when processing replacement text have a different set. These sets also vary somewhat per program. This section covers the metacharacters used for searching and replacing, with descriptions of the variants in the different utilities.
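As a small illustration of the two sets, the following sketch uses sed's substitute command: a dot is special in the search pattern but literal in the replacement, while & is special in the replacement, where it stands for the matched text. The sample input is an assumption made for the example.

```shell
# In the search pattern, . matches any single character.
search_meta=$(printf 'a.b\n' | sed 's/./X/')

# In the replacement, & stands for the text that was matched.
replace_meta=$(printf 'a.b\n' | sed 's/a/&&/')

# In the replacement, . is an ordinary character.
literal_dot=$(printf 'a.b\n' | sed 's/a/./')

echo "$search_meta $replace_meta $literal_dot"
```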

Section 1.3.2.1: Search patterns

The characters in the following table have special meaning only in search patterns:
The sed Editor
The stream editor, sed, is a noninteractive editor. It interprets a script and performs the actions in the script. sed is stream-oriented because, like many Unix programs, input flows through the program and is directed to standard output. For example, sort is stream-oriented; vi is not. sed's input typically comes from a file or pipe, but it can also be directed from the keyboard. Output goes to the screen by default but can be captured in a file or sent through a pipe instead.
Typical uses of sed include:
  • Editing one or more files automatically
  • Simplifying repetitive edits to multiple files
  • Writing conversion programs
sed operates as follows:
  • Each line of input is copied into a pattern space, an internal buffer where editing operations are performed.
  • All editing commands in a sed script are applied, in order, to each line of input.
  • Editing commands are applied to all lines (globally) unless line addressing restricts the lines affected.
  • If a command changes the input, subsequent commands and address tests will be applied to the current line in the pattern space, not the original input line.
  • The original input file is unchanged because the editing commands modify a copy of each original input line. The copy is sent to standard output (but can be redirected to a file).
  • sed also maintains the hold space, a separate buffer that can be used to save data for later retrieval.
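The points above can be sketched with a two-line input file. The file and its contents are assumptions made for illustration; the last command is a well-known idiom that uses the hold space to print the input in reverse order.

```shell
# Hypothetical two-line input file.
tmp=$(mktemp)
printf 'cat\ndog\n' > "$tmp"

# Commands run in order on each line; the second command sees the
# result of the first, so "cat" becomes "dog" and then "bird".
ordered=$(sed -e 's/cat/dog/' -e 's/dog/bird/' "$tmp")

# A line address restricts the command to line 2 only.
addressed=$(sed '2s/dog/bird/' "$tmp")

# Hold space: append the saved lines to each new line (G), save the
# result (h), and print the accumulated text at the last line ($p).
reversed=$(sed -n '1!G;h;$p' "$tmp")

# The input file itself is never modified.
original=$(cat "$tmp")

printf '%s\n' "$ordered" "$addressed" "$reversed" "$original"
rm -f "$tmp"
```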
The syntax for invoking sed has two forms:
sed [-n] [-e] 'command' file(s)

sed [-n]  -f  scriptfile file(s)
            
The first form allows you to specify an editing command on the command line, surrounded by single quotes. The second form allows you to specify a scriptfile, a file containing sed commands.
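Both invocation forms can be tried as follows. This is a minimal sketch: the script file, its single command, and the sample input are assumptions made for illustration.

```shell
# Hypothetical script file holding one editing command.
script=$(mktemp)
printf 's/Unix/UNIX/\n' > "$script"

# First form: the command is given on the command line, in single quotes.
cmd_form=$(printf 'Unix tools\n' | sed -e 's/Unix/UNIX/')

# Second form: the same command is read from the script file.
file_form=$(printf 'Unix tools\n' | sed -f "$script")

# -n suppresses the default output, so only lines explicitly
# printed with p appear.
n_form=$(printf 'one\ntwo\n' | sed -n '2p')

printf '%s\n' "$cmd_form" "$file_form" "$n_form"
rm -f "$script"
```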
The awk Programming Language
awk is a pattern-matching program for processing files, especially when they are databases. The new version of awk, called nawk, provides additional capabilities. (It really isn't so new. The additional features were added in 1984, and it was first shipped with System V Release 3.1 in 1987. Nevertheless, the name was never changed on many systems.) Every modern Unix system comes with a version of new awk, and its use is recommended over old awk. The GNU version of awk, called gawk, implements new awk and provides a number of additional features.
Different systems vary in what new and old awk are called. Some have oawk and awk, for the old and new versions, respectively. Others have awk and nawk. Still others only have awk, which is the new version. This example shows what happens if your awk is the old one:
$ awk 1 /dev/null

awk: syntax error near line 1
awk: bailing out near line 1
awk will exit silently if it is the new version.
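The same check could be wrapped in a script. The sketch below assumes that the old awk exits with a nonzero status when it reports the syntax error, which the text above does not state; the variable name awk_kind is hypothetical.

```shell
# Assumption: old awk fails with a nonzero exit status on the program "1".
if awk 1 /dev/null 2>/dev/null; then
    awk_kind=new
else
    awk_kind=old
fi
echo "awk appears to be the $awk_kind version"
```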
Items described here as "common extensions" are often available in different versions of new awk, as well as in gawk, but should not be used if strict portability of your programs is important to you.
The freely available versions of awk described in Section 1.6 all implement new awk. Thus, references in the following text such as "nawk only" apply to all versions. gawk has additional features.
With original awk, you can:
  • Think of a text file as made up of records and fields in a textual database
  • Perform arithmetic and string operations
  • Use programming constructs such as loops and conditionals
  • Produce formatted reports
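A minimal sketch of these capabilities follows. The inventory records (name, quantity, unit price) are invented for illustration.

```shell
# Each input line is a record; $1, $2, and $3 are its fields.
report=$(printf 'apple 3 0.50\npear 2 0.75\n' | awk '
{
    cost = $2 * $3                  # arithmetic on two fields
    total += cost
    printf "%s %.2f\n", $1, cost    # formatted report line
}
END { printf "total %.2f\n", total }')

# A loop and a conditional: add up the fields greater than 1.
big_sum=$(echo '1 2 3' | awk '{
    s = 0
    for (i = 1; i <= NF; i++)
        if ($i > 1) s += $i
    print s
}')

printf '%s\n' "$report" "$big_sum"
```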
With nawk, you can also:
  • Define your own functions
  • Execute Unix commands from a script
  • Process the results of Unix commands
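The nawk additions can be sketched in a single program. The function name twice and the commands echo hello and true are assumptions chosen for illustration.

```shell
out=$(awk '
# A user-defined function.
function twice(x) { return 2 * x }

BEGIN {
    print twice(21)

    # Process the result of a Unix command: read its output into a variable.
    "echo hello" | getline line
    close("echo hello")
    print line

    # Execute a Unix command and capture its exit status.
    status = system("true")
    print status
}')

echo "$out"
```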
Additional Resources
This section lists resources for further exploration.
The following URLs indicate where to get source code for GNU sed, four freely available versions of awk, and GNU gettext.
ftp://ftp.gnu.org/gnu/sed/sed-3.02.tar.gz
The Free Software Foundation's version of sed. The somewhat older version, 2.05, is also available.
http://cm.bell-labs.com/~bwk
Brian Kernighan's home page, with links to the source code for the latest version of awk from Bell Laboratories.
ftp://ftp.freefriends.org/arnold/Awkstuff/mawk1.3.3.tar.gz
Michael Brennan's mawk. A very fast, very robust version of awk.
ftp://ftp.gnu.org/gnu/gawk/gawk-3.1.1.tar.gz
The Free Software Foundation's version of awk, called gawk.
http://awka.sourceforge.net
The home page for awka, a translator that turns awk programs into C, compiles the generated C, and then links the object code with a library that performs the core awk functions.
ftp://ftp.gnu.org/gnu/gettext/gettext-0.11.2.tar.gz
The source code for GNU gettext. Get this if you need to produce translations for your awk programs that use gawk.