Spare Details of the masterindex Program

This section presents a few details of the masterindex program that might otherwise escape attention. It extracts some program fragments and shows how each one solves a particular problem.

How to Hide a Special Character

Our first fragment is from the input.idx script, whose job is to standardize the index entries before they are sorted. This program takes as its input a record consisting of two tab-separated fields: the index entry and its page number. Within the index entry, a colon is part of the syntax that marks off the parts of the entry.

Because the program uses a colon as a special character, we must provide a way to pass a literal colon through the program. To do this, we allow the indexer to specify two consecutive colons in the input. However, we can’t simply convert the sequence to a literal colon, because the other program modules called by masterindex read three colon-separated fields and would treat that colon as a field separator. The solution is to use the gsub( ) function to replace each pair of colons with an escape sequence that gives the colon's octal value.

#< from input.idx
# convert literal colon to octal value
$1 ~ /::/ {
        # substitute octal value for "::"
        gsub(/::/, "\\72", $1)
}

“\\72” represents the octal value of a colon; the doubled backslash keeps awk from interpreting the escape itself, so the three characters \72 are what end up in the field. (You can find this value by scanning a table of hexadecimal and octal equivalents in the file /usr/pub/ascii.) In the last program module, we use gsub( ) to convert the octal value back to a colon. Here’s the code from format.idx.

#< from format.idx ...
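
As a rough end-to-end illustration of the round trip described above, the following shell sketch wraps the two substitutions in functions. The names hide_colons, restore_colons, and count_parts are hypothetical and do not appear in masterindex. The restore step here touches only the first field, which is an assumption made for brevity and not a copy of format.idx.

```sh
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of masterindex.

# Replace each "::" in the entry (field 1) with the three characters \72,
# so that later colon-based splitting does not see a separator there.
hide_colons() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 ~ /::/ { gsub(/::/, "\\72", $1) }
         { print }'
}

# Turn each hidden \72 in the entry back into a literal colon.
# Assumption: only field 1 needs restoring in this simplified record layout.
restore_colons() {
    awk 'BEGIN { FS = OFS = "\t" }
         { gsub(/\\72/, ":", $1); print }'
}

# Print how many colon-separated parts the entry (field 1) contains.
count_parts() {
    awk 'BEGIN { FS = "\t" } { print split($1, parts, ":") }'
}
```

With a made-up record such as "time format hh::mm:setting" followed by a tab and the page number 42, piping the record through hide_colons should leave \72 where the doubled colon was, so count_parts sees two colon-separated parts. Passing that result through restore_colons should then produce hh:mm in the entry.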
