Spare Details of the masterindex Program
This section presents a few interesting details of the masterindex program that might otherwise escape attention. The purpose of this section is to extract some interesting program fragments and show how they solve a particular problem.
How to Hide a Special Character
Our first fragment is from the input.idx script, whose job it is to standardize the index entries before they are sorted. This program takes as its input a record consisting of two tab-separated fields: the index entry and its page number. A colon is used as part of the syntax for indicating the parts of an index entry.
Because the program uses a colon as a special character, we must provide a way to pass a literal colon through the program. To do this, we allow the indexer to specify two consecutive colons in the input. However, we can’t simply convert the sequence to a literal colon because the rest of the program modules called by masterindex read three colon-separated fields. The solution is to convert the colon to its octal value using the gsub( ) function.
#< from input.idx # convert literal colon to octal value $1 ~ /::/ { # substitute octal value for "::" gsub(/::/, "\\72", $1)
“\\72” represents the octal value of a colon. (You can find this value by scanning a table of hexadecimal and octal equivalents in the file /usr/pub/ascii.) In the last program module, we use gsub( ) to convert the octal value back to a colon. Here’s the code from format.idx.
#< from format.idx ...
Get sed & awk, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.