An Acronym Processor
Now let’s look at a program that scans a file for acronyms. Each acronym is replaced with its full text description, followed by the acronym in parentheses. If a line refers to “BASIC,” we’d like to replace it with the description “Beginner’s All-Purpose Symbolic Instruction Code” and put the acronym in parentheses afterwards. (This is probably not a useful program in and of itself, but the techniques it uses are general and have many other uses.)
We can design this program for use as a filter that prints all lines, regardless of whether a change has been made. We’ll call it awkro.
awk '# awkro - expand acronyms
# load acronyms file into array "acro"
FILENAME == "acronyms" {
	split($0, entry, "\t")
	acro[entry[1]] = entry[2]
	next
}
# process any input line containing caps
/[A-Z][A-Z]+/ {
	# see if any field is an acronym
	for (i = 1; i <= NF; i++)
		if ( $i in acro ) {
			# if it matches, add description
			$i = acro[$i] " (" $i ")"
		}
}
{
	# print all lines
	print $0
}' acronyms $*
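The heart of the program is the field loop: the in operator tests whether a field is a subscript of the array, and assigning to $i makes awk rebuild $0 from its fields. The sketch below isolates that step so it can be tried without an acronyms file. The function name expand_demo and the single hard-coded table entry are illustrative assumptions, not part of awkro.

```shell
# Hypothetical helper, not part of awkro: the lookup table is hard-coded in
# BEGIN so the field-replacement step can be tried on its own.
expand_demo() {
    awk 'BEGIN { acro["EOS"] = "Earth Observing System" }
    {
        for (i = 1; i <= NF; i++)
            if ($i in acro)                  # exact match against a subscript
                $i = acro[$i] " (" $i ")"    # assignment rebuilds $0
        print
    }'
}

echo "such as EOS and Earthprobes." | expand_demo
```

Because the test is an exact match on the whole field, a field with punctuation attached, such as “EOS,” is left unchanged.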
Let’s first see it in action. Here’s a sample input file.
$ cat sample
The USGCRP is a comprehensive
research effort that includes applied
as well as basic research.
The NASA program Mission to Planet Earth
represents the principal space-based component
of the USGCRP and includes new initiatives
such as EOS and Earthprobes.
And here is the file acronyms, in which a tab separates each acronym from its description:
$ cat acronyms
USGCRP	U.S. Global Change Research Program
NASA	National Aeronautics and Space Administration
EOS	Earth Observing System
Now we run ...
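To reproduce a run from scratch, something like the following sketch should work. It recreates both files in a temporary directory and runs the same rules inline instead of through a script named awkro. The temporary directory and the variable names workdir and out are assumptions made for this example. The files are created in the current directory because the FILENAME == "acronyms" test only matches when the file is named on the command line exactly that way.

```shell
# Reproducible sketch: workdir and out are names chosen for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf reuses the format for each pair of arguments and writes a literal
# tab between the acronym and its description.
printf '%s\t%s\n' \
    'USGCRP' 'U.S. Global Change Research Program' \
    'NASA'   'National Aeronautics and Space Administration' \
    'EOS'    'Earth Observing System' > acronyms

cat > sample <<'EOF'
The USGCRP is a comprehensive
research effort that includes applied
as well as basic research.
The NASA program Mission to Planet Earth
represents the principal space-based component
of the USGCRP and includes new initiatives
such as EOS and Earthprobes.
EOF

# Same three rules as awkro: load the table, expand matching fields, print.
out=$(awk '
FILENAME == "acronyms" { split($0, entry, "\t"); acro[entry[1]] = entry[2]; next }
/[A-Z][A-Z]+/ { for (i = 1; i <= NF; i++) if ($i in acro) $i = acro[$i] " (" $i ")" }
{ print }' acronyms sample)

printf '%s\n' "$out"
```

Lines that contain no acronym pass through unchanged, which is what makes the program usable as a filter.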