Commenting Regular Expressions


You want to make your complex regular expressions understandable and maintainable.


You have four techniques at your disposal: comments outside the pattern, comments inside the pattern with the /x modifier, comments inside the replacement part of s///, and alternate delimiters.


The sample code in Example 6-1 uses all four techniques. The initial comment describes the overall intent of the regular expression. For relatively simple patterns, this may be all that is needed. More complex patterns, such as the one in the example, require more documentation.
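To illustrate the difference between the first two techniques, here is a minimal sketch that is not taken from the recipe. The date pattern and the variable names $date_terse and $date_commented are assumptions chosen only for illustration. The first form relies on a comment outside the pattern; the second spells out the same pattern under /x with a comment on each piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Comment outside the pattern: match an ISO-style date such as 2024-01-31.
my $date_terse = qr/^(\d{4})-(\d\d)-(\d\d)$/;

# The same pattern under /x, commented piece by piece.
my $date_commented = qr{
    ^           # start of string
    (\d{4})     # four-digit year, captured first
    -           # literal dash
    (\d\d)      # two-digit month, captured second
    -           # literal dash
    (\d\d)      # two-digit day, captured third
    $           # end of string
}x;

# Both forms should accept and reject the same inputs.
for my $input ("2024-01-31", "31/01/2024") {
    my $terse     = $input =~ $date_terse     ? "match" : "no match";
    my $commented = $input =~ $date_commented ? "match" : "no match";
    print "$input: terse=$terse, commented=$commented\n";
}
```

For a pattern this short the outside comment is probably enough; the /x form starts to pay off once the pattern has nested groups or lookaheads, as in Example 6-1.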

Example 6-1. resname

#!/usr/bin/perl -p
# resname - change all "host.domain.tld" style names in the input stream
# into "host.domain.tld [ip.address]" (or whatever) instead

use Socket;                 # load inet_addr
s{                          #
    (                       # capture the hostname in $1
        (?:                 # these parens for grouping only
            (?! [-_]  )     # lookahead for neither underscore nor dash
            [\w-] +         # hostname component
            \.              # and the domain dot
        ) +                 # now repeat that whole thing a bunch of times
        [A-Za-z]            # next must be a letter
        [\w-] +             # now trailing domain part
    )                       # end of $1 capture
}{                          # replace with this:
    "$1 " .                 # the original bit, plus a space
           ( ($addr = gethostbyname($1))   # if we get an addr
            ? "[" . inet_ntoa($addr) . "]" #        format it
            : "[???]"                      # else mark dubious
           )                               # end of the address lookup
}gex;               # /g for global
                    # /e for execute
                    # /x for nice formatting
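
The same structure can be tried without network access. The following is a sketch under stated assumptions, not the recipe's own code: the %fake_dns table and the annotate_hosts function are hypothetical stand-ins for gethostbyname and inet_ntoa, and the address comes from a range reserved for documentation. The pattern, the commented replacement, and the /gex modifiers follow Example 6-1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table standing in for gethostbyname/inet_ntoa.
my %fake_dns = ( "www.example.com" => "192.0.2.10" );

# Append a bracketed address (or "[???]") after every hostname in $text.
sub annotate_hosts {
    my ($text) = @_;
    $text =~ s{
        (                       # capture the hostname in $1
            (?:                 # these parens for grouping only
                (?! [-_] )      # component may not start with dash or underscore
                [\w-] +         # hostname component
                \.              # and the domain dot
            ) +                 # one or more such components
            [A-Za-z]            # next must be a letter
            [\w-] +             # now trailing domain part
        )                       # end of $1 capture
    }{
        "$1 " .                                 # the original bit, plus a space
        ( exists $fake_dns{$1}                  # if the table knows the name
            ? "[" . $fake_dns{$1} . "]"         #     format its address
            : "[???]"                           # else mark dubious
        )
    }gex;
    return $text;
}

print annotate_hosts("ping www.example.com and no.such.host"), "\n";
# prints: ping www.example.com [192.0.2.10] and no.such.host [???]
```

Because /e turns the replacement into ordinary Perl code, its comments are ordinary Perl comments; the comments inside the pattern, by contrast, are only legal because of /x.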

For aesthetics, the example uses alternate delimiters. When you split your match or substitution over multiple lines, ...
