The Regex Compiler
After the variable interpolation pass has had its way with the string, the regex parser finally gets a shot at trying to understand your regular expression. There’s not actually a great deal that can go wrong at this point, apart from messing up the parentheses or using a sequence of metacharacters that doesn’t mean anything. The parser does a recursive-descent analysis of your regular expression and, if it parses, turns it into a form suitable for interpretation by the Engine (see the next section). Most of the interesting stuff that goes on in the parser involves optimizing your regular expression to run as fast as possible. We’re not going to explain that part. It’s a trade secret. (Rumors that looking at the regular expression code will drive you insane are greatly exaggerated. We hope.)
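The two failure modes mentioned above, unbalanced parentheses and a meaningless run of metacharacters, can be observed without crashing your program. The following is a minimal sketch, assuming you are willing to compile the pattern at run time with qr// inside an eval; the helper name regex_error is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the text): returns an empty string if the
# pattern compiles, or the parser's complaint if it does not.
sub regex_error {
    my ($pattern) = @_;
    # qr// compiles the interpolated pattern at run time. A parse failure
    # dies, eval traps it, and the message is left in $@.
    my $re = eval { qr/$pattern/ };
    return defined $re ? "" : $@;
}

# One well-formed pattern, one with an unclosed parenthesis, and one with
# a quantifier applied to another quantifier.
for my $pattern ('^Sm(.*)[aeiou]l$', '^Sm(.*[aeiou]l$', 'a**') {
    my $error = regex_error($pattern);
    print $error eq "" ? "ok:  $pattern\n" : "bad: $pattern\n     $error";
}
```

The first pattern should compile cleanly, while the other two should be rejected by the parser with a message pointing at the spot where it gave up.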
But you might like to know what the parser actually thought of your regular expression, and if you ask it politely, it will tell you. By saying use re "debug", you can examine how the regex parser processes your pattern. (You can also see the same information by using the -Dr command-line switch, which is available to you if your Perl was compiled with the -DDEBUGGING flag during installation.)
    #!/usr/bin/perl
    use re "debug";
    "Smeagol" =~ /^Sm(.*)[aeiou]l$/;
The output is below. You can see that prior to execution, Perl compiles the regex and assigns meaning to the components of the pattern: BOL for the beginning of line (^), REG_ANY for the dot, and so on:
Compiling REx "^Sm(.*)[aeiou]l$" ...