Mark Jason Dominus
This isn’t an article about how to use regexes; you’ve seen plenty of those already. It’s about how you would write a regex package from scratch, in a language like C that didn’t already have one. I demonstrate a new module, Regex.pm, which implements regexes from nothing, in Perl. This will give you an idea of how regex matching is possible, although the details differ rather substantially from what Perl actually does.
Here’s the basic strategy. We’ll see a kind of “machine” that reads some input, one character at a time, and then, depending on what’s in the input and on the various wheels and gears in the machine, says either “yes” or “no”. The machines are simple, and it turns out that if we have a regex, it’s not hard to construct a machine that says “yes” for exactly those strings that match the regex, and “no” for all the other strings.
When our program wants to see if
/R/, it’ll do something like this:
Construct the machine that corresponds to
S into the machine.
If the machine says “yes”, then
/R/. (Otherwise, it didn’t.)
Maybe this sounds bizarre, but it’s what Perl does. If you can follow what happens in this article, you’ll know what Perl is really up to when it performs a regex match.
We’re on a tight budget here, so our machines will be made of circles and arrows instead of wheels and gears. This diagram shows a machine.
Let’s see if this machine says “yes” to the string ...