Chapter 4. Pattern Matching with Regular Expressions
Introduction
Suppose you have been on the Internet for a few years and have been very faithful about saving all your correspondence, just in case you (or your lawyers, or the prosecution) need a copy. The result is that you have a 5 GB disk partition dedicated to saved mail. And let’s further suppose that you remember that somewhere in there is an email message from someone named Angie or Anjie. Or was it Angy? But you don’t remember what you called it or where you stored it. Obviously, you have to look for it.
But while some of you go and try to open up all 15,000,000 documents in a word processor, I’ll just find it with one simple command. Any system that provides regular expression support allows me to search for the pattern in several ways. The simplest to understand is:
Angie|Anjie|Angy
which you can probably guess means just to search for any of the variations. A more concise form (“more thinking, less typing”) is:
An[^ dn]
The syntax will become clear as we go through this chapter. Briefly, the “A” and the “n” match themselves, in effect finding words that begin with “An”, while the cryptic [^ dn] requires the “An” to be followed by one character that is not in the bracketed list (a ^ at the start of a bracketed list means “not”): not a space (to eliminate the very common English word “an” at the start of a sentence), not “d” (to eliminate the common word “and”), and not “n” (to eliminate Anne, Announcing, etc.). Has your word processor gotten past its splash screen yet? Well, it doesn’t ...
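To see how the two patterns behave side by side, here is a minimal sketch in Python, using its standard re module. The sample lines are invented for illustration and stand in for the saved mail; they are not from any real mailbox, and the variable names are arbitrary.

```python
import re

# Hypothetical sample lines standing in for saved mail (invented for this sketch).
lines = [
    "From: Angie <angie@example.com>",
    "Thanks, Anjie",
    "Regards, Angy",
    "An apple a day",
    "And another thing",
    "Anne called yesterday",
    "Announcing the new release",
    "Answers are due Friday",
]

# The explicit form: match any one of the three spellings.
alternation = re.compile(r"Angie|Anjie|Angy")

# The concise form: "An" followed by one character that is
# not a space, not "d", and not "n".
char_class = re.compile(r"An[^ dn]")

# search() looks for the pattern anywhere in the line, not just at the start.
alt_hits = [s for s in lines if alternation.search(s)]
class_hits = [s for s in lines if char_class.search(s)]

print("Alternation:    ", alt_hits)
print("Character class:", class_hits)
```

On this sample, both patterns find the three name lines, but the concise form also picks up the line beginning with “Answers”. That is the trade-off of the shorter pattern: it rules out a few common words rather than naming the spellings you want, so it can return extra matches that you then have to skim past.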