Chapter 1. What Is a Regular Expression?
Regular expressions are specially encoded text strings used as patterns for matching sets of strings. They began to emerge in the 1940s as a way to describe regular languages, but they did not really show up in the programming world until the 1970s. The earliest place I could find them was the QED text editor written by Ken Thompson.
“A regular expression is a pattern which specifies a set of strings of characters; it is said to match certain strings.” —Ken Thompson
Regular expressions later became an important part of the tool suite that emerged from the Unix operating system: the ed, sed, and vi (vim) editors, grep, and AWK, among others. But the ways in which regular expressions were implemented were not always so regular.
Note
This book takes an inductive approach; in other words, it moves from the specific to the general. So rather than an example after a treatise, you will often get the example first and then a short treatise following that. It’s a learn-by-doing book.
Regular expressions have a reputation for being gnarly, but that all depends on how you approach them. There is a natural progression from something as simple as this:
\d
a character shorthand that matches any digit from 0 to 9, to something a bit more complicated, like:
^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$
which is where we’ll wind up at the end of this chapter: a fairly robust regular expression that matches a 10-digit, North American telephone number, with or without ...
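To make that progression concrete, here is a minimal Python sketch that tries both patterns. Python and its standard re module are my choice for illustration, not something the chapter prescribes, and the names DIGIT, PHONE, and is_phone are hypothetical. The re.ASCII flag is an assumption added so that \d matches only 0 through 9; without it, Python's \d also matches other Unicode digits.

```python
import re

# The simple pattern: \d matches a single digit.
# re.ASCII (an assumption for this sketch) restricts \d to 0-9.
DIGIT = re.compile(r"\d", re.ASCII)

# The telephone pattern, copied verbatim from the text above.
PHONE = re.compile(r"^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$", re.ASCII)


def is_phone(text: str) -> bool:
    """Return True if text matches the telephone pattern (hypothetical helper)."""
    return PHONE.search(text) is not None


if __name__ == "__main__":
    # Every digit in the string is found one at a time.
    print(DIGIT.findall("Room 101, floor 7"))  # ['1', '0', '1', '7']

    # Try a handful of candidate phone numbers.
    for candidate in [
        "707-827-7019",
        "(707)827-7019",
        "707.827.7019",
        "7078277019",
        "827-7019",
        "(707) 827-7019",
        "707-827-701",
    ]:
        print(f"{candidate!r}: {is_phone(candidate)}")
```

Running it shows that the pattern accepts a hyphen or a period as a separator, and that the area code is optional. As written, though, it does not allow a space after the parenthesized area code, so "(707) 827-7019" is rejected.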