Chapter 9: Regular Expressions

Regular expressions are a tool for matching text by looking for a pattern rather than for an exact text string. When you already know the exact text, you do not need them: you can check for the presence of one text string within another simply by using the Python in operator, as shown here:

>>> haystack = 'My phone number is 213-867-5309.'
>>> '213-867-5309' in haystack
True

Sometimes, however, you do not have the exact text you want to match. For example, what if you want to know whether any valid phone number is present in a string? To take that one step further, what if you want to know whether any valid phone number is present in the string, and also want to know what that phone number is?

This is where regular expressions are useful. Their purpose is to specify a pattern of text to identify within a bigger text string. Regular expressions can identify the presence or absence of text matching the pattern, and also split a pattern into one or more subpatterns, delivering the specific text within each.
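As a rough illustration of both ideas (detecting a match and pulling out subpatterns), the following sketch uses the standard library re module. The pattern and the names PHONE_PATTERN and find_phone_number are assumptions made for this example only; the pattern recognizes just the hyphenated three-three-four digit layout used in the haystack string above, and real phone number formats vary far more than this.

```python
import re

# Hypothetical pattern for illustration only: three groups of digits
# (3, 3, and 4 long) separated by hyphens. Each pair of parentheses
# marks a subpattern whose matched text can be retrieved separately.
PHONE_PATTERN = re.compile(r'(\d{3})-(\d{3})-(\d{4})')


def find_phone_number(text):
    """Return a match object for the first phone-like string, or None."""
    return PHONE_PATTERN.search(text)


haystack = 'My phone number is 213-867-5309.'
match = find_phone_number(haystack)

if match:
    print(match.group(0))  # the full text that matched the pattern
    print(match.groups())  # the text matched by each subpattern
else:
    print('No phone number found.')
```

Here search answers the first question (is anything matching the pattern present?) by returning either a match object or None, while group and groups answer the second (what exactly was matched?).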

This chapter explores regular expressions (or regexes, for short). First, you learn how to perform regular expression searches in Python using the re module. You then explore various regular expressions, beginning with the simple and working toward the more complex. Finally, you learn about regular expression substitution.

Why Use Regular Expressions?

You use regular expressions for two common reasons.

The first reason is data mining—that ...
