Regular Expression Matching

Splitting and joining strings is a simple way to process text, as long as it follows the format you expect. For more general text analysis tasks, Python provides regular expression matching utilities. Regular expressions (REs) are simply strings that define patterns to be matched against other strings. You supply a pattern and a string, and ask if the string matches your pattern. After a match, parts of the string matched by parts of the pattern are made available to your script. That is, matches not only give a yes/no answer, but they can pick out substrings as well.

Regular expression pattern strings can be complicated (let’s be honest -- they can be downright gross to look at). But once you get the hang of them, they can replace larger hand-coded string search routines. In Python, regular expressions are not part of the syntax of the Python language itself, but are supported by extension modules that you must import to use. The modules define functions for compiling pattern strings into pattern objects, matching these objects against strings, and fetching matched substrings after a match.

Beyond those generalities, Python’s regular expression story is complicated a little by history:

The regex module (old)

In earlier Python releases, a module called regex was the standard (and only) RE module. It was fast and supported patterns coded in awg, grep, and emacs style, but it is now somewhat deprecated (though it will likely still be available for some ...

Get Programming Python, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.