Python Pocket Reference
Pattern Syntax
Pattern strings are specified by concatenating forms (see
Table 19) as well as by character class escapes (see Table 20).
Python character escapes (e.g.,
\t for tab) can also appear.
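As a small illustration of how Python character escapes and pattern escapes interact, the sketch below matches a tab with both a normal and a raw string literal. The variable names are illustrative only and do not come from the text.

```python
import re

text = 'a\tb'   # a, a real tab character, b

# In a normal string literal, Python turns \t into a tab before re sees it.
plain_hit = re.match('a\tb', text)

# In a raw string, re receives backslash + t and interprets it as a tab itself.
raw_hit = re.match(r'a\tb', text)

# For escapes that only re understands (such as \d), a raw string avoids
# having to double the backslash in the Python literal.
digits_raw = re.match(r'\d+', '2024').group(0)
digits_plain = re.match('\\d+', '2024').group(0)
```

Both spellings reach the same compiled pattern here; raw strings are usually the less error-prone choice once a pattern contains several backslashes.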
Pattern strings are matched against text strings, yielding a
match result (a match object, which tests true, or None on
failure), as well as grouped substrings matched by subpatterns
in parentheses:
>>> import re
>>> patt = re.compile('hello[ \t]*(.*)')
>>> mobj = patt.match('hello world!')
>>> mobj.group(1)
'world!'
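The same pattern can be used to sketch what the match result looks like on success and on failure. The `search` call is part of the re module but is not described in this section; it is shown only for contrast with `match`.

```python
import re

patt = re.compile('hello[ \t]*(.*)')

mobj = patt.match('hello world!')
whole = mobj.group(0)        # entire matched text
first = mobj.group(1)        # text matched by the first parenthesized group
all_groups = mobj.groups()   # tuple of all group substrings

# match() only tries at the start of the string, so this returns None.
miss = patt.match('say hello world!')

# search() scans forward for the first position where the pattern matches.
found = patt.search('say hello world!')
found_group = found.group(1)
```

Because a failed match is `None`, the result can be tested directly in an `if` statement before calling `group`.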
In Table 19, C is any character, R is any regular expression
form in the left column of the table, and m and n are integers.
Each form usually consumes as much of the string being matched
as possible, except for the nongreedy forms (which consume as
little as possible, as long as the entire pattern still matches
the target string).
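The difference between greedy and nongreedy forms can be seen in a minimal sketch like the following. The sample strings and variable names are made up for illustration.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string.
greedy = re.match(r'<.*>', text).group(0)

# Nongreedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.match(r'<.*?>', text).group(0)

# The same applies to counted repetition.
counted_greedy = re.match(r'a{2,4}', 'aaaaa').group(0)
counted_lazy = re.match(r'a{2,4}?', 'aaaaa').group(0)

# A nongreedy form still consumes more when the rest of the pattern needs it.
forced = re.match(r'a+?b', 'aaab').group(0)
```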
Table 19. Regular expression pattern syntax
Form                    Description
.                       Matches any character (including newline if DOTALL flag
                        is specified).
^                       Matches start of string (of every line in MULTILINE mode).
$                       Matches end of string (of every line in MULTILINE mode).
C                       Any nonspecial character matches itself.
R*                      Zero or more occurrences of preceding regular expression
                        R (as many as possible).
R+                      One or more occurrences of preceding regular expression
                        R (as many as possible).
R?                      Zero or one occurrence of preceding regular expression R.
R{m,n}                  Matches from m to n repetitions of preceding regular
                        expression R.
R*?, R+?, R??, R{m,n}?  Same as *, +, ?, and {m,n}, but matches as few
                        characters/times as possible; nongreedy.
[...]                   Defines character set; e.g., [a-zA-Z] matches all letters
                        (also see Table 20).
[^...]                  Defines complemented character set: matches if character
                        is not in set.
\                       Escapes special characters (e.g., *?+|()) and introduces
                        special sequences (see Table 20). Due to Python string
                        rules, write as \\ in a normal string literal, or as a
                        single \ in a raw string (e.g., r'\d').
\\                      Matches a literal \; due to Python string rules, write as
                        '\\\\' in a normal string literal, or as r'\\'.
R|R                     Alternative: matches left or right R.
RR                      Concatenation: matches both Rs.
(R)                     Matches any RE inside ( ), and delimits a group (retains
                        matched substring).
(?:R)                   Same as (R) but doesn't delimit a group.
(?=R)                   Look-ahead assertion: matches if R matches next, but
                        doesn't consume any of the string (e.g., X(?=Y) matches X
                        if followed by Y).
(?!R)                   Negative look-ahead assertion: matches if R doesn't match
                        next. Negative of (?=R).
(?P<name>R)             Matches any RE inside ( ) and delimits a named group
                        (e.g., r'(?P<id>[a-zA-Z_]\w*)' defines a group named id).
(?P=name)               Matches whatever text was matched by the earlier group
                        named name.
(?<=R)                  Positive look-behind assertion: matches if preceded by a
                        match of fixed-width R.
(?<!R)                  Negative look-behind assertion: matches if not preceded
                        by a match of fixed-width R.
(?#...)                 A comment; ignored.
(?letter)               letter is one of i, L, m, s, x, or u. Set flag (re.I,
                        re.L, etc.) for entire RE.
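Several of the forms in Table 19 are easier to follow with concrete input. The following is a minimal sketch, not an exhaustive tour; the sample strings and variable names are invented for illustration, and only standard re module calls are used.

```python
import re

# Named group, using the identifier pattern quoted in the table.
ident = re.match(r'(?P<id>[a-zA-Z_]\w*)', 'spam_1 = 2').group('id')

# (?P=word) must repeat whatever the group named "word" captured.
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'it is is fine').group('word')

# (?:...) groups for repetition without adding an entry to groups().
captured = re.match(r'(?:ab)+(c)', 'ababc').groups()

# Look-ahead: test what follows without consuming it.
before_bang = re.findall(r'\w+(?=!)', 'stop! go wait!')
not_foobar = [m.start() for m in re.finditer(r'foo(?!bar)', 'foobar foobaz')]

# Look-behind: test what precedes; the look-behind pattern is fixed width.
priced = re.findall(r'(?<=\$)\d+', 'cost $40, 15 items')
unpriced = re.findall(r'(?<!\$)\b\d+', 'cost $40, 15 items')

# Flags can be passed as arguments or set inline with (?letter) at the start.
line_starts = re.findall(r'^\w+', 'one\ntwo', re.MULTILINE)
inline_starts = re.findall(r'(?m)^\w+', 'one\ntwo')
dot_default = re.match(r'a.b', 'a\nb')
dot_all = re.match(r'a.b', 'a\nb', re.DOTALL)

# A literal backslash: r'\\' and '\\\\' spell the same two-character pattern.
slash_raw = re.search(r'\\', r'C:\temp').group(0)
slash_plain = re.search('\\\\', r'C:\temp').group(0)
```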
