Understanding the Python regex engine

The re module uses a backtracking regular expression engine. In the well-known book Mastering Regular Expressions, Jeffrey E. F. Friedl classifies it as a Nondeterministic Finite Automaton (NFA) engine. However, according to Tim Peters (https://mail.python.org/pipermail/tutor/2006-January/044335.html), the module is not purely NFA.

These are the most common characteristics of the algorithm:

  • It supports "lazy quantifiers" such as *?, +?, and ??.
  • It returns the first match it finds, even if a longer match exists in the string.
    >>> re.search("engineer|engineering", "engineering").group()
    'engineer'

    This also means that the order of the alternatives matters.

  • The algorithm tracks only one transition at one step, which means ...
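
The first two characteristics in the list above can be checked with a short sketch. The "<a><b>" sample string and the variable names are illustrative assumptions, not taken from the book; only the engineer/engineering pattern comes from the text.

```python
import re

# Greedy vs. lazy quantifiers: "*" takes as much as it can,
# while "*?" stops at the earliest point that still lets the pattern match.
text = "<a><b>"  # hypothetical sample input
greedy = re.search(r"<.*>", text).group()
lazy = re.search(r"<.*?>", text).group()

# Alternation is tried left to right, and the first alternative that
# succeeds wins, even when a later alternative would match more text.
short_first = re.search(r"engineer|engineering", "engineering").group()
long_first = re.search(r"engineering|engineer", "engineering").group()

print(greedy)       # <a><b>
print(lazy)         # <a>
print(short_first)  # engineer
print(long_first)   # engineering
```

Putting the longer alternative first is one way to get the longer match; whether that is the right fix depends on the pattern at hand.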
