Advanced Language Tools
If you have a background in parsing theory, you may know
that neither regular expressions nor string splitting is powerful
enough to handle more complex language grammars. Roughly, regular
expressions lack the stack “memory” that context-free grammars require,
so they cannot match arbitrarily nested language constructs
(nested if statements in a
programming language, for instance). From a theoretical perspective,
regular expressions are intended to handle just the first stage of
parsing—separating text into components, otherwise known as
lexical analysis. Language parsing requires
more.
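To make the division of labor concrete, here is a minimal sketch rather than a full parser. A regular expression handles the lexical analysis step, and an explicit stack supplies the "memory" needed to follow nesting. The tiny arithmetic token set and the names tokenize and max_nesting are assumptions made for this illustration only.

```python
import re

# Lexical analysis: one regular expression recognizes each token,
# skipping any whitespace that precedes it.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Return the number, operator, and parenthesis tokens found in text."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def max_nesting(tokens):
    """Follow parenthesis nesting with an explicit stack; return the deepest level."""
    stack = []
    deepest = 0
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
            deepest = max(deepest, len(stack))
        elif tok == ")":
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()
    if stack:
        raise SyntaxError("unbalanced '('")
    return deepest
```

Calling tokenize("2 * (3 + (4 - 1))") yields a flat list of tokens. Only the second function, with its stack, can tell how deeply the parentheses nest or whether they balance at all.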
In most applications, the Python language itself can replace
custom languages and parsers—user-entered code can be passed to Python
for evaluation with tools such as eval and exec. By augmenting the system with custom
modules, user code in this scenario has access to both the full Python
language and any application-specific extensions required. In a sense,
such systems embed Python in Python. Since this is a common
application of Python, we’ll revisit this approach later in this
chapter.
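As a rough sketch of this idea, the code below passes user-entered strings to eval and exec along with a namespace dictionary that holds application-specific extensions. The area function and the names namespace, evaluate, and run are hypothetical, invented for this illustration. Because eval and exec run arbitrary Python code, this pattern assumes the input comes from a trusted source.

```python
import math

def area(radius):
    """Hypothetical application-specific extension exposed to user code."""
    return math.pi * radius ** 2

# Names visible to user code: our extensions, alongside Python's built-ins.
namespace = {"area": area, "sqrt": math.sqrt}

def evaluate(expression):
    """Evaluate one user-entered expression string and return its value."""
    return eval(expression, namespace)

def run(statements):
    """Run user-entered statements; names they assign persist in namespace."""
    exec(statements, namespace)
```

For example, run("r = sqrt(16)") stores r in the shared namespace, and a later evaluate("area(r)") can use both the stored value and the custom area function. The user's code gets the full Python language without the application defining any grammar of its own.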
For some sophisticated language analysis tasks, though, a full-blown parser may still be required. Since Python is built for integrating C tools, we can write integrations to traditional parser generator systems such as yacc and bison, tools that create parsers from language grammar definitions. Better yet, we could use an integration that already exists—interfaces to such common parser generators ...