Context-Free Grammar

A Simple Grammar

Let’s start off by looking at a simple context-free grammar (CFG). By convention, the left-hand side of the first production is the start symbol of the grammar, typically S, and all well-formed trees must have this symbol as their root label. In NLTK, context-free grammars are defined in the nltk.grammar module. In Example 8-9 we define a grammar and show how to parse a simple sentence admitted by the grammar.
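The start-symbol convention can be made concrete with a small check. The following is a minimal pure-Python sketch, not NLTK code: the tuple-based tree format, the PRODUCTIONS list (a fragment of the grammar in Example 8-9, with each | alternative written as its own production), and the helper names are all hypothetical. A tree counts as well formed here if its root is the left-hand side of the first production and every local subtree matches some production.

```python
# Hypothetical tree format: a tuple (label, child, ...), where each child is
# either another such tuple or a plain-string word. This simple format cannot
# tell a word apart from a category that has the same name.

# A fragment of grammar1 (Example 8-9) as (lhs, rhs) pairs; each "|"
# alternative is written out as its own production.
PRODUCTIONS = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("V", ("saw",)),
    ("NP", ("Mary",)),
    ("NP", ("Bob",)),
]


def local_production(node):
    """Return the (lhs, rhs) pair used at the root of node."""
    label, *children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return (label, rhs)


def licensed(node, productions):
    """True if every local subtree of node matches one of the productions."""
    if local_production(node) not in productions:
        return False
    return all(licensed(c, productions) for c in node[1:] if not isinstance(c, str))


def well_formed(tree, productions=PRODUCTIONS):
    """The root must be the start symbol, i.e. the lhs of the first production."""
    start = productions[0][0]
    return tree[0] == start and licensed(tree, set(productions))


tree = ("S", ("NP", "Mary"), ("VP", ("V", "saw"), ("NP", "Bob")))
print(well_formed(tree))            # True
print(well_formed(("NP", "Mary")))  # False: the root is NP, not S
```

The subtree ("NP", "Mary") is itself licensed by the grammar; it is rejected only because its root label is not the start symbol.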

Example 8-9. A simple context-free grammar.

>>> import nltk
>>> grammar1 = nltk.CFG.fromstring("""
...   S -> NP VP
...   VP -> V NP | V NP PP
...   PP -> P NP
...   V -> "saw" | "ate" | "walked"
...   NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
...   Det -> "a" | "an" | "the" | "my"
...   N -> "man" | "dog" | "cat" | "telescope" | "park"
...   P -> "in" | "on" | "by" | "with"
...   """)
>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> # parse() yields every tree the parser finds for the sentence
>>> for tree in rd_parser.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))

The grammar in Example 8-9 contains productions involving various syntactic categories, as laid out in Table 8-1. The recursive descent parser used here can also be inspected via a graphical interface, as illustrated in Figure 8-3; we discuss this parser in more detail in the section "Parsing with Context-Free Grammar".
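A recursive descent parser works top-down: it starts from the start symbol, expands categories using the productions, and backtracks when a prediction fails to match the input. The sketch below is a minimal pure-Python illustration of that strategy, not NLTK's implementation; GRAMMAR, expand, expand_seq, parse, and bracket are hypothetical names. Like any naive recursive descent parser, it would loop forever on a left-recursive production such as NP -> NP PP, which grammar1 happens not to contain.

```python
# grammar1 from Example 8-9 as a dict: each category maps to its list of
# alternative right-hand sides (one per "|" alternative). Any symbol that is
# not a key is treated as a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "V": [["saw"], ["ate"], ["walked"]],
    "NP": [["John"], ["Mary"], ["Bob"], ["Det", "N"], ["Det", "N", "PP"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N": [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
    "P": [["in"], ["on"], ["by"], ["with"]],
}


def expand(symbol, words, i):
    """Yield (tree, j) for each way `symbol` can derive words[i:j]."""
    if symbol not in GRAMMAR:
        # A word: it must match the next input token.
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    # A category: try each alternative right-hand side in turn (backtracking).
    for rhs in GRAMMAR[symbol]:
        for children, j in expand_seq(rhs, words, i):
            yield (symbol, *children), j


def expand_seq(symbols, words, i):
    """Yield (children, j) for each way the sequence `symbols` derives words[i:j]."""
    if not symbols:
        yield [], i
        return
    for first, k in expand(symbols[0], words, i):
        for rest, j in expand_seq(symbols[1:], words, k):
            yield [first, *rest], j


def parse(words, start="S"):
    """Yield only the trees that cover the whole input."""
    for tree, j in expand(start, words, 0):
        if j == len(words):
            yield tree


def bracket(tree):
    """Render a tuple tree in a bracketed style like the output above."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


for t in parse("Mary saw Bob".split()):
    print(bracket(t))  # (S (NP Mary) (VP (V saw) (NP Bob)))
```

Because every alternative is explored, an ambiguous sentence such as "Mary saw a man with a telescope" yields two trees: one attaches the PP inside the object NP, the other attaches it to the VP.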

Table 8-1. Syntactic categories

Symbol  Meaning               Example
S       sentence              the man walked
NP      noun phrase           a dog
VP      verb phrase           saw a park
PP      prepositional phrase  with a telescope
Det     determiner            the
N       noun                  dog
V       verb                  walked
P       preposition           in
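One way to relate Table 8-1 to grammar1 is to ask whether each example phrase can be derived from its category. The sketch below is a hypothetical pure-Python recognizer (not an NLTK API) that computes, for each symbol and start position, the set of positions the symbol can reach, caching results with functools.lru_cache. It assumes, as holds for grammar1, that there are no left-recursive or empty productions.

```python
from functools import lru_cache

# grammar1 from Example 8-9; each category maps to its alternative
# right-hand sides. Symbols that are not keys are words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "V": [["saw"], ["ate"], ["walked"]],
    "NP": [["John"], ["Mary"], ["Bob"], ["Det", "N"], ["Det", "N", "PP"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N": [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
    "P": [["in"], ["on"], ["by"], ["with"]],
}


def derives(category, phrase):
    """True if `category` can derive exactly the words of `phrase` under GRAMMAR."""
    words = tuple(phrase.split())

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        # All positions j such that `symbol` derives words[i:j].
        if symbol not in GRAMMAR:
            matched = i < len(words) and words[i] == symbol
            return frozenset([i + 1]) if matched else frozenset()
        found = set()
        for rhs in GRAMMAR[symbol]:
            positions = {i}
            for sym in rhs:  # thread every reachable position through the rhs
                positions = {j for p in positions for j in ends(sym, p)}
            found |= positions
        return frozenset(found)

    return len(words) in ends(category, 0)


# (category, example) pairs taken from Table 8-1.
TABLE_8_1 = [
    ("S", "the man walked"), ("NP", "a dog"), ("VP", "saw a park"),
    ("PP", "with a telescope"), ("Det", "the"), ("N", "dog"),
    ("V", "walked"), ("P", "in"),
]
for category, example in TABLE_8_1:
    print(category, example, derives(category, example))
```

Every example is derivable from its category except the sentence "the man walked": in grammar1 every VP needs an object NP, so that sentence is not admitted, whereas "the man walked a dog" is.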

A production like VP -> V NP | V NP PP has a disjunction ...
