Context-Free Grammar

A Simple Grammar

Let’s start off by looking at a simple context-free grammar (CFG). By convention, the lefthand side of the first production is the start-symbol of the grammar, typically S, and all well-formed trees must have this symbol as their root label. In NLTK, context-free grammars are defined in the nltk.grammar module. In Example 8-9 we define a grammar and show how to parse a simple sentence admitted by the grammar.

Example 8-9. A simple context-free grammar.

grammar1 = nltk.parse_cfg("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)
>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.nbest_parse(sent):
...      print tree
(S (NP Mary) (VP (V saw) (NP Bob)))

The grammar in Example 8-9 contains productions involving various syntactic categories, as laid out in Table 8-1. The recursive descent parser used here can also be inspected via a graphical interface, as illustrated in Figure 8-3; we discuss this parser in more detail in Parsing with Context-Free Grammar.

Table 8-1. Syntactic categories

Symbol

Meaning

Example

S

sentence

the man walked

NP

noun phrase

a dog

VP

verb phrase

saw a park

PP

prepositional phrase

with a telescope

Det

determiner

the

N

noun

dog

V

verb

walked

P

preposition

in

A production like VP -> V NP | V NP PP has a disjunction ...

Get Natural Language Processing with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.