Extracting Context Free Grammar (CFG) rules from Treebank

CFG was defined for natural languages in 1957 by Noam Chomsky. A CFG consists of the following components:

  • A set of non terminal nodes (N)
  • A set of terminal nodes (T)
  • Start symbol (S)
  • A set of production rules (P) of the form:

    A→a
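
The four components above can be written down directly as data. The following is a minimal sketch, not code from NLTK: the `CFG` class, its field names, and the toy grammar are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for the four CFG components (not an NLTK class)."""
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S
    productions: tuple        # P: each rule is (lhs, rhs), rhs a tuple of symbols

    def is_well_formed(self):
        """True if S is in N, N and T are disjoint, and every rule rewrites
        a single nonterminal into symbols drawn from N and T."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(sym in symbols for sym in rhs)
                for lhs, rhs in self.productions
            )
        )


# A toy grammar that generates the single sentence "the dog barks".
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"the", "dog", "barks"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb",)),
        ("Det", ("the",)),
        ("Noun", ("dog",)),
        ("Verb", ("barks",)),
    ),
)
```

Calling `toy.is_well_formed()` checks the constraints listed above. A rule whose left-hand side is a terminal, such as `("the", ("dog",))`, would make the check fail.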

CFG rules are of two types—Phrase structure rules and Sentence structure rules.

A Phrase Structure Rule can be defined as follows: A→a, where A ∈ N and a consists of terminals and nonterminals.
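
To make the idea of reading such rules off a treebank concrete, here is a small sketch that parses one bracketed parse tree and emits one rule per internal node. It uses only the standard library. The helper names `parse_tree` and `productions` and the example sentence are assumptions made for illustration, and the parser expects well-formed input. NLTK offers comparable functionality through the `productions()` method of its `Tree` objects, which this sketch does not call.

```python
import re
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree such as "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))".

    Returns nested (label, children) tuples; each child is either another
    (label, children) tuple or a plain string (a word).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                       # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])   # a word (terminal)
                pos += 1
        pos += 1                       # skip ")"
        return (label, children)

    return read_node()


def productions(tree):
    """Yield one (lhs, rhs) rule per internal node, top-down and left to right."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


sentence = "(S (NP (DT the) (NN dog)) (VP (VBZ chased) (NP (DT the) (NN cat))))"
rules = list(productions(parse_tree(sentence)))
rule_counts = Counter(rules)   # how often each rule occurs in the tree
```

Here `rules` starts with `("S", ("NP", "VP"))`, and `rule_counts` records that `NP → DT NN` occurs twice. Symbols are plain strings in this sketch, so a word and a category label on the right-hand side are told apart only by position in the tree. A fuller implementation would keep them as distinct types.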

In sentence-level construction of a CFG, there are four structures:

  • Declarative structure: Deals with declarative sentences (the subject is followed by a predicate).
  • Imperative structure: Deals with imperative sentences, commands, or suggestions (sentences begin with a verb phrase ...
