Extending a Feature-Based Grammar

In this section, we return to feature-based grammar and explore a variety of linguistic issues, demonstrating the benefits of incorporating features into the grammar.

Subcategorization

In Chapter 8, we augmented our category labels to represent different kinds of verbs, and used the labels IV and TV for intransitive and transitive verbs respectively. This allowed us to write productions like the following:

Example 9-31. 

VP -> IV
VP -> TV NP

Although we know that IV and TV are two kinds of V, they are just atomic non-terminal symbols in a CFG and are as distinct from each other as any other pair of symbols. This notation doesn’t let us say anything about verbs in general; e.g., we cannot say “All lexical items of category V can be marked for tense,” since walk, say, is an item of category IV, not V. So, can we replace category labels such as TV and IV by V along with a feature that tells us whether the verb combines with a following NP object or whether it can occur without any complement?
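The contrast can be sketched in plain Python, without NLTK. This is only an illustration: the dictionary names, the CAT key, and the verb see are assumptions introduced here, not part of the grammar formalism. With atomic labels, asking "is this word a V?" fails for every verb, while a single category V plus a SUBCAT feature lets us state the generalization directly.

```python
# Atomic labels: "IV" and "TV" are unrelated strings, so nothing
# records that both are kinds of V.
atomic_lexicon = {"walk": "IV", "see": "TV"}  # hypothetical toy lexicon


def is_verb_atomic(word):
    """Return True only if the word's label is literally 'V'."""
    return atomic_lexicon.get(word) == "V"


# Feature-based entries: one category V, with SUBCAT recording
# which complements the verb combines with.
feature_lexicon = {
    "walk": {"CAT": "V", "SUBCAT": "intrans"},
    "see": {"CAT": "V", "SUBCAT": "trans"},
}


def is_verb_feature(word):
    """Return True if the word's CAT feature is 'V', whatever its SUBCAT."""
    return feature_lexicon.get(word, {}).get("CAT") == "V"


def verbs_with_subcat(value):
    """List the verbs whose SUBCAT feature has the given value."""
    return sorted(
        word
        for word, feats in feature_lexicon.items()
        if feats["CAT"] == "V" and feats["SUBCAT"] == value
    )
```

Here is_verb_atomic("walk") is false because "IV" is simply a different symbol from "V", whereas is_verb_feature treats walk and see alike and still lets verbs_with_subcat tell them apart.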

A simple approach, originally developed for a grammar framework called Generalized Phrase Structure Grammar (GPSG), tries to solve this problem by allowing lexical categories to bear a SUBCAT feature, which tells us what subcategorization class the item belongs to. In contrast to the integer values for SUBCAT used by GPSG, the example here adopts more mnemonic values, namely intrans, trans, and clause:

Example 9-32. 

VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, ...
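To see the SUBCAT idea at work, here is a minimal sketch of a much smaller toy grammar in NLTK. It is not a reconstruction of Example 9-32: the productions, the lexical items, and the helper accepts are all assumptions made for illustration, and TENSE and NUM are left out to keep it short. It assumes NLTK is installed and uses FeatureGrammar.fromstring and FeatureChartParser.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Toy grammar (an illustrative assumption, not the book's example):
# both verbs are of category V and differ only in their SUBCAT value.
grammar = FeatureGrammar.fromstring("""
% start S
S -> NP VP
VP -> V[SUBCAT=intrans]
VP -> V[SUBCAT=trans] NP
NP -> 'Kim' | 'Lee'
V[SUBCAT=intrans] -> 'walks'
V[SUBCAT=trans] -> 'likes'
""")

parser = FeatureChartParser(grammar)


def accepts(sentence):
    """Return True if the toy grammar yields at least one parse."""
    tokens = sentence.split()
    return any(True for _ in parser.parse(tokens))
```

With this grammar, accepts("Kim walks") and accepts("Kim likes Lee") should succeed, while accepts("Kim walks Lee") and accepts("Kim likes") should fail, because each VP production only unifies with a verb carrying the matching SUBCAT value.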
