Chapter 5. Extracting Chunks

In this chapter, we will cover the following recipes:

  • Chunking and chinking with regular expressions
  • Merging and splitting chunks with regular expressions
  • Expanding and removing chunks with regular expressions
  • Partial parsing with regular expressions
  • Training a tagger-based chunker
  • Classification-based chunking
  • Extracting named entities
  • Extracting proper noun chunks
  • Extracting location chunks
  • Training a named entity chunker
  • Training a chunker with NLTK-Trainer


Chunk extraction, or partial parsing, is the process of extracting short phrases from a part-of-speech tagged sentence. This is different from full parsing in that we're interested in standalone chunks, or phrases, instead of full parse trees (for more on parse ...
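
To make the idea concrete, here is a minimal sketch in plain Python of pulling noun-phrase chunks out of a part-of-speech tagged sentence. It is an illustration only, not NLTK's API: the function name extract_noun_chunks, the pattern NP_PATTERN, and the sample sentence are all hypothetical, and the tags are assumed to follow the Penn Treebank convention. NLTK's own chunkers return tree structures rather than plain lists.

```python
import re

# A part-of-speech tagged sentence as (word, tag) pairs.
# Assumption: Penn Treebank tags (DT = determiner, JJ = adjective, NN* = noun).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"),
    ("the", "DT"), ("cat", "NN"),
]

# Hypothetical noun-phrase pattern over the tag sequence:
# an optional determiner, any number of adjectives, then one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+")


def extract_noun_chunks(tagged_sentence):
    """Return the words of each noun-phrase chunk as a list of lists."""
    # Encode the tags as one string, e.g. "<DT><JJ><NN><VBD>...".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_sentence)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Map character offsets back to token indices by counting "<".
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged_sentence[start:end]])
    return chunks


chunks = extract_noun_chunks(tagged)
print(chunks)  # [['the', 'little', 'dog'], ['the', 'cat']]
```

Only the two noun phrases are returned. The verb and the preposition are left out of any chunk, which is the sense in which this is a partial parse rather than a full one.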
