Training a tagger-based chunker
Training a chunker can be a great alternative to manually specifying regular expression chunk patterns. Instead of a painstaking process of trial and error to get exactly the right patterns, we can use existing corpus data to train chunkers, much like we did for part-of-speech tagging in the previous chapter.
How to do it...
As with part-of-speech tagging, we'll use the treebank corpus data for training. But this time, we'll use the treebank_chunk corpus, which is specifically formatted to produce chunked sentences in the form of trees. Its chunked_sents() method supplies the training sentences that a TagChunker class uses to train a tagger-based chunker. The TagChunker class uses a helper function, conll_tag_chunks(), to extract a list ...
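The general idea behind a tagger-based chunker can be sketched without NLTK at all. The example below is a minimal, library-free illustration, not the TagChunker class from this recipe: it assumes the training sentences have already been flattened into (word, part-of-speech tag, IOB chunk tag) triples, which is the shape NLTK's tree2conlltags() produces from a chunk tree. The names TRAIN_SENTS, to_tag_chunks(), train_unigram(), and chunk_tags() are hypothetical, and the toy sentences are invented for the example rather than taken from treebank_chunk.

```python
from collections import Counter, defaultdict

# Toy training data (invented for illustration): each sentence is a list of
# (word, pos_tag, iob_chunk_tag) triples.
TRAIN_SENTS = [
    [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O"),
     ("on", "IN", "O"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP")],
    [("a", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "O")],
]


def to_tag_chunks(sents):
    """Drop the words, keeping (pos_tag, chunk_tag) pairs for each sentence."""
    return [[(pos, chunk) for _word, pos, chunk in sent] for sent in sents]


def train_unigram(tag_chunks):
    """Map each POS tag to the chunk tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sent in tag_chunks:
        for pos, chunk in sent:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def chunk_tags(model, tagged_sent, default="O"):
    """Predict an IOB chunk tag for every (word, pos_tag) pair.

    POS tags never seen in training fall back to `default` (outside any chunk).
    """
    return [(word, pos, model.get(pos, default)) for word, pos in tagged_sent]


model = train_unigram(to_tag_chunks(TRAIN_SENTS))
result = chunk_tags(model, [("the", "DT"), ("bird", "NN"), ("sang", "VBD")])
print(result)
```

The point of the sketch is that the "words" the tagger sees are part-of-speech tags and the "tags" it predicts are IOB chunk labels, so any trainable tagger can be reused as a chunker. A real implementation would typically train on far more data and use a stronger tagger than this single-tag lookup.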