August 2014
At the end of the previous chapter, Chapter 4, Part-of-speech Tagging, we introduced NLTK-Trainer and the train_tagger.py script. In this recipe, we will cover the script for training chunkers: train_chunker.py.
You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.
As with train_tagger.py, the only required argument to train_chunker.py is the name of a corpus. In this case, we need a corpus that provides a chunked_sents() method, such as treebank_chunk. Here's an example of running train_chunker.py on treebank_chunk:
$ python train_chunker.py treebank_chunk
loading treebank_chunk
4009 chunks, training on 4009
...
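Before passing a corpus name to train_chunker.py, it can help to confirm that the corpus really exposes a chunked_sents() method. The following is a minimal sketch of such a check, not part of NLTK-Trainer: the helper provides_chunked_sents and both stub classes are hypothetical names introduced only for illustration, and the stubs return plain lists where a real NLTK corpus would return Tree objects.

```python
def provides_chunked_sents(corpus):
    """Return True if the corpus object exposes a callable chunked_sents()."""
    return callable(getattr(corpus, "chunked_sents", None))


class ChunkedCorpusStub:
    """Hypothetical stand-in for a chunked corpus such as treebank_chunk."""

    def chunked_sents(self):
        # A real NLTK corpus would return Tree objects; a plain list of
        # (word, tag) pairs is used here to keep the sketch self-contained.
        return [[("Pierre", "NNP"), ("Vinken", "NNP")]]


class TaggedOnlyCorpusStub:
    """Hypothetical stand-in for a corpus that only provides tagged sentences."""

    def tagged_sents(self):
        return [[("Pierre", "NNP"), ("Vinken", "NNP")]]


print(provides_chunked_sents(ChunkedCorpusStub()))     # True
print(provides_chunked_sents(TaggedOnlyCorpusStub()))  # False
```

With NLTK and its data installed, the same helper could be called on nltk.corpus.treebank_chunk in place of the stubs.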