Training a chunker with NLTK-Trainer
At the end of the previous chapter, Chapter 4, Part-of-speech Tagging, we introduced NLTK-Trainer and the train_tagger.py script. In this recipe, we will cover the script for training chunkers: train_chunker.py.
Note
You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.
How to do it...
As with train_tagger.py, the only required argument to train_chunker.py is the name of a corpus. In this case, we need a corpus that provides a chunked_sents() method, such as treebank_chunk. Here's an example of running train_chunker.py on treebank_chunk:
$ python train_chunker.py treebank_chunk
loading treebank_chunk
4009 chunks, training on 4009
...
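The script needs chunked_sents() because each item it yields is a chunk tree: an nltk Tree whose leaves are (word, part-of-speech tag) pairs, with chunks such as NP grouped as subtrees. The following minimal sketch builds one such tree by hand so you can see the shape of the training data. It assumes only that NLTK is installed (no corpus download is needed), and the sentence is a made-up example, not one read from treebank_chunk. Tagger-based chunkers commonly learn from IOB tags derived from these trees, so the sketch also converts the tree with NLTK's tree2conlltags() and back with conlltags2tree(); this illustrates the data format and is not a description of train_chunker.py's internals.

```python
# A minimal sketch, assuming NLTK is installed. The sentence below is a
# hypothetical example in the same shape that chunked_sents() yields; it is
# not taken from the treebank_chunk corpus.
from nltk.tree import Tree
from nltk.chunk import tree2conlltags, conlltags2tree

# Root 'S' tree: NP chunks are subtrees, other words are bare (word, tag) pairs.
sent = Tree('S', [
    Tree('NP', [('the', 'DT'), ('cat', 'NN')]),
    ('sat', 'VBD'),
    ('on', 'IN'),
    Tree('NP', [('the', 'DT'), ('mat', 'NN')]),
])

# Flatten the tree into (word, pos, iob) triples: B- starts a chunk,
# I- continues it, and O marks a word outside any chunk.
iob = tree2conlltags(sent)
for word, pos, tag in iob:
    print(word, pos, tag)

# The conversion is reversible, so no chunk information is lost.
rebuilt = conlltags2tree(iob)
print(rebuilt == sent)
```

Any corpus reader whose chunked_sents() returns trees like sent can, in principle, serve as the corpus argument to train_chunker.py.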