August 2014
304 pages
In this recipe, we'll cover the train_classifier.py script from NLTK-Trainer, which lets you train NLTK classifiers from the command line. NLTK-Trainer was previously introduced at the end of Chapter 4, Part-of-speech Tagging, and again at the end of Chapter 5, Extracting Chunks.
You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.
Like train_tagger.py and train_chunker.py, train_classifier.py has only one required argument: the name of a corpus. The corpus must have a categories() method, because a text classifier learns to assign each text to one of the corpus's categories. Here's an example of running train_classifier.py ...