Training a unigram part-of-speech tagger
A unigram generally refers to a single token. Therefore, a unigram tagger only uses a single word as its context for determining the part-of-speech tag.
UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which inherits from SequentialBackoffTagger. In other words, UnigramTagger is a context-based tagger whose context is a single word, or unigram.
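To make the idea of a single-word context concrete, here is a minimal pure-Python sketch of the underlying technique: count how often each tag occurs for each word, then always assign the most frequent one. This is not NLTK's implementation; the function names train_unigram and tag_words and the toy sentences are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # The only context consulted is the word itself (a unigram).
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_words(model, words):
    """Tag each word independently; words never seen in training get None."""
    return [(word, model.get(word)) for word in words]

# Hypothetical toy training data, not taken from any corpus.
toy_sents = [
    [("the", "DT"), ("duck", "NN"), ("swims", "VBZ")],
    [("a", "DT"), ("duck", "NN"), ("quacks", "VBZ")],
    [("they", "PRP"), ("duck", "VBP")],
]

model = train_unigram(toy_sents)
result = tag_words(model, ["they", "duck", "quickly"])
print(result)
```

Because neighboring words are ignored, "duck" is tagged NN even after "they", where a verb tag would be correct, and the unseen word "quickly" gets no tag at all. Both limitations are what backoff chains and larger contexts are meant to address.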
How to do it...
UnigramTagger can be trained by giving it a list of tagged sentences at initialization.
>>> from nltk.tag import UnigramTagger
>>> from nltk.corpus import treebank
>>> train_sents = treebank.tagged_sents()[:3000]
>>> tagger = UnigramTagger(train_sents)
>>> treebank.sents()[0]
['Pierre', 'Vinken', ',', '61', 'years', 'old', ...
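The same training pattern can be tried without downloading the treebank corpus. The sketch below assumes NLTK is installed and trains UnigramTagger on a few hand-written tagged sentences; the sentences and the variable names toy_sents and toy_tagger are made up for illustration and are not part of the original recipe.

```python
from nltk.tag import UnigramTagger

# Hypothetical hand-made training data: a list of sentences,
# each a list of (word, tag) pairs, the same shape as treebank.tagged_sents().
toy_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Training happens at initialization.
toy_tagger = UnigramTagger(toy_sents)

# Each word is looked up on its own; unseen words are tagged None.
tagged = toy_tagger.tag(["the", "cat", "barks", "loudly"])
print(tagged)
```

A word that never appeared in training, such as "loudly" here, comes back with a tag of None, since a unigram tagger has no other context to fall back on.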