N-Gram Tagging
Unigram Tagging
Unigram taggers are based on a simple statistical algorithm: for each
token, assign the tag that is most likely for that particular token.
For example, a unigram tagger will assign the tag JJ
to any occurrence of the word
frequent, since frequent is
used as an adjective (e.g., a frequent word) more
often than it is used as a verb (e.g., I frequent this
cafe). A unigram tagger behaves just like a lookup tagger
(Automatic Tagging), except there is a more
convenient technique for setting it up, called training. In the following code sample, we
train a unigram tagger, use it to tag a sentence, and then
evaluate:
>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'),
('type', 'NN'), (',', ','), ('being', 'BEG'), ('on', 'IN'),
('the', 'AT'), ('ground', 'NN'), ('floor', 'NN'), ('so', 'QL'),
('that', 'CS'), ('entrance', 'NN'), ('is', 'BEZ'), ('direct', 'JJ'),
('.', '.')]
>>> unigram_tagger.evaluate(brown_tagged_sents)
0.9349006503968017
We train a UnigramTagger
by specifying tagged sentence data as a parameter when we initialize the tagger. The training process involves inspecting the tag of each word and storing the most likely tag for any word in a ...
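The idea behind this training step can be sketched in a few lines of plain Python, without NLTK's implementation: count how often each tag occurs with each word, then keep only the most frequent tag per word. The tiny training corpus below is a made-up illustration, not real Brown data.

```python
from collections import Counter, defaultdict

# Hypothetical tagged training data; in the book this would come from
# brown.tagged_sents(categories='news').
tagged_sents = [
    [('the', 'AT'), ('frequent', 'JJ'), ('word', 'NN')],
    [('I', 'PPSS'), ('frequent', 'VB'), ('this', 'DT'), ('cafe', 'NN')],
    [('a', 'AT'), ('frequent', 'JJ'), ('visitor', 'NN')],
]

# Count how often each tag occurs with each word.
tag_counts = defaultdict(Counter)
for sent in tagged_sents:
    for word, tag in sent:
        tag_counts[word][tag] += 1

# The "model" maps each word to its single most frequent tag:
# 'frequent' is tagged JJ twice and VB once, so JJ wins.
model = {word: counts.most_common(1)[0][0]
         for word, counts in tag_counts.items()}

def unigram_tag(sentence):
    """Tag each word with its most likely tag; unseen words get None."""
    return [(w, model.get(w)) for w in sentence]

print(unigram_tag(['a', 'frequent', 'cafe']))
# [('a', 'AT'), ('frequent', 'JJ'), ('cafe', 'NN')]
```

Words never seen in training come out as None here, which mirrors the behavior of nltk.UnigramTagger on unknown words.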