Part-of-speech (POS) tagging

NLTK uses a pre-trained machine learning model (an averaged perceptron) for POS tagging. The task is especially hard for English because, more so than in many other languages, the same word form can act as different parts of speech depending on the context:

In [16]: from nltk import download

In [17]: download('averaged_perceptron_tagger')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/Oleksandr/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
Out[17]: True

In [18]: from nltk import pos_tag, pos_tag_sents

In [19]: pos_tag(word_tokenize('Cats, cat, Cat, and "The Cats"'))
Out[19]: [('Cats', 'NNS'), (',', ','), ('cat', 'NN'), (',', ','), ('Cat', ...
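To make the context dependence concrete, here is a minimal sketch that collects every tag each word received across several tagged sentences and keeps only the words with more than one tag. The helper names tags_by_word, ambiguous_words, and tag_with_nltk are hypothetical and not part of NLTK; only pos_tag_sents comes from the library. The sketch assumes whitespace-separated input so that it does not depend on word_tokenize, and tag_with_nltk assumes NLTK and its tagger data are already installed.

```python
from collections import defaultdict


def tags_by_word(tagged_sentences):
    """Map each lowercased word to the set of POS tags it received."""
    seen = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            seen[word.lower()].add(tag)
    return dict(seen)


def ambiguous_words(tagged_sentences):
    """Keep only the words that were tagged with more than one POS."""
    return {
        word: tags
        for word, tags in tags_by_word(tagged_sentences).items()
        if len(tags) > 1
    }


def tag_with_nltk(sentences):
    """Tag whitespace-split sentences with NLTK (assumes tagger data is installed)."""
    # Imported lazily so the two helpers above work without NLTK.
    from nltk import pos_tag_sents
    return pos_tag_sents([sentence.split() for sentence in sentences])
```

For example, ambiguous_words(tag_with_nltk(['I book a flight', 'I read a book'])) would report "book" only if the tagger assigns it different tags in the two sentences; the exact tags depend on the installed model.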
