NLTK uses a pre-trained machine learning model (an averaged perceptron) for part-of-speech (POS) tagging. The task is especially hard for English because, unlike in many other languages, the same word can act as different parts of speech depending on the context:
In [16]: from nltk import download
In [17]: download('averaged_perceptron_tagger')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/Oleksandr/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
Out[17]: True
In [18]: from nltk import pos_tag, pos_tag_sents
In [19]: pos_tag(word_tokenize('Cats, cat, Cat, and "The Cats"'))
Out[19]: [('Cats', 'NNS'), (',', ','), ('cat', 'NN'), (',', ','), ('Cat', ...
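To give a feel for why context matters to an averaged perceptron, here is a minimal pure-Python sketch of the idea. It is not NLTK's implementation: the feature set, the class and function names (`features`, `AveragedPerceptron`, `train`, `tag`), and the two-sentence toy corpus are all assumptions made up for illustration, and the averaging is done the naive way (a snapshot of every weight after each training example) rather than with the bookkeeping tricks a real tagger would use. The point is only that the word "book" ends up with different tags once the previous word is part of the features.

```python
from collections import defaultdict


def features(words, i):
    """Hypothetical feature set: the word itself and the word before it."""
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return ["bias", f"word={word}", f"prev_word={prev_word}"]


class AveragedPerceptron:
    """Toy multi-class perceptron with naive weight averaging."""

    def __init__(self, tags):
        self.tags = sorted(tags)            # fixed order makes ties deterministic
        self.weights = defaultdict(float)   # (feature, tag) -> current weight
        self.totals = defaultdict(float)    # (feature, tag) -> sum of weights over steps
        self.steps = 0

    def predict(self, feats):
        scores = {
            tag: sum(self.weights.get((f, tag), 0.0) for f in feats)
            for tag in self.tags
        }
        # On a tie, max() returns the first tag in sorted order.
        return max(self.tags, key=lambda tag: scores[tag])

    def update(self, truth, guess, feats):
        if truth != guess:
            for f in feats:
                self.weights[(f, truth)] += 1.0   # reward the correct tag
                self.weights[(f, guess)] -= 1.0   # penalize the wrong guess
        # Naive averaging: accumulate every weight after each example.
        for key, weight in self.weights.items():
            self.totals[key] += weight
        self.steps += 1

    def average(self):
        """Replace the weights with their average over all training steps."""
        self.weights = {k: total / self.steps for k, total in self.totals.items()}


def train(tagged_sentences, epochs=5):
    tags = {tag for sent in tagged_sentences for _, tag in sent}
    model = AveragedPerceptron(tags)
    for _ in range(epochs):
        for sent in tagged_sentences:
            words = [word for word, _ in sent]
            for i, (_, truth) in enumerate(sent):
                feats = features(words, i)
                model.update(truth, model.predict(feats), feats)
    model.average()
    return model


def tag(model, words):
    return [(w, model.predict(features(words, i))) for i, w in enumerate(words)]


# Invented toy corpus: "book" is a noun after "the" and a verb after "to".
TOY_CORPUS = [
    [("the", "DT"), ("book", "NN")],
    [("to", "TO"), ("book", "VB")],
]

model = train(TOY_CORPUS)
print(tag(model, ["the", "book"]))  # [('the', 'DT'), ('book', 'NN')]
print(tag(model, ["to", "book"]))   # [('to', 'TO'), ('book', 'VB')]
```

Averaging the weights over all training steps, instead of keeping only the final ones, is what makes the perceptron "averaged"; it dampens the effect of the last few updates. A production tagger uses a far richer feature set and a large annotated corpus, which is why NLTK ships the model pre-trained.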