Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Training a POS tagger

We will now train our own POS tagger, using NLTK's tagged corpora and scikit-learn's random forest machine learning (ML) model. The complete Jupyter Notebook for this section is available at Chapter02/02_example.ipynb, in the book's code repository. This is a classification task: we need to predict the POS tag for a given word in a sentence. We will use the NLTK treebank dataset, which comes with POS tags, as the labeled training data. As features, we will extract each word's prefixes and suffixes, along with the previous and neighboring words in the text. These features are good indicators of a word's part of speech; for example, the suffix -ly often marks an adverb, and -ing often marks a verb form. The code that follows shows how we can extract these features: ...
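A minimal sketch of this kind of feature extraction, not the book's own listing, might look like the following. The function names (extract_features, build_dataset, train_tagger), the feature keys, the prefix and suffix lengths, the <START> and <END> padding markers, and the one-sentence sample are all assumptions made for illustration. The training step assumes scikit-learn is installed and pairs DictVectorizer with RandomForestClassifier, which is one reasonable way to feed dictionary features to a random forest.

```python
def extract_features(words, index):
    """Return a feature dict for the word at position `index` in `words`."""
    word = words[index]
    return {
        "word": word.lower(),
        # Prefixes and suffixes of a few lengths (the lengths are an assumption).
        "prefix_1": word[:1],
        "prefix_2": word[:2],
        "suffix_1": word[-1:],
        "suffix_2": word[-2:],
        "suffix_3": word[-3:],
        # Neighboring words, with hypothetical padding markers at the edges.
        "prev_word": words[index - 1].lower() if index > 0 else "<START>",
        "next_word": (
            words[index + 1].lower() if index < len(words) - 1 else "<END>"
        ),
    }


def build_dataset(tagged_sentences):
    """Turn sentences of (word, tag) pairs into feature dicts and tag labels."""
    features, labels = [], []
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        for index, (_, tag) in enumerate(sentence):
            features.append(extract_features(words, index))
            labels.append(tag)
    return features, labels


def train_tagger(tagged_sentences):
    """Fit a random forest on the extracted features (assumes scikit-learn)."""
    # Imported here so the feature code above runs without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline

    X, y = build_dataset(tagged_sentences)
    # DictVectorizer one-hot encodes the string-valued features.
    model = make_pipeline(
        DictVectorizer(sparse=True),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    model.fit(X, y)
    return model


# A tiny hand-written sample in the same (word, tag) format as the treebank.
sample = [[("The", "DT"), ("dog", "NN"), ("is", "VBZ"), ("barking", "VBG")]]
X, y = build_dataset(sample)
```

To train on the real data, the sentences returned by nltk.corpus.treebank.tagged_sents() could be passed to train_tagger, assuming the treebank corpus has been downloaded with nltk.download('treebank'). The fitted pipeline then accepts a list of feature dicts from extract_features and predicts one tag per dict.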
