Skip to Main Content
Python 3 Text Processing with NLTK 3 Cookbook - Second Edition
book

Python 3 Text Processing with NLTK 3 Cookbook - Second Edition

by Jacob Perkins
August 2014
Beginner to intermediate content levelBeginner to intermediate
304 pages
7h 10m
English
Packt Publishing
Content preview from Python 3 Text Processing with NLTK 3 Cookbook - Second Edition

Creating a part-of-speech tagged word corpus

Part-of-speech tagging is the process of identifying the part-of-speech tag for a word. Most of the time, a tagger must first be trained on a training corpus. How to train and use a tagger is covered in detail in Chapter 4, Part-of-speech Tagging, but first we must know how to create and use a training corpus of part-of-speech tagged words.

Getting ready

The simplest format for a tagged corpus is of the form word/tag. The following is an excerpt from the brown corpus:

The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.

Each word has a tag denoting its part-of-speech. For example, nn refers to a noun, while a tag that starts with vb is a verb.

Note

Different corpora can use different ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Machine Learning - Third Edition

Python Machine Learning - Third Edition

Sebastian Raschka, Vahid Mirjalili
Python Cookbook, 3rd Edition

Python Cookbook, 3rd Edition

David Beazley, Brian K. Jones

Publisher Resources

ISBN: 9781782167853Supplemental Content