Skip to Content
Natural Language Processing with Python
book

Natural Language Processing with Python

by Steven Bird, Ewan Klein, Edward Loper
June 2009
Beginner to intermediate
504 pages
16h 27m
English
O'Reilly Media, Inc.
Content preview from Natural Language Processing with Python

Working with Toolbox Data

Given the popularity of Toolbox among linguists, we will discuss some further methods for working with Toolbox data. Many of the methods discussed in previous chapters, such as counting, building frequency distributions, and tabulating co-occurrences, can be applied to the content of Toolbox entries. For example, we can trivially compute the average number of fields for each entry:

>>> from nltk.corpus import toolbox
>>> lexicon = toolbox.xml('rotokas.dic')
>>> sum(len(entry) for entry in lexicon) / len(lexicon)
13.635955056179775

In this section, we will discuss two tasks that arise in the context of documentary linguistics, neither of which is supported by the Toolbox software.

Adding a Field to Each Entry

It is often convenient to add new fields that are derived automatically from existing ones. Such fields often facilitate search and analysis. For instance, in Example 11-7 we define a function cv(), which maps a string of consonants and vowels to the corresponding CV sequence, e.g., kakapua would map to CVCVCVV. This mapping has four steps. First, the string is converted to lowercase, then we replace any non-alphabetic characters [^a-z] with an underscore. Next, we replace all vowels with V. Finally, anything that is not a V or an underscore must be a consonant, so we replace it with a C. Now, we can scan the lexicon and add a new cv field after every lx field. Example 11-7 shows what this does to a particular entry; note the last line of output, which ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Natural Language Processing with Python and spaCy

Natural Language Processing with Python and spaCy

Yuli Vasiliev
Hands-On Natural Language Processing with Python

Hands-On Natural Language Processing with Python

Rajesh Arumugam, Rajalingappaa Shanmugamani

Publisher Resources

ISBN: 9780596803346Errata Page