Mapping Words to Properties Using Python Dictionaries
As we have seen, a tagged word of the form (word, tag) is an association between a word and a part-of-speech
tag. Once we start doing part-of-speech tagging, we will be creating
programs that assign to each word the tag that is most likely in a
given context. We can think of this process as mapping from words to tags. The most natural
way to store mappings in Python is the dictionary data type (also known as an
associative array or hash array in other programming languages). In
this section, we look at dictionaries and see how they can represent a
variety of language information, including parts of speech.
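As a minimal sketch of this idea, a plain Python dictionary can hold a mapping from words to tags. The variable name pos and the words and tags below are made up for illustration; they are not the output of any tagger.

```python
# Hypothetical word-to-tag mapping; the entries are illustrative only.
pos = {}
pos['colorless'] = 'ADJ'
pos['ideas'] = 'N'
pos['sleep'] = 'V'
pos['furiously'] = 'ADV'

# Look up a tag by giving the word as the key.
print(pos['ideas'])

# .get() supplies a fallback for words that are not in the dictionary,
# where pos['green'] would raise a KeyError instead.
print(pos.get('green', 'UNKNOWN'))
```

Using .get() with a default is one way to handle unseen words; whether a fallback tag is appropriate depends on the application.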
Indexing Lists Versus Dictionaries
A text, as we have seen, is treated in Python as a list of
words. An important property of lists is that we can “look up” a
particular item by giving its index, e.g., text1[100]. Notice how we specify a number
and get back a word. We can think of a list as a simple kind of table,
as shown in Figure 5-2.

Figure 5-2. List lookup: We access the contents of a Python list with the help of an integer index.
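The list lookup described above can be sketched with a short stand-in list. The name text and its contents are assumptions for illustration, since text1 itself requires the NLTK corpora to be loaded.

```python
# A small stand-in for a text: a list of word tokens.
text = ['Call', 'me', 'Ishmael', '.']

# List lookup: we give an integer index and get back a word.
word = text[2]
print(word)

# Going the other way (word to position) needs a search through the list.
position = text.index('Ishmael')
print(position)
```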
Contrast this situation with frequency distributions (see Computing with Language: Simple Statistics), where we
specify a word and get back a number, e.g., fdist['monstrous'], which tells us the number of times a given word has occurred in a text. Lookup using words is familiar to anyone ...
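For comparison, lookup by word can be sketched with collections.Counter from the standard library, used here as a stand-in for a frequency distribution so that the example runs without any corpora. The word list is invented for illustration.

```python
from collections import Counter

# A made-up token list standing in for a text.
words = ['the', 'monstrous', 'whale', 'and', 'the', 'monstrous', 'sea']

# Count how often each word occurs.
fdist = Counter(words)

# Word lookup: we give a word and get back a number.
monstrous_count = fdist['monstrous']
print(monstrous_count)

# A Counter returns 0 for a word it has never seen, rather than raising an error.
unseen_count = fdist['gentle']
print(unseen_count)
```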