August 2014
304 pages
A simple way to do named entity extraction is to chunk all proper nouns (tagged with NNP). We can label these chunks as NAME, since a proper noun is, by definition, the name of a person, place, or thing.
Using the RegexpParser class, we can create a very simple grammar that combines all proper nouns into a NAME chunk. Then, we can test this on the first tagged sentence of treebank_chunk to compare the results with the previous recipe:
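>>> from nltk.chunk import RegexpParser
>>> from nltk.corpus import treebank_chunk
>>> # sub_leaves() is assumed to be available from the earlier recipes
>>> chunker = RegexpParser(r'''
... NAME:
... {<NNP>+}
... ''')
>>> sub_leaves(chunker.parse(treebank_chunk.tagged_sents()[0]), 'NAME')
[[('Pierre', 'NNP'), ('Vinken', 'NNP')], [('Nov.', 'NNP')]]
Although we get Nov. as a NAME chunk, this isn't a wrong result, as Nov. is ...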
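To make the effect of the {<NNP>+} pattern concrete, here is a minimal pure-Python sketch that groups consecutive NNP-tagged tokens without using NLTK at all. The function name proper_noun_runs and the hand-typed sample sentence are assumptions made for illustration; this only mimics the grouping behavior and is not how RegexpParser works internally.

```python
from itertools import groupby


def proper_noun_runs(tagged_sent):
    """Return each run of consecutive NNP-tagged tokens as its own list.

    Hypothetical helper: it imitates what a {<NNP>+} chunk rule selects,
    using plain (word, tag) tuples instead of an NLTK Tree.
    """
    runs = []
    # groupby splits the sentence into alternating NNP / non-NNP stretches.
    for is_nnp, group in groupby(tagged_sent, key=lambda pair: pair[1] == 'NNP'):
        if is_nnp:
            runs.append(list(group))
    return runs


# Hand-typed sample for illustration, not loaded from the corpus.
sample = [
    ('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('will', 'MD'),
    ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('Nov.', 'NNP'),
    ('29', 'CD'), ('.', '.'),
]

print(proper_noun_runs(sample))
```

Because the comparison is an exact match on 'NNP', plural proper nouns tagged NNPS are left out, just as they are by the single-tag grammar above.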