Relation Extraction
Once named entities have been identified in a text, we want to extract
the relations that exist between them. As indicated earlier, we will
typically be looking for relations between specified types of named
entity. One way of approaching this task is to first look for all triples
of the form (X, α, Y), where X and Y are named entities of the required
types, and α is the string of words that intervenes between X and Y. We
can then use regular expressions to pull out just those instances of α
that express the relation we are looking for. The following example
searches for strings that contain the word "in". The special regular
expression (?!\b.+ing) is a negative lookahead assertion that allows us
to disregard strings such as "success in supervising the transition of",
where "in" is followed by a gerund.
>>> import re
>>> import nltk
>>> # Accept fillers containing the word "in", unless "ing" occurs later in the filler.
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                                      corpus='ieer', pattern=IN):
...         print(nltk.sem.rtuple(rel))
[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
[ORG: 'McGlashan & Sarrail'] 'firm in' [LOC: 'San Mateo']
[ORG: 'Freedom Forum'] 'in' [LOC: 'Arlington']
[ORG: 'Brookings Institution'] ', the research group in' [LOC: 'Washington']
[ORG: 'Idealab'] ', a self-described business incubator based in' [LOC: 'Los Angeles']
[ORG: 'Open Text'] ', based in' [LOC: 'Waterloo']
[ORG: 'WGBH'] 'in' [LOC: 'Boston']
[ORG: 'Bastille Opera'] 'in' ...
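
The triple-then-filter idea described above can also be tried without a corpus. The following is a minimal sketch, not NLTK's implementation: the function candidate_triples, the flat chunks representation, and the toy sentence (including the entities 'Acme' and 'Springfield') are hypothetical and chosen only for illustration. The regular expression is the one used in the example above.

```python
import re

# Same pattern as above: the whole word "in", with no "ing" later in the string.
IN = re.compile(r'.*\bin\b(?!\b.+ing)')

def candidate_triples(chunks, subj_type, obj_type):
    """Yield (X, alpha, Y) for each subj_type entity whose next entity is an obj_type.

    `chunks` is an assumed flat format: each item is either a plain word
    (str) or an (entity_type, entity_text) tuple.
    """
    subj = None    # most recent entity seen so far
    filler = []    # words seen since that entity
    for item in chunks:
        if isinstance(item, tuple):
            etype, text = item
            if subj is not None and subj[0] == subj_type and etype == obj_type:
                yield subj[1], ' '.join(filler), text
            subj, filler = item, []
        else:
            filler.append(item)

# Toy input, invented for this sketch.
sentence = [
    ('ORG', 'WHYY'), 'in', ('LOC', 'Philadelphia'), 'reported', 'on',
    ('ORG', 'Acme'), 'success', 'in', 'supervising', 'the', 'transition', 'of',
    ('LOC', 'Springfield'),
]

# Step 1: collect every (X, alpha, Y) with X an ORG and Y the next entity, a LOC.
triples = list(candidate_triples(sentence, 'ORG', 'LOC'))

# Step 2: keep only the triples whose filler alpha matches the pattern.
matches = [t for t in triples if IN.match(t[1])]

for x, alpha, y in matches:
    print("[ORG: %r] %r [LOC: %r]" % (x, alpha, y))
```

Only the WHYY triple survives the filter; the filler "success in supervising the transition of" is rejected by the lookahead. Note that the lookahead scans the rest of the filler, so a string such as "in the interesting part" is rejected as well, even though no gerund directly follows "in".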