
202 | Chapter 4, Mapping (on) the Web
HACK #45 Extract a Spatial Model from Wikipedia
        # Capture the link path and the link text from a backlink list item.
        my ($link,$name) = $doc =~ /\<li\>\<a href\=\"\/([^"]+)\"\>([\w| ]+)\<\/a/;
        if ($link and $name) {
            $link = $wiki_base.$link;
            # Reuse the node for this page if it exists; otherwise create it.
            my ($object) = Class::RDF->search(wm->wiki_page => $link);
            $object = Class::RDF::Object->create(wm->wiki_page => $link,
                wm->name => $name) if not $object;
            $object->wm::connects($country);
        }
    }
}
Watching this script run is like watching a potted conceptual history of the
world. Now, we’ve built a graph of all the countries in the world, according to
Wikipedia, with links to famous people, places, and events. But we don’t know
which are which, nor can we distinguish casual from important mentions.
Cities and other spatial things. We can deepen this spatial index by using a
gazetteer service. “Build a Free World Gazetteer” [Hack #84] was written with
this purpose in mind. We go through the list of each country’s backlinks and,
for page names that look likely to be places, try to find them in the gazetteer.
We use a simple set of rules of thumb, partially borrowed from Maciej
Ceglowski (http://www.idlewords.com), to identify things worth trying to
geocode:
• Things beginning with numbers are not cities.
• If the name is three or more words long, it’s probably not a city name.
• If this is a city name, all the words will be capitalized.
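The three rules above can be sketched as a small filter. This is only an illustration, not the hack's own code: the helper name looks_like_city and the sample page names are hypothetical, and it assumes "capitalized" means each word starts with an uppercase letter.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns 1 if a page
# name passes the three rules of thumb and is worth trying in the gazetteer.
sub looks_like_city {
    my ($name) = @_;
    return 0 unless defined $name and length $name;

    # Rule 1: things beginning with numbers are not cities.
    return 0 if $name =~ /^\d/;

    # Split on runs of whitespace, ignoring leading whitespace.
    my @words = split ' ', $name;

    # Rule 2: a name of three or more words is probably not a city.
    return 0 if @words == 0 or @words >= 3;

    # Rule 3: every word of a city name is capitalized
    # (assumed here to mean: starts with an uppercase letter).
    for my $word (@words) {
        return 0 unless $word =~ /^\p{Lu}/;
    }
    return 1;
}

# Sample page names, invented for illustration.
for my $name ('San Francisco', '1848 Revolutions', 'History of the World',
              'Paris', 'economy') {
    printf "%-22s %s\n", $name,
        looks_like_city($name) ? 'try gazetteer' : 'skip';
}
```

A filter like this is deliberately loose. It will still let through capitalized non-places such as people's names, so the gazetteer lookup remains the real test.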
To request the information about the city from the gazetteer, ...