Credit: Will Ware
Given a list of cities, Example 11-1 fetches their latitudes and longitudes from one web site (http://www.astro.ch, a database used for astrology, of all things) and uses them to dynamically build a URL for another web site (http://pubweb.parc.xerox.com), which, in turn, creates a map highlighting the cities against the outlines of continents. Maybe someday a program will be clever enough to load the latitudes and longitudes as waypoints into your GPS receiver.
The code can be vastly improved in several ways. The main fragility of the recipe comes from relying on the exact format of the HTML page returned by the http://www.astro.com site, particularly in the rather clumsy for x in inf.readlines(): loop in the findcity function. If this format ever changes, the recipe will break. You could change the recipe to use htmllib.HTMLParser instead and be a tad more immune to modest format changes. This helps only a little, however: after all, HTML is meant for human viewers, not for automated parsing and extraction of information. A better approach would be to find a site serving similar information in XML (including, quite possibly, XHTML, the XML/HTML hybrid that combines the strengths of both of its parents) and parse the information with Python's powerful XML tools (covered in Chapter 12).
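To give a flavor of the parser-based approach, here is a minimal sketch using html.parser, the modern descendant of htmllib (the input row and the TextChunks class name are made up for illustration, not part of the recipe). Instead of slicing the raw page on '<' and '>' characters, it collects the text chunks found between tags, so modest changes to the markup no longer break the extraction:

```python
from html.parser import HTMLParser

class TextChunks(HTMLParser):
    """ Accumulate the text found between tags, one chunk per run. """
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        data = data.strip()
        if data:                      # discard whitespace-only runs
            self.chunks.append(data)

# A hypothetical table row of the kind findcity scans for:
page = '<tr><td><a href="/x">Natick</a></td><td>42n17</td><td>71w21</td></tr>'
p = TextChunks()
p.feed(page)
print(p.chunks)    # ['Natick', '42n17', '71w21']
```

The same scanning logic as the recipe's split-on-angle-brackets loop can then run over p.chunks, but without any assumption about attribute quoting, whitespace, or nesting in the page's markup.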
However, despite this defect, this recipe still stands as an example of the kind of opportunity already afforded today by existing services on the Web, without having to wait for the emergence of commercialized web services.
Example 11-1. Fetching latitude/longitude data from the Web
import string, urllib, re, os, exceptions, webbrowser

JUST_THE_US = 0

class CityNotFound(exceptions.Exception):
    pass

def xerox_parc_url(marklist):
    """ Prepare a URL for the xerox.com map-drawing service, with marks
    at the latitudes and longitudes listed in list-of-pairs marklist. """
    avg_lat, avg_lon = max_lat, max_lon = marklist[0]
    marks = ["%f,%f" % marklist[0]]
    for lat, lon in marklist[1:]:
        marks.append(";%f,%f" % (lat, lon))
        avg_lat = avg_lat + lat
        avg_lon = avg_lon + lon
        if lat > max_lat: max_lat = lat
        if lon > max_lon: max_lon = lon
    avg_lat = avg_lat / len(marklist)
    avg_lon = avg_lon / len(marklist)
    if len(marklist) == 1:
        max_lat, max_lon = avg_lat + 1, avg_lon + 1
    diff = max(max_lat - avg_lat, max_lon - avg_lon)
    D = {'height': 4 * diff, 'width': 4 * diff,
         'lat': avg_lat, 'lon': avg_lon, 'marks': ''.join(marks)}
    if JUST_THE_US:
        url = ("http://pubweb.parc.xerox.com/map/db=usa/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    else:
        url = ("http://pubweb.parc.xerox.com/map/color=1/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    return url

def findcity(city, state):
    Please_click = re.compile("Please click")
    city_re = re.compile(city)
    state_re = re.compile(state)
    url = ("http://www.astro.ch/cgi-bin/atlw3/aq.cgi?expr=%s&lang=e"
           % (string.replace(city, " ", "+") + "%2C+" + state))
    lst = [ ]
    found_please_click = 0
    inf = urllib.FancyURLopener( ).open(url)
    for x in inf.readlines( ):
        x = x[:-1]
        if Please_click.search(x) != None:
            # Here is one assumption about unchanging structure
            found_please_click = 1
        if (city_re.search(x) != None and
                state_re.search(x) != None and
                found_please_click):
            # Pick apart the HTML pieces
            L = [ ]
            for y in string.split(x, '<'):
                L = L + string.split(y, '>')
            # Discard any pieces of zero length
            lst.append(filter(None, L))
    inf.close( )
    try:
        # Here's a few more assumptions
        x = lst[0]
        lat, lon = x[6], x[10]
    except IndexError:
        raise CityNotFound("not found: %s, %s" % (city, state))
    def getdegrees(x, dividers):
        if string.count(x, dividers[0]):
            x = map(int, string.split(x, dividers[0]))
            return x[0] + (x[1] / 60.)
        elif string.count(x, dividers[1]):
            x = map(int, string.split(x, dividers[1]))
            return -(x[0] + (x[1] / 60.))
        else:
            raise CityNotFound("Bogus result (%s)" % x)
    return getdegrees(lat, "ns"), getdegrees(lon, "ew")

def showcities(citylist):
    marklist = [ ]
    for city, state in citylist:
        try:
            lat, lon = findcity(city, state)
            print ("%s, %s:" % (city, state)), lat, lon
            marklist.append((lat, lon))
        except CityNotFound, message:
            print "%s, %s: not in database? (%s)" % (city, state, message)
    url = xerox_parc_url(marklist)
    # os.system('netscape "%s"' % url)
    webbrowser.open(url)

# Export a few lists for test purposes
citylist = (("Natick", "MA"), ("Rhinebeck", "NY"),
            ("New Haven", "CT"), ("King of Prussia", "PA"))
citylist1 = (("Mexico City", "Mexico"), ("Acapulco", "Mexico"),
             ("Abilene", "Texas"), ("Tulum", "Mexico"))
citylist2 = (("Munich", "Germany"), ("London", "England"),
             ("Madrid", "Spain"), ("Paris", "France"))

if __name__ == '__main__':
    showcities(citylist1)
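The trickiest part of findcity is its nested getdegrees helper, which decodes values such as "42n17" (42 degrees 17 minutes north) and "71w21" (71 degrees 21 minutes west, which comes out negative). Since the recipe's string.split and string.count calls are gone in modern Python, here is the same logic restated as a self-contained Python 3 sketch:

```python
def getdegrees(value, dividers):
    """ Decode e.g. "42n17" with dividers "ns" into decimal degrees.
    The first divider letter means positive (north/east), the
    second negative (south/west). """
    for sign, divider in ((1, dividers[0]), (-1, dividers[1])):
        if divider in value:
            deg, minutes = map(int, value.split(divider))
            return sign * (deg + minutes / 60.0)
    raise ValueError("Bogus result (%s)" % value)

print(getdegrees("42n17", "ns"))   # about 42.2833
print(getdegrees("71w21", "ew"))   # -71.35
```

The divider letter doubles as both the hemisphere flag and the separator between the degrees and minutes fields, which is why a single split suffices.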
Documentation for the standard library module htmllib in the Library Reference; information about the Xerox PARC map viewer is at http://www.parc.xerox.com/istl/projects/mapdocs/; AstroDienst hosts a worldwide server of latitude/longitude data (http://www.astro.com/cgi-bin/atlw3/aq.cgi).