The urllib Module
The urllib module provides a unified client interface for HTTP, FTP, and
gopher. It automatically picks the right protocol handler based on
the uniform resource locator (URL) passed to the library.
Fetching data from a URL is straightforward: call the
urlopen function and read from the returned stream
object, as shown in Example 7-14.
Example 7-14. Using the urllib Module to Fetch a Remote Resource
File: urllib-example-1.py
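import urllib

# open the remote resource; the result behaves like a read-only file
fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

# copy the data in 8192-byte chunks until the stream is exhausted
while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

# the response headers remain available after the stream is closed
for k, v in fp.headers.items():
    print k, "=", v

print "copied", n, "bytes from", fp.url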
server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org
Note that the stream object provides some nonstandard attributes.
headers is a Message object
(as defined by the mimetools module), and
url contains the actual URL. The latter is updated
if the server redirects the client to a new URL.
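The listing in Example 7-14 is written for the Python 2 urllib module. As a rough sketch only, and assuming Python 3 (where urlopen lives in urllib.request), the same chunked copy might look like the following. The function name fetch_to_file and its parameters are hypothetical names chosen for illustration; they are not part of the library.

```python
import urllib.request


def fetch_to_file(url, path, chunk_size=8192):
    """Copy the resource at url into path.

    Returns (bytes_copied, final_url, headers). This helper is a
    hypothetical illustration, not a library function.
    """
    copied = 0
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # read fixed-size chunks until read() returns an empty bytes object
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            copied += len(chunk)
        # url holds the URL that was actually retrieved
        final_url = response.url
        # headers is an email.message.Message; lookups ignore case
        headers = response.headers
    return copied, final_url, headers
```

Calling fetch_to_file("http://www.python.org", "out.html") and printing the items of the returned headers object would then mirror the output shown earlier, although the exact header names and values depend on the server.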
The urlopen function is actually a helper
function, which creates an instance of the
FancyURLopener class and calls its
open method. To get special behavior, you can
subclass that class. For instance, the class in Example 7-15 automatically logs in to the server when ...