Working with URLs

A URL packs a great deal of information into a single string. It tells you the name of the server, the name of the file on the server, any data that you are supplying to generate a dynamic response, and even the protocol to use to retrieve the information. In its basic form, a URL looks like this:

http://www.oreilly.com/oreilly/about.html

This URL has three elements. The first section tells you (or your software) the protocol in use for this resource. In this case, it is HTTP, shown by http:. The next section indicates the server name and its corresponding domain. In this case the server is named www, and the domain is oreilly.com, coming together as //www.oreilly.com. What follows is a pathname (/oreilly/) and a filename (about.html). Your browser uses this information to reach the brilliant conclusion that it should use HTTP to connect to www in oreilly.com and retrieve the /oreilly/about.html file.
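
The same breakdown can be reproduced in code. The following is a minimal sketch, assuming Python 3 and its standard urllib.parse module (this excerpt does not name a particular library); the variable names are illustrative only.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.oreilly.com/oreilly/about.html"

# urlsplit separates the URL into its major components.
parts = urlsplit(url)

print(parts.scheme)   # the protocol
print(parts.netloc)   # the server name and domain
print(parts.path)     # the pathname plus the filename

# URL paths always use forward slashes, so posixpath.split is a
# convenient way to separate the directory from the filename.
# Note that the directory comes back without its trailing slash.
directory, filename = posixpath.split(parts.path)
print(directory)
print(filename)
```

Splitting the path is a convenience for display purposes here; a server is free to interpret the path however it likes, so treating the last segment as a file is an assumption that matches this example.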

Of course, URLs can become more complicated. If you type “Python” into a search box and click Submit, your browser may go after a URL similar to the following:

http://search.oreilly.com/cgi-bin/search?term=Python&category=All&pref=all

Now there are several more items to examine. First, the server has changed from www to search. Second, the path has changed from /oreilly/ to /cgi-bin/. The filename about.html has been replaced with a target named search. But most interesting are the question mark and the data that follows it:

?term=Python&category=All&pref=all

This portion ...
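
The data after the question mark in the search URL above can be pulled apart programmatically. This is a minimal sketch, assuming Python 3 and the standard urllib.parse module; it only shows how the name=value pairs separate and is not necessarily the approach the rest of this chapter takes.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://search.oreilly.com/cgi-bin/search?term=Python&category=All&pref=all"

parts = urlsplit(url)

# Everything after the question mark, without the question mark itself.
query = parts.query
print(query)

# parse_qs splits the string on "&" and "=". Each value is returned
# as a list because a name may legally appear more than once.
params = parse_qs(query)
for name, values in params.items():
    print(name, values)
```

Because every value is a list, a lookup such as params["term"][0] is needed to get the single search term back as a string.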
