URL Access

A URL identifies a resource on the Internet. It is a string composed of several optional parts, called components: scheme, location, path, query, and fragment. A URL with all its parts looks something like:

scheme://lo.ca.ti.on/pa/th?query#fragment

For example, in http://www.python.org:80/faq.cgi?src=fie, the scheme is http, the location is www.python.org:80, the path is /faq.cgi, the query is src=fie, and there is no fragment. Some punctuation characters are part of one of the components they separate, while others are pure separators and belong to no component. Omitting punctuation implies missing components. For example, in a mailto: URL, the scheme is mailto, the path is the address that follows the colon, and there is no location, query, or fragment. The missing // means the URL has no location part, the missing ? means it has no query part, and the missing # means it has no fragment part.
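The decomposition described above can be checked with a short sketch. It assumes the standard library's urlsplit function, imported from urllib.parse on Python 3 and from urlparse on Python 2. The mailto address used here is a hypothetical placeholder, not one taken from the text.

```python
try:
    from urllib.parse import urlsplit  # Python 3 home of these functions
except ImportError:
    from urlparse import urlsplit      # Python 2 module covered in this section

# urlsplit returns a 5-item sequence: scheme, location, path, query, fragment.
scheme, location, path, query, fragment = urlsplit(
    'http://www.python.org:80/faq.cgi?src=fie')
print(scheme)          # http
print(location)        # www.python.org:80
print(path)            # /faq.cgi
print(query)           # src=fie
print(repr(fragment))  # '' (no fragment)

# A mailto URL has no //, ?, or #, so only scheme and path are non-empty.
# 'someone@example.com' is a hypothetical address used for illustration.
m_scheme, m_location, m_path, m_query, m_fragment = urlsplit(
    'mailto:someone@example.com')
print(m_scheme)        # mailto
print(m_path)          # someone@example.com
```

Note that the path keeps its leading slash, while the ://, ?, and # separators do not appear in any component.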

The urlparse Module

The urlparse module supplies functions to analyze and synthesize URL strings. The most frequently used functions of module urlparse are urljoin, urlsplit, and urlunsplit.
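As a minimal sketch of analysis followed by synthesis, the following splits a URL, rebuilds it unchanged, and then rebuilds it with a different query. The import fallback is an assumption about the reader's Python version, and the replacement query value src=faq is purely illustrative.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

url = 'http://www.python.org:80/faq.cgi?src=fie'

# Analyze: break the string into its five components.
components = urlsplit(url)

# Synthesize: urlunsplit accepts any 5-item sequence of components.
rebuilt = urlunsplit(components)
print(rebuilt == url)  # True

# Replace only the query (index 3), keeping the other components.
modified = urlunsplit((components[0], components[1], components[2],
                       'src=faq', components[4]))
print(modified)        # http://www.python.org:80/faq.cgi?src=faq
```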

urljoin

urljoin(base_url_string, relative_url_string)

Returns a URL string u, obtained by joining relative_url_string, which may be relative, with base_url_string. The joining procedure that urljoin performs to obtain its result u may be summarized as follows:

  • When either of the argument strings is empty, u is the other argument.

  • When relative_url_string explicitly specifies a scheme that is different ...
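A few calls illustrate how urljoin behaves in practice. This is a sketch of behavior observed in the standard library, not a restatement of the full joining procedure. The base and relative URLs are hypothetical examples chosen for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# Hypothetical base URL used for illustration.
base = 'http://www.python.org/doc/index.html'

# An empty argument yields the other argument unchanged.
only_base = urljoin(base, '')
only_relative = urljoin('', 'faq.html')
print(only_base)       # http://www.python.org/doc/index.html
print(only_relative)   # faq.html

# A relative path replaces the last segment of the base path.
sibling = urljoin(base, 'faq.html')
print(sibling)         # http://www.python.org/doc/faq.html

# A second argument with its own, different scheme is returned as given.
other_scheme = urljoin(base, 'ftp://ftp.python.org/pub')
print(other_scheme)    # ftp://ftp.python.org/pub
```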

