August 2018
The urllib.parse module provides several tools for parsing URLs. The most common approach is to rely on urllib.parse.urlparse, which handles the most widespread kinds of URLs:
import urllib.parse

def parse_url(url):
    """Parses a URL of the most widespread format.

    This takes for granted there is a single set of parameters
    for the whole path.
    """
    # urlparse returns a named tuple; _asdict() converts it to a mapping
    parts = urllib.parse.urlparse(url)
    parsed = parts._asdict()
    # Replace the raw query string with a dict of parameter lists
    parsed['query'] = urllib.parse.parse_qs(parts.query)
    return parsed
The preceding function can be called from the interactive Python interpreter, as follows:
>>> url = 'http://user:pwd@host.com:80/path/subpath?arg1=val1&arg2=val2#fragment'
>>> result = parse_url(url)
>>> print(result)
OrderedDict([('scheme', 'http'), ('netloc', 'user:pwd@host.com:80'), ...
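The netloc entry above is still a single string, and the query values are easy to misread. As a minimal sketch (not part of the original recipe, and reusing the same example URL), the object returned by urlparse can also split netloc into credentials, host, and port, while parse_qs maps every key to a list of values. The variable names here are illustrative only.

```python
import urllib.parse

url = 'http://user:pwd@host.com:80/path/subpath?arg1=val1&arg2=val2#fragment'
parts = urllib.parse.urlparse(url)

# netloc is one string; the result object exposes its pieces as attributes.
credentials_and_host = (parts.username, parts.password, parts.hostname, parts.port)
print(credentials_and_host)  # ('user', 'pwd', 'host.com', 80)

# parse_qs maps each key to a list, because a key may repeat in a query string.
query = urllib.parse.parse_qs(parts.query)
print(query)  # {'arg1': ['val1'], 'arg2': ['val2']}

# _asdict() is the named tuple method that yields a mapping of the six fields.
fields = parts._asdict()
print(list(fields))  # ['scheme', 'netloc', 'path', 'params', 'query', 'fragment']
```

Because every query value is a list, a single-valued parameter is read as query['arg1'][0]. Note also that port is returned as an integer, or None when the URL does not specify one.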