8.1. Validating URLs

Problem

You want to check whether a given piece of text is a URL that is valid for your purposes.

Solution

Allow almost any URL:

^(https?|ftp|file)://.+$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
\A(https?|ftp|file)://.+\Z
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Require a domain name, and don’t allow a username or password:

\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
([/?].*)?                  # Path and/or parameters
\Z                         # Anchor
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+↵
([/?].+)?$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Require a domain name, and don’t allow a username or password. Allow the scheme (http or ftp) to be omitted if it can be inferred from the subdomain (www or ftp):

\A                             # Anchor
((https?|ftp)://|(www|ftp)\.)  # Scheme or subdomain
[a-z0-9-]+(\.[a-z0-9-]+)+      # Domain
([/?].*)?                      # Path and/or parameters
\Z                             # Anchor
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Require a domain name and a path that points to an image file. Don’t allow a username, password, or parameters:

\A # Anchor (https?|ftp):// # Scheme [a-z0-9-]+(\.[a-z0-9-]+)+ ...

Get Regular Expressions Cookbook, 2nd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.