8.1. Validating URLs

Problem

You want to check whether a given piece of text is a URL that is valid for your purposes.

Solution

Allow almost any URL:

^(https?|ftp|file)://.+$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A(https?|ftp|file)://.+\Z
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
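
As a minimal sketch of how the permissive pattern might be applied, the following Python snippet compiles the `\A...\Z` variant and wraps it in a helper. The names `ANY_URL` and `is_any_url` are hypothetical and chosen only for this illustration; they are not part of the recipe.

```python
import re

# The "allow almost any URL" pattern from the recipe.
# In Python, \A and \Z anchor the match to the start and end of the string.
ANY_URL = re.compile(r"\A(https?|ftp|file)://.+\Z", re.IGNORECASE)


def is_any_url(text):
    """Return True if text is an allowed scheme, then ://, then at least one character."""
    return ANY_URL.search(text) is not None


if __name__ == "__main__":
    for candidate in ["http://example.com", "FILE:///etc/hosts",
                      "mailto:user@example.com", "http://"]:
        print(candidate, is_any_url(candidate))
```

This check only confirms the scheme and that something follows `://`. It says nothing about whether the rest of the text is a well-formed address.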

Require a domain name, and don’t allow a username or password:

\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
([/?].*)?                  # Path and/or parameters
\Z                         # Anchor
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
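
The free-spacing version maps directly onto Python's `re.VERBOSE` flag. The sketch below assumes Python as the host language; `DOMAIN_URL` and `has_plain_domain` are hypothetical names introduced for the example.

```python
import re

# Free-spacing form of the "require a domain name" pattern from the recipe.
# re.VERBOSE ignores the layout whitespace and the # comments.
DOMAIN_URL = re.compile(r"""
    \A                         # Anchor
    (https?|ftp)://            # Scheme
    [a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
    ([/?].*)?                  # Path and/or parameters
    \Z                         # Anchor
    """, re.VERBOSE | re.IGNORECASE)


def has_plain_domain(text):
    """Return True for scheme://domain URLs that carry no username or password."""
    return DOMAIN_URL.search(text) is not None


if __name__ == "__main__":
    for candidate in ["http://www.example.com/path?x=1",
                      "http://user:pass@example.com/",
                      "http://localhost"]:
        print(candidate, has_plain_domain(candidate))
```

Because the domain part must be followed immediately by `/`, `?`, or the end of the string, this pattern also rejects URLs that specify a port, such as `http://example.com:8080/`. Single-label hosts such as `localhost` fail as well, since at least one dot is required.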

Require a domain name, and don’t allow a username or password. Allow the scheme (http or ftp) to be omitted if it can be inferred from the subdomain (www or ftp):

\A                             # Anchor
((https?|ftp)://|(www|ftp)\.)  # Scheme or subdomain
[a-z0-9-]+(\.[a-z0-9-]+)+      # Domain
([/?].*)?                      # Path and/or parameters
\Z                             # Anchor
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
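
One possible use of this pattern is to fill in the missing scheme after validation. The sketch below relies on capturing group 3, which holds the `www` or `ftp` subdomain only when the scheme was omitted. The helper name `add_inferred_scheme` and the choice to return `None` for rejected input are assumptions made for this illustration.

```python
import re

# Pattern from the recipe: the scheme may be omitted when the text
# starts with a www. or ftp. subdomain.
OPTIONAL_SCHEME_URL = re.compile(r"""
    \A                             # Anchor
    ((https?|ftp)://|(www|ftp)\.)  # Scheme or subdomain
    [a-z0-9-]+(\.[a-z0-9-]+)+      # Domain
    ([/?].*)?                      # Path and/or parameters
    \Z                             # Anchor
    """, re.VERBOSE | re.IGNORECASE)


def add_inferred_scheme(text):
    """Return text with a scheme prepended if needed, or None if it is rejected."""
    match = OPTIONAL_SCHEME_URL.search(text)
    if match is None:
        return None
    subdomain = match.group(3)  # Set only when the scheme was left out
    if subdomain is None:
        return text  # The scheme was already present
    scheme = "ftp" if subdomain.lower() == "ftp" else "http"
    return f"{scheme}://{text}"


if __name__ == "__main__":
    for candidate in ["www.example.com", "ftp.example.com/pub",
                      "http://example.com", "example.com"]:
        print(candidate, "->", add_inferred_scheme(candidate))
```

With these definitions, `www.example.com` becomes `http://www.example.com`, `ftp.example.com/pub` becomes `ftp://ftp.example.com/pub`, and a bare `example.com` is rejected because neither a scheme nor a recognized subdomain is present.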

Require a domain name and a path that points to an image file. Don’t allow a username, password, or parameters:

\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
...
