7.7. Validating Generic URLs
Problem
You want to check whether a given piece of text is a valid URL according to RFC 3986.
Solution
\A (# Scheme [a-z][a-z0-9+\-.]*: (# Authority & path // ([a-z0-9\-._~%!$&'()*+,;=]+@)? # User ([a-z0-9\-._~%]+ # Named host |\[[a-f0-9:.]+\] # IPv6 host |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPvFuture host (:[0-9]+)? # Port (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Path |# Path without authority (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)? ) |# Relative URL (no scheme or authority) (# Relative path [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? |# Absolute path (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/? ) ) # Query (\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)? # Fragment (\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)? \Z
Regex options: Free-spacing, case insensitive |
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
\A (# Scheme (?<scheme>[a-z][a-z0-9+\-.]*): (# Authority & path // (?<user>[a-z0-9\-._~%!$&'()*+,;=]+@)? # User (?<host>[a-z0-9\-._~%]+ # Named host | \[[a-f0-9:.]+\] # IPv6 host | \[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPvFuture host (?<port>:[0-9]+)? # Port (?<path>(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?) # Path |# Path without authority (?<path>/?[a-z0-9\-._~%!$&'()*+,;=:@]+ (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)? ) |# Relative URL (no scheme or authority) (?<path> # Relative path [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? |# Absolute path (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/? ) ) # Query (?<query>\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)? ...
Get Regular Expressions Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.