8.12. Extracting the Path from a URL
Problem
You want to extract the path from a string that holds a
URL. For example, you want to
extract /index.html
from http://www.regexcookbook.com/index.html
or from /index.html#fragment
.
Solution
Extract the path from a string known to hold a valid URL. The following finds a match for all URLs, even for URLs that have no path:
\A # Skip over scheme and authority, if any ([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)? # Path ([a-z0-9\-._~%!$&'()*+,;=:@/]*)
Regex options: Free-spacing, case insensitive |
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&'()*+,;=:@/]*)
Regex options: Case insensitive |
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Extract the path from a string known to hold a valid URL. Only match URLs that actually have a path:
\A # Skip over scheme and authority, if any ([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)? # Path (/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/) # Query, fragment, or end of URL ([#?]|\Z)
Regex options: Free-spacing, case insensitive |
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?(/?[a-z0-9\-._~%!$&'()*+,;=@]+↵ (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)([#?]|$)
Regex options: Case insensitive |
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Extract the path from a string known to hold a valid URL. Use atomic grouping to match only those URLs that actually have a path:
\A # Skip over scheme and authority, if ...
Get Regular Expressions Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.