August 2012
You want to extract the path from a string that holds a URL. For example, you want to extract /index.html from http://www.regexcookbook.com/index.html or from /index.html#fragment.
Extract the path from a string known to hold a valid URL. The following finds a match for all URLs, even for URLs that have no path:
\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
([a-z0-9\-._~%!$&'()*+,;=:@/]*)
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&'()*+,;=:@/]*)
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
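As a minimal sketch of how the first regex might be applied, the following Python snippet compiles the free-spacing version with one comment per line and reads the path from capturing group 3. The helper name `extract_path` is hypothetical and not part of the recipe. The free-spacing and case-insensitive options are assumed to map to Python's `re.VERBOSE` and `re.IGNORECASE`.

```python
import re

# Free-spacing version of the first regex. Each comment sits on its own line,
# because "#" comments out the rest of the line in free-spacing mode.
PATH_REGEX = re.compile(r"""
    \A
    # Skip over scheme and authority, if any
    ([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
    # Path
    ([a-z0-9\-._~%!$&'()*+,;=:@/]*)
    """, re.VERBOSE | re.IGNORECASE)


def extract_path(url):
    """Return the path of url, possibly empty (hypothetical helper)."""
    match = PATH_REGEX.match(url)
    # Every part of the regex is optional, so it matches any string;
    # group 3 holds the path and is empty when the URL has none.
    return match.group(3)


print(extract_path("http://www.regexcookbook.com/index.html"))
print(extract_path("/index.html#fragment"))
```

Both calls print /index.html. For a URL without a path, such as http://www.regexcookbook.com, the helper returns an empty string rather than failing to match.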
Extract the path from a string known to hold a valid URL. Only match URLs that actually have a path:
\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)
# Query, fragment, or end of URL
([#?]|\Z)
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)([#?]|$)
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
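The single-line regex above can be tried out in Python along the following lines. This is a sketch under the same assumptions as before: `extract_required_path` is a hypothetical helper name, and the pattern is split across adjacent string literals only for readability, with no whitespace added to the regex itself.

```python
import re

# Single-line regex that requires a path, joined from three string literals.
PATH_REQUIRED_REGEX = re.compile(
    r"^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?"
    r"(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)"
    r"([#?]|$)",
    re.IGNORECASE,
)


def extract_required_path(url):
    """Return the path in group 3, or None if the regex does not match."""
    match = PATH_REQUIRED_REGEX.search(url)
    return match.group(3) if match else None


print(extract_required_path("http://www.regexcookbook.com/index.html"))
print(extract_required_path("http://www.regexcookbook.com/"))
print(extract_required_path("#fragment"))
```

The three calls print /index.html, / and None. One caveat is worth checking in your own flavor. In a backtracking engine such as Python's, the authority part can give back its last character. As a result, http://www.regexcookbook.com still matches, with m captured as the path, so it is safest to test path-less URLs explicitly before relying on this regex to reject them.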
Extract the path from a string known to hold a valid URL. Use atomic grouping to match only those URLs that actually have a path:
\A # Skip over scheme and authority, if ...