7.24. Extract the File Extension from a Windows Path
Problem
You have a string that holds a (syntactically) valid path to a
file or folder on a Windows PC or network, and you want to extract the
file extension, if any, from the path. For example, you want to extract .ext from c:\folder\file.ext.
Solution
\.[^.\\/:*?"<>|\r\n]+$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
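As a minimal sketch of how this regex might be applied, the following Python snippet wraps it in a helper function. The name `extract_extension` and the constant `EXTENSION_RE` are illustrative assumptions, not part of the recipe; the pattern itself is the one shown above.

```python
import re

# The recipe's regex, compiled with the case-insensitive option listed above.
# EXTENSION_RE and extract_extension are hypothetical names used for illustration.
EXTENSION_RE = re.compile(r'\.[^.\\/:*?"<>|\r\n]+$', re.IGNORECASE)


def extract_extension(path):
    """Return the extension, including its leading dot, or None if there is none."""
    match = EXTENSION_RE.search(path)
    return match.group(0) if match else None


print(extract_extension(r"c:\folder\file.ext"))  # .ext
print(extract_extension("c:\\folder\\"))         # None: path ends with a backslash
print(extract_extension(r"c:\folder\file"))      # None: filename has no dot
```

Returning `None` when nothing matches is only one possible design choice; returning an empty string would work equally well if the caller prefers to avoid a `None` check.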
Discussion
We can use the same technique for extracting the file extension as we used for extracting the whole filename in Recipe 7.23.
The only difference is in how we handle dots. The regex in Recipe 7.23 does not include any dots. The negated character class in that regex will simply match any dots that happen to be in the filename.
A file extension must begin with a dot. Thus, we add ‹\.› at the start of the regex to match a literal dot.
Filenames such as Version 2.0.txt may contain multiple dots. The last dot is the one that separates the extension from the filename. The extension itself should not contain any dots. We specify this in the regex by putting a dot inside the negated character class. Inside a character class the dot is simply a literal character, so we don’t need to escape it. The ‹$› anchor at the end of the regex makes sure we match .txt instead of .0.
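To see the effect of the dot inside the character class and of the ‹$› anchor, the sketch below compares the recipe's regex with two deliberately altered variants. Both variants, and the function name `first_match`, are hypothetical and exist only for this comparison.

```python
import re

# The recipe's regex: dot excluded from the class, anchored at the end.
recipe = re.compile(r'\.[^.\\/:*?"<>|\r\n]+$')
# Hypothetical variant: the dot is NOT excluded from the character class.
dot_allowed = re.compile(r'\.[^\\/:*?"<>|\r\n]+$')
# Hypothetical variant: the $ anchor is removed.
unanchored = re.compile(r'\.[^.\\/:*?"<>|\r\n]+')


def first_match(regex, text):
    """Return the text of the first match, or None when the regex does not match."""
    match = regex.search(text)
    return match.group(0) if match else None


name = "Version 2.0.txt"
print(first_match(recipe, name))       # .txt
print(first_match(dot_allowed, name))  # .0.txt
print(first_match(unanchored, name))   # .0
```

Without the dot in the class, the match starts at the first dot and runs to the end of the string. Without the anchor, the match stops at the second dot.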
If the string ends with a backslash, or with a filename that doesn’t include any dots, the regex won’t match at all. When it does match, it will match the extension, including the dot that delimits the extension and ...