Strings
Problem
You need a regex that matches a string, which is a sequence of zero or more characters enclosed by double quotes. A string with nothing between the quotes is an empty string. Two sequential double quotes in a character string denote a single character, a double quote. Strings cannot include line breaks. Backslashes or other characters have no special meaning in strings.
Your regular expression should match any string, including empty
strings, and it should return a single match for strings that contain
double quotes. For example, it should return "before quote""after quote"
as a single match, rather than matching "before quote" and
"after quote" separately.
Solution
"[^"\r\n]*(?:""[^"\r\n]*)*"
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
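As a minimal sketch of how the solution might be applied, the Python snippet below compiles the regex and runs it over a sample line. The names `STRING_RE`, `find_strings`, and `unescape` are illustrative assumptions, not part of the recipe, and `unescape` shows only one possible way to turn a matched string into its value.

```python
import re

# The regex from the Solution, written as a raw string so the
# backslashes reach the regex engine unchanged.
STRING_RE = re.compile(r'"[^"\r\n]*(?:""[^"\r\n]*)*"')


def find_strings(text):
    """Return every quoted string in text, surrounding quotes included."""
    return STRING_RE.findall(text)


def unescape(literal):
    """Hypothetical helper: drop the outer quotes, collapse "" into "."""
    return literal[1:-1].replace('""', '"')


sample = 'a = "" b = "before quote""after quote" c = "plain"'
matches = find_strings(sample)
for m in matches:
    print(m, "->", repr(unescape(m)))
```

Because the only group in the regex is noncapturing, `findall` returns the overall matches rather than group contents.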
Discussion
Matching a string that cannot contain quotes or line breaks would
be easy with ‹"[^\r\n"]*"›. Double quotes are literal characters
in regular expressions, and we can easily match a sequence of characters
that are not quotes or line breaks with a negated character
class.
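To see what this simpler regex does and does not handle, here is a small Python sketch. The name `SIMPLE_RE` and the sample inputs are assumptions chosen for illustration.

```python
import re

# The simpler regex: an opening quote, any run of characters that are
# neither quotes nor line breaks, then a closing quote.
SIMPLE_RE = re.compile(r'"[^\r\n"]*"')

# A string without embedded quotes is matched as a whole.
plain_matches = SIMPLE_RE.findall('say "hello world" now')

# A doubled quote splits the string into two separate matches.
split_matches = SIMPLE_RE.findall('"before quote""after quote"')

# A line break between the quotes prevents any match.
multiline_matches = SIMPLE_RE.findall('"first\nsecond"')

print(plain_matches)
print(split_matches)
print(multiline_matches)
```

The second result is the behavior the problem statement rules out: the string with a doubled quote comes back as two matches instead of one.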
But our strings can contain quotes if they are specified as two
consecutive quotes. Matching these is not much more difficult if we
handle the quotes separately. After the opening quote, we use
‹[^\r\n"]*› to match anything but quotes and line breaks. This may be
followed by zero or more pairs of double quotes. We could match those
with ‹(?:"")*›, but after each pair of double quotes, the string can
have more characters that ...