Strings with Escapes
Problem
You need a regex that matches a string, which is a sequence of zero or more characters enclosed by double quotes. A string with nothing between the quotes is an empty string. A double quote can be included in the string by escaping it with a backslash, and backslashes can also be used to escape other characters in the string. Strings cannot include line breaks, and line breaks cannot be escaped with backslashes.
Solution
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Discussion
This regular expression has the same structure as the one in the preceding recipe. The difference is that we now have two characters with a special meaning: the double quote and the backslash. We exclude both from the characters matched by the two negated character classes. We use ‹\\.› to separately match any escaped character. ‹\\› matches a single backslash, and ‹.› matches any character that is not a line break. Make sure the option “dot matches line breaks” is turned off.
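As a rough illustration of the behavior described above, the following Python sketch applies the double-quoted pattern to a sample line. The names `STRING_RE`, `find_strings`, and `samples` are made up for this example, and the sketch assumes the escape group is repeated with a trailing `*` so that strings with zero, one, or many escapes all match. Python's `re` module leaves "dot matches line breaks" off unless `re.DOTALL` is passed, which is what the recipe calls for.

```python
import re

# Double-quoted string with backslash escapes. The trailing * on the
# non-capturing group lets "escape + unescaped run" repeat any number
# of times, including zero (so "" and "hello" match too).
STRING_RE = re.compile(r'"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"')


def find_strings(text):
    """Return every double-quoted string literal found in text."""
    return STRING_RE.findall(text)


# Hypothetical sample input: a plain string, an empty string,
# a string with escaped quotes, and a string with an escaped backslash.
samples = r'say "hello", "", "a \"quoted\" word", "back\\slash"'

for literal in find_strings(samples):
    print(literal)
```

Because the dot does not match line breaks here, a backslash followed by a line break is not treated as an escape, so a string containing one is not matched.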
Variations
Strings delimited with single quotes can be matched just as easily:
'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
If your language supports both single-quoted and double-quoted strings, you’ll need to handle those as separate alternatives:
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"|'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, ... |
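The two-alternative form can be tried out with a short Python sketch like the one below. The names `DOUBLE`, `SINGLE`, `QUOTED_RE`, `find_quoted`, and `source` are illustrative only, and the patterns assume the escape group is repeated with `*`. Building the pattern from two pieces is just a convenience to keep the quoting readable in Python source.

```python
import re

# Each alternative excludes its own delimiter, the backslash, and line
# breaks from the unescaped runs. The other quote character is an
# ordinary character inside the string.
DOUBLE = r'"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"'
SINGLE = r"'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'"
QUOTED_RE = re.compile(DOUBLE + "|" + SINGLE)


def find_quoted(text):
    """Return all single- or double-quoted string literals in text."""
    return [m.group(0) for m in QUOTED_RE.finditer(text)]


# Hypothetical input mixing both quote styles and escaped delimiters.
source = r'''name = 'O\'Reilly'; title = "the \"Cookbook\""; mixed = "it's" + 'say "hi"';'''

for literal in find_quoted(source):
    print(literal)
```

Keeping the alternatives separate means an apostrophe inside a double-quoted string, or a double quote inside a single-quoted string, needs no escaping.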