3.18. Replace All Matches Between the Matches of Another Regex
Problem
You want to replace all the matches of a particular regular expression, but only within certain sections of the subject string. Another regular expression matches the text between the sections. In other words, you want to search and replace through all parts of the subject string not matched by the other regular expression.
Say you have an HTML file in which you want to replace straight double quotes with smart (curly) double quotes, but you only want to replace the quotes outside of HTML tags. Quotes within HTML tags must remain plain ASCII straight quotes, or your web browser won’t be able to parse the HTML anymore. For example, you want to turn "text" <span class="middle">"text"</span> "text" into “text” <span class="middle">“text”</span> “text”.
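To make the constraint concrete, here is a small Python sketch (not part of the recipe's own code) showing what happens if the quote regex is simply applied to the whole string. The variable names `subject` and `naive` are illustrative only.

```python
import re

subject = '"text" <span class="middle">"text"</span> "text"'

# Naive approach: replace every straight-quoted run, tags included.
naive = re.sub(r'"([^"]*)"', "\u201C\\g<1>\u201D", subject)

# The attribute value inside the tag now has curly quotes as well,
# which is exactly what the recipe is trying to avoid.
print(naive)
```

The quotes around the attribute value get converted along with the quotes in the text, so the search-and-replace has to be restricted to the text between the tags.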
Solution
C#
string resultString = null;
Regex outerRegex = new Regex("<[^<>]*>");
Regex innerRegex = new Regex("\"([^\"]*)\"");
// Find the first section
int lastIndex = 0;
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Search-and-replace through the text between this match,
    // and the previous one
    string textBetween = subjectString.Substring(lastIndex, outerMatch.Index - lastIndex);
    resultString = resultString + innerRegex.Replace(textBetween, "\u201C$1\u201D");
    lastIndex = outerMatch.Index + outerMatch.Length;
    // Copy the text in the section unchanged
    resultString = resultString + outerMatch.Value;
    // Find the next section
    outerMatch ...
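The listing above is cut off, so as a self-contained illustration, here is a minimal Python sketch of the same idea: walk through the matches of the outer regex, run the inner search-and-replace on the text between them, and copy each outer match unchanged. It reuses the two regular expressions from the C# code. The function name `replace_between` and the handling of the text after the last tag are assumptions of this sketch, not taken from the truncated listing.

```python
import re

# Same patterns as in the C# listing.
OUTER_REGEX = re.compile(r"<[^<>]*>")     # sections to leave untouched (HTML tags)
INNER_REGEX = re.compile(r'"([^"]*)"')    # straight-quoted text to convert
REPLACEMENT = "\u201C\\g<1>\u201D"        # curly quotes around capture group 1


def replace_between(subject: str) -> str:
    """Replace straight quotes with curly quotes, but only outside tags.

    Hypothetical helper name, used for illustration only.
    """
    parts = []
    last_index = 0
    for outer_match in OUTER_REGEX.finditer(subject):
        # Search-and-replace through the text between this match
        # and the previous one.
        text_between = subject[last_index:outer_match.start()]
        parts.append(INNER_REGEX.sub(REPLACEMENT, text_between))
        # Copy the matched section (the tag) unchanged.
        parts.append(outer_match.group())
        last_index = outer_match.end()
    # Assumption of this sketch: the remainder after the last tag
    # is processed the same way as the text between tags.
    parts.append(INNER_REGEX.sub(REPLACEMENT, subject[last_index:]))
    return "".join(parts)


result = replace_between('"text" <span class="middle">"text"</span> "text"')
print(result)
```

Collecting the pieces in a list and joining them once avoids repeated string concatenation; the order of operations otherwise mirrors the loop in the C# code.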