3.18. Replace All Matches Between the Matches of Another Regex

Problem

You want to replace all the matches of a particular regular expression, but only within certain sections of the subject string. Another regular expression matches the text between the sections. In other words, you want to search and replace through all parts of the subject string not matched by the other regular expression.

Say you have an HTML file in which you want to replace straight double quotes with smart (curly) double quotes, but you only want to replace the quotes outside of HTML tags. Quotes within HTML tags must remain plain ASCII straight quotes, or your web browser won’t be able to parse the HTML anymore. For example, you want to turn "text" <span class="middle">"text"</span> "text" into “text” <span class="middle">“text”</span> “text”.

Solution

C#

string resultString = null;
Regex outerRegex = new Regex("<[^<>]*>");
Regex innerRegex = new Regex("\"([^\"]*)\"");
// Find the first section
int lastIndex = 0;
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Search-and-replace through the text between this match,
    // and the previous one
    string textBetween = subjectString.Substring(lastIndex, outerMatch.Index - lastIndex);
    resultString = resultString + innerRegex.Replace(textBetween, "\u201C$1\u201D");
    lastIndex = outerMatch.Index + outerMatch.Length;
    // Copy the text in the section unchanged
    resultString = resultString + outerMatch.Value;
    // Find the next section
    outerMatch ...
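For readers who want something they can compile, here is a minimal self-contained sketch of the same technique, using the two regular expressions from the Problem statement's HTML example. The class name SmartQuotes and the method name ReplaceOutsideTags are hypothetical, chosen only for illustration. The use of StringBuilder, the NextMatch() loop, and the final step that processes the text after the last tag are assumptions about one reasonable way to finish the job, not a reproduction of the listing.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is not from the original listing.
public static class SmartQuotes
{
    // Matches the sections that must stay untouched (HTML tags).
    private static readonly Regex OuterRegex = new Regex("<[^<>]*>");

    // Matches a pair of straight double quotes; group 1 is the text between them.
    private static readonly Regex InnerRegex = new Regex("\"([^\"]*)\"");

    // Replacement text: curly quotes around whatever group 1 captured.
    private const string Replacement = "\u201C$1\u201D";

    public static string ReplaceOutsideTags(string subject)
    {
        var result = new StringBuilder();
        int lastIndex = 0;

        // Walk through every tag in order of appearance.
        for (Match outer = OuterRegex.Match(subject); outer.Success; outer = outer.NextMatch())
        {
            // Search-and-replace only in the text between the previous tag and this one.
            string textBetween = subject.Substring(lastIndex, outer.Index - lastIndex);
            result.Append(InnerRegex.Replace(textBetween, Replacement));

            // Copy the tag itself unchanged.
            result.Append(outer.Value);
            lastIndex = outer.Index + outer.Length;
        }

        // Assumed final step: process the text that follows the last tag.
        // If there were no tags at all, this covers the whole subject string.
        result.Append(InnerRegex.Replace(subject.Substring(lastIndex), Replacement));
        return result.ToString();
    }
}
```

Calling SmartQuotes.ReplaceOutsideTags on the sample string from the Problem statement should leave class="middle" untouched while curling the three quoted runs of text. StringBuilder is used here only to avoid repeated string concatenation on large inputs. Appending to a string, as the listing does, gives the same result.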
