3.13. Find a Match Within Another Match

Problem

You want to find all the matches of a particular regular expression, but only within certain sections of the subject string. Another regular expression matches each of the sections in the string.

Suppose you have an HTML file in which various passages are marked as bold with <b> tags. You want to find all numbers marked as bold. If some bold text contains multiple numbers, you want to match all of them separately. For example, when processing the string 1 <b>2</b> 3 4 <b>5 6 7</b>, you want to find four matches: 2, 5, 6, and 7.

Solution

C#

StringCollection resultList = new StringCollection();
Regex outerRegex = new Regex("<b>(.*?)</b>", RegexOptions.Singleline);
Regex innerRegex = new Regex(@"\d+");
// Find the first section
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Get the matches within the section
    Match innerMatch = innerRegex.Match(outerMatch.Groups[1].Value);
    while (innerMatch.Success) {
        resultList.Add(innerMatch.Value);
        innerMatch = innerMatch.NextMatch();
    }
    // Find the next section
    outerMatch = outerMatch.NextMatch();
}
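
To try the C# solution on the sample string from the Problem, the snippet can be wrapped in a small console program. This is only a sketch: the class name MatchWithinMatchDemo and the hard-coded subjectString are assumptions made for illustration and are not part of the recipe.

```csharp
using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only to make the recipe runnable
class MatchWithinMatchDemo
{
    static void Main()
    {
        // Sample subject string taken from the Problem statement
        string subjectString = "1 <b>2</b> 3 4 <b>5 6 7</b>";

        StringCollection resultList = new StringCollection();
        // Outer regex: captures the text between <b> and </b> in group 1
        Regex outerRegex = new Regex("<b>(.*?)</b>", RegexOptions.Singleline);
        // Inner regex: matches each run of digits
        Regex innerRegex = new Regex(@"\d+");

        // Walk through every bold section
        Match outerMatch = outerRegex.Match(subjectString);
        while (outerMatch.Success)
        {
            // Search only the captured section text, not the whole subject
            Match innerMatch = innerRegex.Match(outerMatch.Groups[1].Value);
            while (innerMatch.Success)
            {
                resultList.Add(innerMatch.Value);
                innerMatch = innerMatch.NextMatch();
            }
            outerMatch = outerMatch.NextMatch();
        }

        // Prints 2, 5, 6, and 7, one per line
        foreach (string number in resultList)
        {
            Console.WriteLine(number);
        }
    }
}
```

The numbers 1, 3, and 4 are skipped because the inner regex is applied only to the text captured by group 1 of the outer regex, never to the subject string as a whole.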

VB.NET

Dim ResultList = New StringCollection
Dim OuterRegex As New Regex("<b>(.*?)</b>", RegexOptions.Singleline)
Dim InnerRegex As New Regex("\d+")
'Find the first section
Dim OuterMatch = OuterRegex.Match(SubjectString)
While OuterMatch.Success
    'Get the matches within the section
    Dim InnerMatch = InnerRegex.Match(OuterMatch.Groups(1).Value)
    While InnerMatch.Success
        ResultList.Add(InnerMatch.Value)
        ...
