9.10. Find Words Within XML-Style Comments


You want to find all occurrences of the word TODO within (X)HTML or XML comments. For example, you want to match only the TODO inside the comment in the following string (↵ marks a line break):

        This "TODO" is not within a comment, but the next one is. <!-- TODO: ↵
        Come up with a cooler comment for this example. -->


There are at least two approaches to this problem, and both have their advantages. The first tactic, which we’ll call the “two-step approach,” is to find comments with an outer regex, and then search within each match using a separate regex or even a plain text search. That works best if you’re writing code to do the job, since separating the task into two steps keeps things simple and fast. However, if you’re searching through files using a text editor or grep tool, splitting the task in two won’t work unless your tool of choice offers a special option to search within matches found by another regex.[23]

When you need to find words within comments using a single regex, you can accomplish this with the help of lookaround. This second method is shown later in this recipe.

Two-step approach

When it’s a workable option, the better solution is to split the task in two: search for comments, and then search within those comments for TODO.

Here’s how you can find comments:

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
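
The two steps can be sketched in Python as follows. This is a minimal illustration rather than the recipe's own listing: the comment pattern <!--.*?--> is an assumption (the regex itself does not appear in the text above), and the function name find_todos_in_comments is hypothetical. Python's re.DOTALL flag plays the role of the "dot matches line breaks" option.

```python
import re

# Assumed comment-matching pattern (not shown in the text above).
# re.DOTALL is Python's "dot matches line breaks" option, so a
# comment may span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Inner search: TODO as a whole word.
TODO_RE = re.compile(r"\bTODO\b")


def find_todos_in_comments(text):
    """Return the offsets in `text` of each TODO found inside a comment."""
    offsets = []
    # Step 1: find each comment with the outer regex.
    for comment in COMMENT_RE.finditer(text):
        # Step 2: search only within the matched comment text.
        for todo in TODO_RE.finditer(comment.group()):
            # Translate the position back to an offset in the full text.
            offsets.append(comment.start() + todo.start())
    return offsets


sample = (
    'This "TODO" is not within a comment, but the next one is. <!-- TODO:\n'
    "Come up with a cooler comment for this example. -->"
)
todo_offsets = find_todos_in_comments(sample)
print(todo_offsets)
```

Because the inner search only ever sees text that the outer regex has already identified as a comment, the quoted "TODO" at the start of the sample is never examined.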

Standard JavaScript doesn’t have a “dot matches line breaks” option, ...
