9.10. Find Words Within XML-Style Comments
Problem
You want to find all occurrences of the word TODO within (X)HTML or XML comments. For example, you want to match only the TODO that appears inside the comment in the following string:
This "TODO" is not within a comment, but the next one is. <!-- TODO: ↵ Come up with a cooler comment for this example. -->
Solution
There are at least two approaches to this problem, and both have their advantages. The first tactic, which we’ll call the “two-step approach,” is to find comments with an outer regex, and then search within each match using a separate regex or even a plain text search. That works best if you’re writing code to do the job, since separating the task into two steps keeps things simple and fast. However, if you’re searching through files using a text editor or grep tool, splitting the task in two won’t work unless your tool of choice offers a special option to search within matches found by another regex.[23]
When you need to find words within comments using a single regex, you can accomplish this with the help of lookaround. This second method is shown in a later section.
Two-step approach
When it’s a workable option, the better solution is to split the task in two: search for comments, and then search within those comments for TODO.
Here’s how you can find comments:
<!--.*?-->
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
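As a rough illustration of the two-step approach, here is a minimal Python sketch. It applies the comment regex above with re.DOTALL (Python's name for the "dot matches line breaks" option) and then searches each matched comment for TODO. The function name find_todos_in_comments, the sample subject string, and the word boundaries around TODO in the inner regex are assumptions made for this example; they are not taken from the recipe itself.

```python
import re

# Outer regex from the recipe; re.DOTALL lets the dot match line breaks,
# so comments that span several lines are found too.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Inner search. The \b word boundaries are an assumption for this sketch;
# a plain text search for "TODO" would also work.
TODO_RE = re.compile(r"\bTODO\b")


def find_todos_in_comments(text):
    """Return (start, end) offsets in text for each TODO inside a comment."""
    spans = []
    for comment in COMMENT_RE.finditer(text):
        # Step 2: search only within the text of the matched comment.
        for todo in TODO_RE.finditer(comment.group()):
            # Translate comment-relative offsets back to offsets in text.
            spans.append((comment.start() + todo.start(),
                          comment.start() + todo.end()))
    return spans


subject = ('This "TODO" is not within a comment, but the next one is. '
           '<!-- TODO: Come up with a cooler comment for this example. -->')

for start, end in find_todos_in_comments(subject):
    print(start, end, subject[start:end])
```

Keeping the two regexes separate means each one stays simple, and the inner search can be swapped for any other word or pattern without touching the comment matcher.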
Standard JavaScript doesn’t have a “dot matches line breaks” option, ...