9.9. Remove XML-Style Comments
You want to remove comments from an (X)HTML or XML document. For example, you want to remove development comments from a web page before it is served to web browsers, or you want to perform subsequent searches without finding any matches within comments.
Finding comments is not a difficult task, thanks to the availability of lazy quantifiers. Here is the regular expression for the job:
<!--.*?-->
|Regex options: Dot matches line breaks|
|Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby|
<!--[\s\S]*?-->
|Regex options: None|
To remove the comments, replace all matches with the empty string (i.e., nothing). Recipe 3.14 lists code to replace all matches of a regex.
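As a minimal sketch of that replacement step, here is how it might look in Python, one of the flavors listed above. The function name `strip_comments` and the sample markup are illustrative assumptions, not code from Recipe 3.14.

```python
import re

# Variant that relies on the "dot matches line breaks" option (re.DOTALL).
COMMENT_DOTALL = re.compile(r"<!--.*?-->", re.DOTALL)

# Variant that needs no options: [\s\S] matches any character,
# including line breaks.
COMMENT_NO_OPTIONS = re.compile(r"<!--[\s\S]*?-->")


def strip_comments(markup: str) -> str:
    """Return markup with every XML-style comment replaced by nothing."""
    return COMMENT_NO_OPTIONS.sub("", markup)


# Hypothetical sample input containing a multiline comment.
sample = "<p>Hello</p><!-- TODO:\n fix copy --><p>World</p><!-- end -->"
print(strip_comments(sample))
```

Both compiled patterns should give the same result on this input; which one to use depends mainly on whether the flavor offers a "dot matches line breaks" option.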
How it works
At the beginning and end of this regular expression are the literal character sequences ‹<!--› and ‹-->›. Since none of those characters are special in regex syntax (except within character classes, where hyphens create ranges), they don’t need to be escaped. That just leaves the ‹[\s\S]*?› in the middle of the regex to examine ...
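To see why the lazy quantifier in the middle matters, the following Python sketch compares it with a greedy counterpart. The input string and variable names are made up for illustration.

```python
import re

# Hypothetical input with two separate comments on one line.
text = "a<!-- one -->b<!-- two -->c"

# Lazy: each match stops at the first "-->" that follows its "<!--".
lazy_result = re.sub(r"<!--[\s\S]*?-->", "", text)

# Greedy: a single match runs from the first "<!--" to the last "-->",
# so the content between the two comments is removed as well.
greedy_result = re.sub(r"<!--[\s\S]*-->", "", text)

print(lazy_result)    # abc
print(greedy_result)  # ac
```

With the greedy version, the "b" between the two comments is lost, which is why the recipe uses the lazy form.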