9.7. Find a Specific Attribute in XML-Style Tags
Problem
Within an (X)HTML or XML file, you want to find tags that contain a specific attribute, such as id.
This recipe covers several variations on the same problem. Suppose that you want to match each of the following types of strings using separate regular expressions:
Tags that contain an id attribute.
<div> tags that contain an id attribute.
Tags that contain an id attribute with the value my-id.
Tags that contain my-class within their class attribute value (even if there are multiple classes separated by whitespace).
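To make the four variants concrete, here is a set of hypothetical quick-and-dirty Python patterns built on the same [^>] idea as the first solution. They are illustrative assumptions, not the recipe's own regexes. The dictionary keys, the matches() helper, and the sample markup are invented. Only double-quoted class values are handled, and because re.IGNORECASE applies to the whole pattern, values such as MY-ID would also match.

```python
import re

# Hypothetical quick-and-dirty patterns for the four variants; not the
# recipe's own solutions. Like the simple [^>] approach, they break on ">"
# inside quoted values and can be fooled by "id" inside other values.
# re.I also makes attribute *values* case-insensitive (a simplification).
PATTERNS = {
    # Any tag with an id attribute.
    "any_id": re.compile(r"<[^>]+\sid\b[^>]*>", re.I),
    # <div> tags only; the (?=\s) lookahead rejects names like <divider>.
    "div_id": re.compile(r"<div(?=\s)[^>]*\sid\b[^>]*>", re.I),
    # id whose quoted value is exactly my-id; \1 closes with the same quote.
    "id_my_id": re.compile(r"""<[^>]+\sid\s*=\s*(["'])my-id\1[^>]*>""", re.I),
    # my-class as a whole whitespace-separated token of a double-quoted class.
    "my_class": re.compile(
        r'<[^>]+\sclass\s*=\s*"(?:[^"]*\s)?my-class(?:\s[^"]*)?"[^>]*>', re.I
    ),
}


def matches(name: str, text: str) -> list[str]:
    """Return the full text of every tag matched by the named pattern."""
    # finditer + group(0), because findall would return only the quote group.
    return [m.group(0) for m in PATTERNS[name].finditer(text)]


html = (
    '<div id="my-id" class="box my-class">'
    '<divider id="z">'
    '<span class="my-classes">'
    '<p ID="other" class="my-class">'
)
for name in PATTERNS:
    print(name, matches(name, html))
```

The lookahead and the whitespace-or-quote boundaries around my-class are what keep <divider> and class="my-classes" from matching.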
Solution
Tags that contain an id attribute (quick and dirty)
If you want to do a quick search in a text editor that lets you preview your results, the following (overly simplistic) regex might do the trick:
<[^>]+\sid\b[^>]*>
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
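A minimal Python sketch of the quick-and-dirty pattern follows. It assumes the input is a plain string of markup. The find_id_tags() helper and the sample markup are hypothetical, and re.IGNORECASE supplies the "Case insensitive" option.

```python
import re

# Quick-and-dirty pattern from above; re.IGNORECASE is the
# "Case insensitive" option.
ID_TAG = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)


def find_id_tags(text: str) -> list[str]:
    """Return every tag-like substring that appears to contain an id attribute."""
    return ID_TAG.findall(text)


html = ('<p>Intro</p><div ID="main"><span data-id="3">hi</span></div>'
        '<img id=logo src="a.png">')
print(find_id_tags(html))  # ['<div ID="main">', '<img id=logo src="a.png">']
```

The data-id attribute is skipped because \s requires whitespace immediately before id. The \b rules out longer attribute names such as identity.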
Here’s a breakdown of the regex in free-spacing mode:
<         # Start of the tag
[^>]+     # Tag name, attributes, etc.
\s id \b  # The target attribute name, as a whole word
[^>]*     # The remainder of the tag, including the id attribute's value
>         # End of the tag
Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
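In Python, the free-spacing breakdown maps directly onto the re.VERBOSE flag. This flag ignores unescaped whitespace and treats # outside a character class as the start of a comment. The sketch below uses made-up sample markup to check that the verbose and compact forms find the same tags.

```python
import re

# Free-spacing version from the breakdown above. re.VERBOSE ignores the
# unescaped whitespace and strips the # comments.
ID_TAG_VERBOSE = re.compile(r"""
    <         # Start of the tag
    [^>]+     # Tag name, attributes, etc.
    \s id \b  # The target attribute name, as a whole word
    [^>]*     # The remainder of the tag, including the id attribute's value
    >         # End of the tag
""", re.IGNORECASE | re.VERBOSE)

# The compact form of the same regex, for comparison.
ID_TAG_COMPACT = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)

sample = '<section id="s1"><a data-id="7" href="#">x</a><br Id=b/></section>'
verbose_hits = ID_TAG_VERBOSE.findall(sample)
compact_hits = ID_TAG_COMPACT.findall(sample)
print(verbose_hits)                  # ['<section id="s1">', '<br Id=b/>']
print(verbose_hits == compact_hits)  # True
```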
Tags that contain an id attribute (more reliable)
Unlike the regex just shown, this next take on the same problem supports quoted attribute values that contain literal > characters, and it doesn’t match tags that merely contain the word id within one of their attributes’ ...
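The two weaknesses described above are easy to reproduce with the quick-and-dirty pattern. In this hypothetical Python check, the sample tags are invented. A literal > inside a quoted value either hides the tag entirely or truncates the match, and the word id inside an alt value produces a false positive.

```python
import re

# The quick-and-dirty pattern again (case-insensitive).
QUICK = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)

# A ">" inside a quoted value before id: the tag is missed entirely.
gt_before_id = '<a title="x > y" id="link">'
# The same ">" after id: the tag is found, but the match is cut short.
gt_after_id = '<a id="link" title="x > y">'
# No id attribute, but "id" appears as a word in another value.
id_in_value = '<img alt="photo id card">'

print(QUICK.findall(gt_before_id))  # []
print(QUICK.findall(gt_after_id))   # ['<a id="link" title="x >']
print(QUICK.findall(id_in_value))   # ['<img alt="photo id card">']
```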