9.7. Find a Specific Attribute in XML-Style Tags
Problem
Within an (X)HTML or XML file, you want to find tags that contain
a specific attribute, such as id.
This recipe covers several variations on the same problem. Suppose that you want to match each of the following types of strings using separate regular expressions:
Tags that contain an id attribute.
<div> tags that contain an id attribute.
Tags that contain an id attribute with the value my-id.
Tags that contain my-class within their class attribute value (even if there are multiple classes separated by whitespace).
Solution
Tags that contain an id attribute (quick and dirty)
If you want to do a quick search in a text editor that lets you preview your results, the following (overly simplistic) regex might do the trick:
<[^>]+\sid\b[^>]*>
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Here’s a breakdown of the regex in free-spacing mode:
<        # Start of the tag
[^>]+    # Tag name, attributes, etc.
\s id \b # The target attribute name, as a whole word
[^>]*    # The remainder of the tag, including the id attribute's value
>        # End of the tag
| Regex options: Case insensitive, free-spacing |
| Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby |
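As a rough illustration, the quick-and-dirty regex can be tried out in Python, one of the flavors listed above. This is only a sketch: the sample markup and the variable names (QUICK, QUICK_VERBOSE, sample, false_positive) are made up for the example, and re.VERBOSE is used as Python's equivalent of free-spacing mode.

```python
import re

# The quick-and-dirty regex from this recipe, compiled case-insensitively.
QUICK = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)

# The same regex in free-spacing form (re.VERBOSE is Python's free-spacing flag).
QUICK_VERBOSE = re.compile(r"""
    <        # Start of the tag
    [^>]+    # Tag name, attributes, etc.
    \s id \b # The target attribute name, as a whole word
    [^>]*    # The remainder of the tag, including the id attribute's value
    >        # End of the tag
""", re.IGNORECASE | re.VERBOSE)

# Hypothetical sample markup, invented for illustration only.
sample = '<div id="main"><p class="note">text</p><img ID=logo src="a.png"></div>'

for pattern in (QUICK, QUICK_VERBOSE):
    # Both forms find the <div> and <img> tags and skip the <p> tag.
    print(pattern.findall(sample))

# The regex is "overly simplistic": the word "id" inside another
# attribute's value is enough to trigger a match.
false_positive = '<p title="my id here">'
print(bool(QUICK.search(false_positive)))
```

Both compiled patterns should report the same two tags for the sample string. The last check shows why previewing results matters with this version: a tag with no id attribute still matches when the standalone word id appears inside another attribute's value.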
Tags that contain an id attribute (more reliable)
Unlike the regex just shown, this next take on the same problem
supports quoted attribute values that contain literal > characters, and it
doesn’t match tags that merely contain the word id within one of their attributes’ ...