8.6. Find a Specific Attribute in XML-Style Tags
Problem
You want to find tags within an (X)HTML or XML file that contain a specific attribute, such as id.
This recipe covers several variations on the same problem. Suppose that you want to match each of the following types of strings using separate regular expressions:
Tags that contain an id attribute.
<div> tags that contain an id attribute.
Tags that contain an id attribute with the value my-id.
Tags that contain my-class within their class attribute value (classes are separated by whitespace).
Solution
Tags that contain an id attribute (quick and dirty)
If you want to do a quick search in a text editor that lets you preview your results, the following (overly simplistic) regex might do the trick:
<[^>]+\sid\b[^>]*>
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
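As a rough illustration, the quick regex can be tried in Python with the case-insensitive flag. This is only a sketch: the function name find_id_tags and the sample strings are made up for the example, not taken from the recipe. It also shows two ways the simplistic pattern misbehaves, namely matching the word id inside another attribute's value and missing a tag whose earlier attribute value contains a literal >.

```python
import re

# Quick-and-dirty pattern from the recipe, compiled case-insensitively.
ID_TAG = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)

def find_id_tags(markup):
    """Return every tag-like substring the quick pattern matches (hypothetical helper)."""
    return ID_TAG.findall(markup)

# Illustrative input: one real id attribute, one tag without id,
# and one tag that only has the word "id" inside a title value.
sample = '<div id="main"><p class="x">text</p><a title="my id">link</a></div>'
matches = find_id_tags(sample)
print(matches)

# A quoted value containing ">" ends the match too early, so this tag is missed.
missed = find_id_tags('<p data-x="a > b" id="p1">')
print(missed)
```

Previewing results like this is what makes the regex usable for a quick search: the false positive and the missed tag are easy to spot by eye, but would be a problem in unattended processing.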
Here’s a breakdown of the regex in free-spacing mode:
<        # Start of the tag
[^>]+    # Tag name, attributes, etc.
\s id \b # The target attribute name, as a whole word
[^>]*    # The remainder of the tag, including the id attribute's value
>        # End of the tag
Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
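In Python, free-spacing mode corresponds to the re.VERBOSE flag. The sketch below, with illustrative names (ID_TAG_VERBOSE, ID_TAG_COMPACT, samples) that are not part of the recipe, compiles the commented form and checks that it behaves the same as the compact one on a few made-up inputs.

```python
import re

# Free-spacing version: whitespace in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
ID_TAG_VERBOSE = re.compile(r"""
    <        # Start of the tag
    [^>]+    # Tag name, attributes, etc.
    \s id \b # The target attribute name, as a whole word
    [^>]*    # The remainder of the tag, including the id attribute's value
    >        # End of the tag
""", re.IGNORECASE | re.VERBOSE)

# Compact version of the same pattern, for comparison.
ID_TAG_COMPACT = re.compile(r"<[^>]+\sid\b[^>]*>", re.IGNORECASE)

# Made-up inputs for the comparison.
samples = [
    '<div id="a"><span>',
    '<INPUT ID=x>',
    '<span class="grid">',
    '<input id>',
]

# Both forms should report the same matches for every sample.
same = all(
    ID_TAG_VERBOSE.findall(s) == ID_TAG_COMPACT.findall(s) for s in samples
)
print(same)
```

Because each comment ends at a line break, the pattern has to stay on separate lines; collapsing it onto one line would turn everything after the first # into a comment.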
Tags that contain an id attribute (more reliable)
Unlike the regex just shown, this next take on the same problem supports quoted attribute values that contain literal > characters, and it doesn’t match tags that merely contain the word id within one of their attributes’ values:
<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*("[^"]*"|'[^']*')↵ ...