Tap into a powerful new way to find exactly what you're looking for on a page.
Firefox contains a little-known but powerful feature called XPath. XPath is a query language for searching the Document Object Model (DOM) that Firefox constructs from the source of a web page.
As mentioned in "Add or Remove Content on a Page" [Hack #6], virtually every hack in this book revolves around the DOM. Many hacks work on a collection of elements. Without XPath, you would need to get a list of elements (for example, with document.getElementsByTagName) and then test each one to see if it's something of interest. With XPath expressions, you can find exactly the elements you want, all in one shot, and then immediately start working with them.
Tip
A good beginners' tutorial on XPath is available at http://www.zvon.org/xxl/XPathTutorial/General/examples.html.
To execute an XPath query, use the document.evaluate function. Here's the basic syntax:
var snapResults = document.evaluate("XPath expression", document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
The function takes five parameters:
- The XPath expression itself
More on this in a minute.
- The root node on which to evaluate the expression
If you want to search the entire web page, pass in document. But you can also search just a part of the page. For example, to search within a <div id="foo">, pass document.getElementById("foo") as the second parameter.
- A namespace resolver function
You can use this to create XPath queries that work on XHTML pages. See "Select Multiple Checkboxes" [Hack #36] for an example.
- The type of result to return
If you want a collection of elements, use XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE. If you want to find a single element, use XPathResult.FIRST_ORDERED_NODE_TYPE. More on this in a minute, too.
- An existing XPath result to reuse
I rarely use this. If you pass in a result object from an earlier query, the function may reuse that object to hold the new results instead of creating a fresh one. Pass null to get a new result object.
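To make the second and third parameters more concrete, here is a minimal sketch. The helper names findWithin and xhtmlResolver, and the xhtml prefix, are made up for illustration; they are not part of Firefox or of any hack in this book.

```javascript
// Hypothetical namespace resolver: maps a prefix used in the XPath
// expression to a namespace URI. Only the "xhtml" prefix is handled here.
function xhtmlResolver(prefix) {
  return prefix === 'xhtml' ? 'http://www.w3.org/1999/xhtml' : null;
}

// Hypothetical helper: evaluate an expression against one part of the page.
// root is any DOM node, such as document.getElementById("foo").
function findWithin(root, expression, resolver) {
  // evaluate() lives on the document, so look the document up from the node.
  // (A document's own ownerDocument is null, so fall back to root itself.)
  var doc = root.ownerDocument || root;
  return doc.evaluate(expression, root, resolver || null,
      XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
}
```

For example, findWithin(document.getElementById("foo"), ".//a") would search only inside that div. Note the leading dot: an expression that starts with // is evaluated from the top of the document, even when you pass a different root node. On an XHTML page, a call such as findWithin(document, "//xhtml:a", xhtmlResolver) would use the resolver to look up the xhtml prefix.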
When called with a snapshot result type, as here, the document.evaluate function returns a snapshot, which is a static collection of DOM nodes. You can iterate through the snapshot or access its items in any order. The snapshot is static, which means it will never change, no matter what you do to the page. You can even delete DOM nodes as you move through the snapshot.
A snapshot is not an array, and it doesn't support the standard array properties or accessors. To get the number of items in the snapshot, use snapResults.snapshotLength. To access a particular item, you need to call snapResults.snapshotItem(index). Here is the skeleton of a script that executes an XPath query and loops through the results:
var snapResults = document.evaluate("XPath expression", document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = snapResults.snapshotLength - 1; i >= 0; i--) {
    var elm = snapResults.snapshotItem(i);
    // do stuff with elm
}
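Because a snapshot is not an array, it can be convenient to copy its items into one. The following is a minimal sketch; snapshotToArray is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: copy the nodes of an XPath snapshot into a real array.
// It relies only on snapshotLength and snapshotItem(), as described above.
function snapshotToArray(snapshot) {
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

With a real array in hand, the length property and the standard array methods work as usual, at the cost of one extra pass over the results.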
The following XPath query finds all the elements on a page with class="foo":
var snapFoo = document.evaluate("//*[@class='foo']",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
The // means "search for things anywhere below the root node, including nested elements." The * matches any element, and [@class='foo'] restricts the search to elements with a class of foo.
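One caveat: [@class='foo'] compares the whole attribute value, so it does not match an element with class="foo bar". A common XPath 1.0 idiom for matching one class name among several is sketched below. The helper name classQuery is hypothetical, and the sketch assumes the class name contains no quotes or spaces.

```javascript
// Hypothetical helper: build an XPath expression that matches any element
// whose class attribute contains className as one whitespace-separated word.
// Assumes className contains no quotes or spaces.
function classQuery(className) {
  // Pad the normalized class list with spaces so that " foo " matches
  // "foo", "foo bar", and "bar foo", but not "foobar".
  return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " +
      className + " ')]";
}
```

The returned string can be passed as the first parameter of document.evaluate in place of "//*[@class='foo']".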
You can use XPath to search for specific elements. The following query finds all <input type="hidden"> elements. (This example is taken from "Show Hidden Form Fields" [Hack #30].)
var snapHiddenFields = document.evaluate("//input[@type='hidden']",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
You can also test for the presence of an attribute, regardless of its value. The following query finds all elements with an accesskey attribute. (This example is taken from "Add an Access Bar with Keyboard Shortcuts" [Hack #68].)
var snapAccesskeys = document.evaluate("//*[@accesskey]",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
Not impressed yet? Here's a query that finds images whose URL contains the string "MZZZZZZZ". (This example is taken from "Make Amazon Product Images Larger" [Hack #25].)
var snapProductImages = document.evaluate("//img[contains(@src, 'MZZZZZZZ')]",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
You can also do combinations of attributes. This query finds all images with a width of 36 and a height of 14. (This query is taken from "Zap Ugly XML Buttons" [Hack #86].)
var snapXMLImages = document.evaluate("//img[@width='36'][@height='14']",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
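If you find yourself writing many attribute tests like this, the expression can be assembled from a tag name and a set of attribute values. This is only a sketch: attributeQuery is a hypothetical helper name, and it assumes the attribute values contain no single quotes.

```javascript
// Hypothetical helper: build an expression such as
// "//img[@width='36'][@height='14']" from a tag name and a map of
// attribute names to required values.
// Assumes the values contain no single quotes.
function attributeQuery(tagName, attributes) {
  var query = '//' + tagName;
  for (var name in attributes) {
    if (Object.prototype.hasOwnProperty.call(attributes, name)) {
      // Each attribute becomes one more [@name='value'] predicate.
      query += '[@' + name + "='" + attributes[name] + "']";
    }
  }
  return query;
}
```

Calling attributeQuery('img', {width: 36, height: 14}) produces the same expression used above. For simple tests like these, the two predicates could also be written as one, [@width='36' and @height='14'].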
But wait, there's more! By using more advanced XPath syntax, you can actually find elements that are contained within other elements. This code finds all the links that are contained in a paragraph whose class is g. (This example is taken from "Refine Your Google Search" [Hack #96].)
var snapResults = document.evaluate("//p[@class='g']//a",
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
Finally, you can find a specific element by passing XPathResult.FIRST_ORDERED_NODE_TYPE in the fourth parameter. This line of code finds the first link whose class is "yschttl". (This example is taken from "Prefetch Yahoo! Search Results" [Hack #52].)
var elmFirstResult = document.evaluate("//a[@class='yschttl']",
document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
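The single-node form is easy to wrap in a small function. The sketch below uses a hypothetical helper name, findFirst; the point to remember is that singleNodeValue is null when nothing matches, so the caller should check before using the result.

```javascript
// Hypothetical helper: return the first node (in document order) that
// matches the expression, or null if nothing matches.
function findFirst(expression, root) {
  // evaluate() lives on the document, so look the document up from the node.
  var doc = root.ownerDocument || root;
  var result = doc.evaluate(expression, root, null,
      XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}
```

A typical call would be var elm = findFirst("//a[@class='yschttl']", document); followed by an if (elm) test before touching the element.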
If you aren't brain-fried by now, I'd be very surprised. XPath is, quite literally, a language all its own. Like regular expressions, XPath can make your life easier, or it can make your life a living hell. Remember, you can always get what you need (eventually) with standard DOM functions such as document.getElementById or document.getElementsByTagName. XPath's a good tool to have in your tool chest, but it's not always the right tool for the job.
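For comparison, here is a sketch of the hidden-field search from earlier written with standard DOM functions instead of XPath. The function name findHiddenInputs is hypothetical.

```javascript
// Plain-DOM counterpart of "//input[@type='hidden']": collect every <input>
// under root, then keep the ones whose type attribute is exactly "hidden".
function findHiddenInputs(root) {
  var inputs = root.getElementsByTagName('input');
  var hidden = [];
  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].getAttribute('type') === 'hidden') {
      hidden.push(inputs[i]);
    }
  }
  return hidden;
}
```

It takes more lines than the one-line XPath query, but it returns a real array and needs no knowledge of XPath syntax, which may be the better trade for a simple search.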