
264 R Programming for Bioinformatics
can then be processed by grep and sub to find those that refer to Bioconductor
packages. A second approach is to return to the HTML source, where we
notice that the elements we are interested in are always children of b
elements. In the code below we refine our XPath query to select only those a
elements that have an href attribute and are direct children of b elements.
> f2 = getNodeSet(s1, "//b/a[@href]")
> p2 = sapply(f2, xmlValue)
> length(p2)
[1] 261
> p2[1:10]
[1] "lamb1" "wilson2" "wellington" "liverpool"
[5] "lemming" "pitt" "A" "ABarray"
[9] "aCGH" "ACME"
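The effect of the b/a step can be checked without fetching the real page. The following is a minimal sketch on a small, made-up HTML fragment that stands in for the actual source. It assumes the XML package is installed and uses its xpathSApply function, which applies a function such as xmlValue to every node matched by an XPath expression. This combines the getNodeSet and sapply steps above into a single call.

```r
library(XML)  # assumed installed; provides htmlParse and xpathSApply

# A small, made-up HTML fragment standing in for the real web page
html <- '<html><body>
<p><a href="http://example.org/about">About</a></p>
<b><a href="pkgs/ABarray.html">ABarray</a></b>
<b><a href="pkgs/aCGH.html">aCGH</a></b>
<b><a name="anchor">no href</a></b>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Every a element that carries an href attribute, anywhere in the document
all_links <- xpathSApply(doc, "//a[@href]", xmlValue)

# Only those a elements with an href that are direct children of a b element
pkg_links <- xpathSApply(doc, "//b/a[@href]", xmlValue)

all_links
pkg_links
```

The last anchor sits inside a b element but has no href attribute, so the [@href] predicate drops it. The first link has an href but is not inside a b element, so the //b/a path excludes it.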
We can compare our results to the web page and see that this procedure
has