Analysing HTML code and extracting data

In the previous sections, we learned the basics of HTML, CSS, and XPath. To scrape real-world web pages, the problem now becomesa question of writing the proper CSS or XPath selectors. In this section, we introduce some simple ways to figure out working selectors.

Suppose we want to scrape all available R packages at https://cran.rstudio.com/web/packages/available_packages_by_name.html. The web page looks simple. To figure out the selector expression, right-click on the table and select Inspect Element in the context menu, which should be available in most modern web browsers:

Analysing HTML code and extracting data

Then the inspector panel shows ...

Get Learning R Programming now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.