R Data Analysis Cookbook - Second Edition by Kuntal Ganguly

Extracting HTML table data from a web page

Although HTML can be treated as a specialized form of XML, the XML package provides a dedicated function, readHTMLTable(), for extracting data from HTML tables:

> library(XML)
> url <- "WorldPopulation-wiki.htm"
> tables <- readHTMLTable(url)
> world.pop <- tables[[6]]

The readHTMLTable() function parses the web page and returns a list of all the tables that are found on the page. For tables that have an id attribute, the function uses the id attribute as the name of that list element.
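The naming behavior can be seen on a small inline page. This is a minimal sketch, not the recipe's own data: the HTML string, the table ids (areas, population), and the variable names are made up for illustration, and the page is parsed from a string with htmlParse() rather than read from a file.

```r
library(XML)

# Hypothetical two-table page, inlined so the example needs no file on disk.
html <- '<html><body>
<table id="areas">
  <tr><th>Region</th><th>Area</th></tr>
  <tr><td>North</td><td>10</td></tr>
</table>
<table id="population">
  <tr><th>Country</th><th>Millions</th></tr>
  <tr><td>A</td><td>5</td></tr>
  <tr><td>B</td><td>3</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)  # parse the string as an HTML document
tables <- readHTMLTable(doc)           # one list element per <table>

names(tables)                  # elements are named after the id attributes
pop <- tables[["population"]]  # select a table by name instead of position
```

Selecting by name is less fragile than selecting by position when the page layout may change, but it only works for tables that carry an id.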

We are interested in extracting the "10 most populous countries" table, which is the sixth element of the returned list, so we use tables[[6]].
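When the position of the wanted table is not known in advance, it can help to inspect the list before indexing into it. The following is a rough sketch on a made-up three-table page (the HTML and variable names are assumptions, not the Wikipedia file used above). It also uses the which argument of readHTMLTable() to read a single table directly.

```r
library(XML)

# Hypothetical page with three small tables and no id attributes.
html <- '<html><body>
<table><tr><th>k</th></tr><tr><td>1</td></tr></table>
<table><tr><th>Country</th><th>Millions</th></tr>
       <tr><td>A</td><td>5</td></tr>
       <tr><td>B</td><td>3</td></tr></table>
<table><tr><th>note</th></tr><tr><td>x</td></tr></table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc)

length(tables)        # how many tables were found on the page
sapply(tables, nrow)  # row counts help identify the table of interest

# Read only the second table; with a single index a data frame is returned.
second <- readHTMLTable(doc, which = 2)
```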
