Scraping web data

In most cases, the majority of data will not exist in your database, but will instead be published in different forms on the Internet. To dig up more valuable information from these data sources, we need to know how to access and scrape data from the Web. Here, we will illustrate how to use the rvest package to harvest finance data from http://www.bloomberg.com/.

Getting ready

In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.

How to do it…

Perform the following steps to scrape data from http://www.bloomberg.com/:

  1. First, access the following link to browse the S&P 500 index on the Bloomberg Business websitehttp://www.bloomberg.com/quote/SPX:IND:

    Figure 9: S&P 500 index ...

Get R for Data Science Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.