Chapter 7

Advanced Web Scraping and Data Gathering

Learning Objectives

By the end of this chapter, you will be able to:

  • Use requests and BeautifulSoup to read web pages and gather data from them
  • Read data from XML files and from the web using an Application Programming Interface (API)
  • Use regex techniques to scrape useful information from a large and messy text corpus

In this chapter, you will learn how to gather data from web pages, XML files, and APIs.
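As a rough preview of two of these objectives, the sketch below reads a small XML document and pulls email-like strings out of messy text. It is only an illustration under stated assumptions: the XML snippet, the sample text, and the helper names read_books and find_emails are invented for this example, and the email pattern is deliberately simple rather than complete. The requests and BeautifulSoup steps are left out so that the sketch runs offline using only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML snippet, invented for illustration only.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Data Wrangling</title><price>39.99</price></book>
  <book id="b2"><title>Web Scraping</title><price>29.50</price></book>
</catalog>
"""

# Hypothetical messy text, invented for illustration only.
MESSY_TEXT = "Contact alice@example.com or  bob.smith@example.org; ignore @nobody and foo@."


def read_books(xml_text):
    """Return a list of (id, title, price) tuples parsed from an XML string."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for book in root.findall("book"):
        book_id = book.get("id")                  # attribute lookup
        title = book.findtext("title")            # text of the <title> child
        price = float(book.findtext("price"))     # convert text to a number
        books.append((book_id, title, price))
    return books


def find_emails(text):
    """Return email-like strings found by a deliberately simple pattern."""
    # One or more name characters, an @, then a domain with at least one dot.
    pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
    return re.findall(pattern, text)


if __name__ == "__main__":
    for book in read_books(XML_TEXT):
        print(book)
    print(find_emails(MESSY_TEXT))
```

The same extraction logic would apply to text fetched from a live page or API; only the source of the input string changes.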

Introduction

The previous chapter covered how to create a successful data wrangling pipeline. In this chapter, we will build a real-life web scraper using all of the techniques that we have learned so far. This chapter builds on the foundation of BeautifulSoup ...
