Chapter 7

Advanced Web Scraping and Data Gathering

Learning Objectives

By the end of this chapter, you will be able to:

  • Use the requests and BeautifulSoup libraries to read web pages and gather data from them
  • Read data from XML files and from the web using an Application Programming Interface (API)
  • Use regular expressions (regex) to extract useful information from a large and messy text corpus

In this chapter, you will learn how to gather data from web pages, XML files, and APIs.
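
As a rough preview of two of these ideas, the sketch below parses a small XML document and pulls email addresses out of messy text using only the Python standard library. It is an illustration under stated assumptions, not the chapter's own code: the sample data, the function names parse_books and find_emails, and the simplified email pattern are all hypothetical. Reading live web pages with requests and BeautifulSoup is not shown here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML snippet standing in for a file or an API response.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Data Wrangling</title><price>29.99</price></book>
  <book id="b2"><title>Web Scraping</title><price>35.50</price></book>
</catalog>
"""

# Hypothetical messy text standing in for content scraped from a page.
MESSY_TEXT = "Contact: alice@example.com; or bob@example.org!! (no spam pls)"


def parse_books(xml_text):
    """Return one dict per <book> element found in the XML string."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": book.get("id"),
            "title": book.findtext("title"),
            "price": float(book.findtext("price")),
        }
        for book in root.iter("book")
    ]


def find_emails(text):
    """Return email-like substrings; the pattern is deliberately simplified."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


if __name__ == "__main__":
    for book in parse_books(XML_TEXT):
        print(book)
    print(find_emails(MESSY_TEXT))
```

The email pattern is intentionally loose; real address validation is considerably more involved, so treat it as a starting point for extraction rather than a validator.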


The previous chapter covered how to create a successful data wrangling pipeline. In this chapter, we will build a real-life web scraper using all of the techniques that we have learned so far. This chapter builds on the foundation of BeautifulSoup ...

