Chapter 7
Advanced Web Scraping and Data Gathering
Learning Objectives
By the end of this chapter, you will be able to:
- Use requests and BeautifulSoup to read web pages and gather data from them
- Read data from XML files and from the web through an Application Programming Interface (API)
- Use regular expression (regex) techniques to extract useful information from a large, messy text corpus
In this chapter, you will learn how to gather data from web pages, XML files, and APIs.
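As a rough preview of two of these objectives, the sketch below reads a small XML document and pulls email addresses out of messy text with a regex. It is a minimal illustration that uses only the Python standard library, not the chapter's own code: the sample XML, the sample text, and the names read_books, find_emails, and EMAIL_PATTERN are all hypothetical. Fetching live pages with requests and parsing them with BeautifulSoup is left out here because it needs network access and third-party packages.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for a file on disk or an API response.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Data Wrangling</title><price>39.99</price></book>
  <book id="b2"><title>Web Scraping</title><price>29.50</price></book>
</catalog>
"""

# Hypothetical messy text standing in for scraped page content.
MESSY_TEXT = (
    "Contact: alice@example.com ;; or  bob.smith@example.org!!  "
    "(no-reply at example dot com)"
)

# Deliberately simple email pattern; real-world addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def read_books(xml_text):
    """Parse the XML and return one dictionary per <book> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": book.get("id"),
            "title": book.findtext("title"),
            "price": float(book.findtext("price")),
        }
        for book in root.iter("book")
    ]


def find_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    for book in read_books(XML_TEXT):
        print(book)
    print(find_emails(MESSY_TEXT))
```

The same pattern of "parse, then extract" would apply to text returned by an HTTP request, assuming the response body is passed in place of the sample strings.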
Introduction
The previous chapter covered how to create a successful data wrangling pipeline. In this chapter, we will build a real-life web scraper using all of the techniques that we have learned so far. This chapter builds on the foundation of BeautifulSoup ...