7. Advanced Web Scraping and Data Gathering

Overview

This chapter introduces the concepts of advanced web scraping and data gathering. You will use the requests and BeautifulSoup libraries to read various web pages and gather data from them, read XML files, and retrieve data from the web through an Application Programming Interface (API). You will also use regular expression (regex) techniques to extract useful information from a large, messy text corpus. By the end of this chapter, you will have learned how to gather data from web pages, XML files, and APIs.
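As a small preview of two of these techniques, the sketch below reads records out of an XML document and uses a regex to pull e-mail addresses out of messy text. It is only an illustration under stated assumptions: it uses Python's standard-library `xml.etree.ElementTree` and `re` modules rather than requests or BeautifulSoup so that it runs offline, and the sample data and the helper names `read_books` and `extract_emails` are hypothetical, made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML document standing in for a file read from disk.
XML_TEXT = """<catalog>
  <book id="b1"><title>Data Wrangling</title><price>29.99</price></book>
  <book id="b2"><title>Web Scraping</title><price>34.50</price></book>
</catalog>"""

# Hypothetical messy text, such as content scraped from a web page.
MESSY_TEXT = """Contact: alice@example.com; phone 555-0100.
  Support -> support@example.org (weekdays)   noise ###
bob@example.net"""


def read_books(xml_text):
    """Parse the XML string and return (id, title, price) tuples."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append((
            book.get("id"),                     # attribute value
            book.findtext("title"),             # text of a child element
            float(book.findtext("price")),      # convert text to a number
        ))
    return books


# A deliberately simple e-mail pattern; full address validation is more involved.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    for book_id, title, price in read_books(XML_TEXT):
        print(book_id, title, price)
    print(extract_emails(MESSY_TEXT))
```

Because the pattern uses only a non-capturing group, `findall` returns whole matches, so the surrounding punctuation and noise in the text are simply skipped.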

Introduction

The previous chapter covered how to create a successful data wrangling pipeline. In this chapter, we will build a web scraper that can be used by a data wrangling professional in ...
