Chapter 7 Advanced Web Scraping and Data Gathering

Learning Objectives

By the end of this chapter, you will be able to:

Use requests and BeautifulSoup to read various web pages and gather data from them
Read data from XML files and from the web using an Application Programming Interface (API)
Use regex techniques to extract useful information from a large and messy text corpus

In this chapter, you will learn how to gather data from web pages, XML files, and APIs.
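The chapter's own examples are not part of this excerpt. As a rough sketch of two of the objectives above (reading XML and applying a regex to messy text), the following uses only the Python standard library. The XML payload, the sample text, the email pattern, and the helper names `read_books` and `find_emails` are all hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML payload, standing in for a file on disk or an API response.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Data Wrangling</title><price>39.99</price></book>
  <book id="b2"><title>Web Scraping</title><price>29.50</price></book>
</catalog>
"""


def read_books(xml_text):
    """Return a list of (id, title, price) tuples parsed from an XML string."""
    root = ET.fromstring(xml_text.strip())
    return [
        (book.get("id"), book.findtext("title"), float(book.findtext("price")))
        for book in root.findall("book")
    ]


# Hypothetical messy text; only two of the three contacts are well-formed emails.
MESSY_TEXT = "Contact: alice@example.com, bob(at)example.com; or carol.smith@example.org!"

# A deliberately simple email pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text):
    """Return every substring of text that matches the simple email pattern."""
    return EMAIL_PATTERN.findall(text)


books = read_books(XML_TEXT)
emails = find_emails(MESSY_TEXT)
print(books)
print(emails)
```

In a real scraper the XML or HTML string would typically come from an HTTP response rather than a literal, and the pattern would be tuned to the corpus at hand.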

Introduction

The previous chapter covered how to create a successful data wrangling pipeline. In this chapter, we will build a real-life web scraper using all of the techniques that we have learned so far. This chapter builds on the foundation of BeautifulSoup ...

Data Wrangling with Python by Dr. Tirthajyoti Sarkar, Shubhadeep Roychowdhury
