8 Web Scraping

You need data to do data science, and when you don’t have a dataset on hand, you can try web scraping, a set of techniques for reading information directly from public websites and converting it to usable datasets. In this chapter, we’ll cover some common web-scraping techniques.

We’ll start with the simplest possible kind of scraping: downloading a web page’s code and searching it for relevant text. We’ll then discuss regular expressions, a compact notation for describing text patterns and searching for them, and Beautiful Soup, a free Python library that can help you parse websites more easily by directly accessing HyperText Markup Language (HTML) ...
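The first two techniques, downloading a page's code and searching it with regular expressions, can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the chapter's own code. The helper names (`download_page`, `contains_text`, `find_title`, `find_prices`), the price pattern, and `sample_html` are all hypothetical. `sample_html` stands in for a downloaded page so the example runs offline, and Beautiful Soup is not used here.

```python
import re
import urllib.request
from typing import List, Optional


def download_page(url: str) -> str:
    """Fetch a page's raw HTML as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def contains_text(html: str, phrase: str) -> bool:
    """Simplest scraping: check whether a phrase appears anywhere in the page code."""
    return phrase in html


# Hypothetical pattern: the contents of the page's <title> tag.
TITLE_PATTERN = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

# Hypothetical pattern: a dollar amount such as "$19.99" or "$5".
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def find_title(html: str) -> Optional[str]:
    """Return the text inside <title>...</title>, or None if there is no title."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


def find_prices(html: str) -> List[float]:
    """Return every dollar amount matched by PRICE_PATTERN, as floats."""
    # With one capturing group, findall returns the captured strings.
    return [float(amount) for amount in PRICE_PATTERN.findall(html)]


# Offline stand-in for a page that download_page(url) might return.
sample_html = """
<html>
  <head><title>Example Store</title></head>
  <body>
    <p class="item">Notebook: $4.50</p>
    <p class="item">Backpack: $32.00</p>
  </body>
</html>
"""

print(contains_text(sample_html, "Backpack"))  # True
print(find_title(sample_html))                 # Example Store
print(find_prices(sample_html))                # [4.5, 32.0]
```

Regular expressions applied to raw HTML are brittle. Extra attributes, nested tags, or changed whitespace can break a pattern. That brittleness is one reason a dedicated HTML parser such as Beautiful Soup is useful.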
