July 2023
Beginner
288 pages
8h 11m
English
You need data to do data science, and when you don’t have a dataset on hand, you can try web scraping, a set of techniques for reading information directly from public websites and converting it to usable datasets. In this chapter, we’ll cover some common web-scraping techniques.
We’ll start with the simplest possible kind of scraping: downloading a web page’s code and looking for relevant text. We’ll then discuss regular expressions, a compact pattern syntax for searching text by rules rather than by exact strings, and Beautiful Soup, a free Python library that can help you parse websites more easily by directly accessing HyperText Markup Language (HTML) ...
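To make the first two ideas concrete, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not the chapter's own code: the function names (`download_page`, `title_by_string_search`, `emails_by_regex`), the sample page, and the email pattern are all made up for this example, and the pattern is deliberately simple rather than a complete email matcher. The sample HTML is embedded as a string so the example runs without a network connection.

```python
import re
from urllib.request import urlopen


def download_page(url):
    """Return a page's raw HTML as text (assumes the page is UTF-8 encoded)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def title_by_string_search(html):
    """Simplest scraping: look for the text between <title> and </title>."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return None  # the page has no plain <title> element
    return html[start + len("<title>"):end].strip()


def emails_by_regex(html):
    """Regular-expression scraping: collect every email-like string."""
    # Hypothetical, intentionally loose pattern: name@domain.suffix
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)


# Made-up page standing in for the result of download_page(some_url).
SAMPLE_HTML = """<html><head><title>Example Store</title></head>
<body><p>Contact: sales@example.com or help@example.com</p></body></html>"""

page_title = title_by_string_search(SAMPLE_HTML)
emails = emails_by_regex(SAMPLE_HTML)
print(page_title)
print(emails)
```

On a live site, you would pass the string returned by `download_page` to the two search functions instead of `SAMPLE_HTML`. Plain string search breaks as soon as the markup varies (for example, a `<title>` tag with attributes), which is the kind of fragility that regular expressions and HTML parsers are meant to reduce.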