5

Scraping the Web with Scrapy and Beautiful Soup

In previous chapters, we learned about web scraping-related technologies, data-finding techniques, and using various Python libraries to scrape data from the web.

In this chapter, we will take a hands-on look at two popular Python libraries, Scrapy and Beautiful Soup. Scrapy is a web crawling framework for Python that organizes scraping work into projects. Beautiful Soup, on the other hand, is a parsing library: it turns a document into a tree that can be traversed and searched to extract content. Both libraries also offer a rich set of DOM-related features, such as selecting elements and reading their attributes and text.
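As a first taste of what parsing, traversing, and extracting look like in practice, here is a minimal sketch using Beautiful Soup. The HTML string, the `book` class name, and the link paths are made up for illustration, and the snippet assumes the `beautifulsoup4` package is installed. It is not the chapter's own example, only one simple way to walk a parsed document.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; a real scraper would
# obtain this text from an HTTP response or a local file.
html = """
<html><body>
  <h1>Books</h1>
  <ul id="titles">
    <li class="book"><a href="/book/1">First Title</a></li>
    <li class="book"><a href="/book/2">Second Title</a></li>
  </ul>
</body></html>
"""

# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Traverse: reach the first <h1> element and read its text.
heading = soup.h1.get_text(strip=True)

# Extract: collect (link text, href) pairs from every <li class="book">.
links = []
for item in soup.find_all("li", class_="book"):
    anchor = item.a  # first <a> element inside the list item
    links.append((anchor.get_text(strip=True), anchor["href"]))

print(heading)
print(links)
```

Here `find_all` searches the parsed tree for matching elements, while attribute-style access such as `soup.h1` or `item.a` returns the first matching descendant. Scrapy adds crawling and project structure on top of this kind of extraction step.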

In particular, we will learn about the following topics in this chapter:

  • Web parsing ...

