Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.
Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.
Table of Contents
- I. Building Scrapers
- 1. Your First Web Scraper
- 2. Advanced HTML Parsing
- 3. Starting to Crawl
- 4. Using APIs
- 5. Storing Data
- 6. Reading Documents
- II. Advanced Scraping
- 7. Cleaning Your Dirty Data
- 8. Reading and Writing Natural Languages
- 9. Crawling Through Forms and Logins
- 10. Scraping JavaScript
- 11. Image Processing and Text Recognition
- 12. Avoiding Scraping Traps
- 13. Testing Your Website with Scrapers
- 14. Scraping Remotely
- A. Python at a Glance
- B. The Internet at a Glance
- C. The Legalities and Ethics of Web Scraping
- Trademarks, Copyrights, Patents, Oh My!
- Trespass to Chattels
- The Computer Fraud and Abuse Act
- robots.txt and Terms of Service
- Three Web Scrapers
- Title: Web Scraping with Python
- Release date: July 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491910290