Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.
Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.
Table of contents
- I. Building Scrapers
- 1. Your First Web Scraper
- 2. Advanced HTML Parsing
- 3. Starting to Crawl
- 4. Using APIs
- 5. Storing Data
- 6. Reading Documents
- II. Advanced Scraping
- 7. Cleaning Your Dirty Data
- 8. Reading and Writing Natural Languages
- 9. Crawling Through Forms and Logins
- 11. Image Processing and Text Recognition
- 12. Avoiding Scraping Traps
- 13. Testing Your Website with Scrapers
- 14. Scraping Remotely
- A. Python at a Glance
- B. The Internet at a Glance
- C. The Legalities and Ethics of Web Scraping
- Trademarks, Copyrights, Patents, Oh My!
- Trespass to Chattels
- The Computer Fraud and Abuse Act
- robots.txt and Terms of Service
- Three Web Scrapers
- Title: Web Scraping with Python
- Release date: July 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491910290